[jira] [Commented] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846027#comment-17846027 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-19633: --- This looks to be due to the optimization done in [CASSANDRA-4650|https://issues.apache.org/jira/browse/CASSANDRA-4650]
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Bug Category: Parent values: Degradation(12984) Discovered By: User Report Since Version: 4.0
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Labels: Bootstrap (was: )
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Description: edited.
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Description: edited.
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Description: edited.
[jira] [Created] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
Jai Bheemsen Rao Dhanwada created CASSANDRA-19633:
-
             Summary: Replaced node is stuck in a loop calculating ranges
                 Key: CASSANDRA-19633
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19633
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jai Bheemsen Rao Dhanwada
         Attachments: result1.html

Hello,

I am running into an issue wherein a node that is replacing a dead (non-seed) node is stuck calculating ranges forever. It eventually succeeds; however, the time taken to calculate the ranges is not constant, and I sometimes see it take 24 hours to calculate ranges for each keyspace. Attached is the flame graph of the Cassandra process during this time, which points to the code below.

```
Multimap<InetAddressAndPort, Range<Token>> getRangeFetchMapForNonTrivialRanges()
{
    //Get the graph with edges between ranges and their source endpoints
    MutableCapacityGraph<Vertex, Integer> graph = getGraph();

    //Add source and destination vertex and edges
    addSourceAndDestination(graph, getDestinationLinkCapacity(graph));

    int flow = 0;
    MaximumFlowAlgorithmResult<Integer, CapacityEdge<Vertex, Integer>> result = null;

    //We might not be working on all ranges
    while (flow < getTotalRangeVertices(graph))
    {
        if (flow > 0)
        {
            //We could not find a path with previous graph. Bump the capacity b/w endpoint vertices and destination by 1
            incrementCapacity(graph, 1);
        }

        MaximumFlowAlgorithm fordFulkerson = FordFulkersonAlgorithm.getInstance(DFSPathFinder.getInstance());
        result = fordFulkerson.calc(graph, sourceVertex, destinationVertex, IntegerNumberSystem.getInstance());

        int newFlow = result.calcTotalFlow();
        assert newFlow > flow;   //We are not making progress which should not happen
        flow = newFlow;
    }

    return getRangeFetchMapFromGraphResult(graph, result);
}
```

Digging through the logs, I see the following log line for the keyspace `system_auth`:

```
INFO [main] 2024-05-10 17:35:02,489 RangeStreamer.java:330 - Bootstrap: range Full(/10.135.56.214:7000,(5080189126057290696,5081324396311791613]) exists on Full(/10.135.56.157:7000,(5080189126057290696,5081324396311791613]) for keyspace system_auth
```

The corresponding code:

```
for (Map.Entry entry : fetchMap.flattenEntries())
    logger.info("{}: range {} exists on {} for keyspace {}", description, entry.getKey(), entry.getValue(), keyspaceName);
```

but I do NOT see the following line for the same keyspace:

```
RangeStreamer.java:606 - Output from RangeFetchMapCalculator for keyspace
```

This means the code is stuck in `getRangeFetchMap()`:

```
Multimap<InetAddressAndPort, Range<Token>> rangeFetchMapMap = calculator.getRangeFetchMap();
logger.info("Output from RangeFetchMapCalculator for keyspace {}", keyspace);
```

Here is the cluster topology:
* Cassandra version: 4.0.12
* # of nodes: 190
* Tokens (vnodes): 128

The initial hypothesis was that the graph calculation was taking longer due to the combination of nodes + tokens + tables, but in the same cluster I see one of the nodes joined without any issues. I am wondering whether I am hitting a bug that causes this to work sometimes but get into an effectively infinite loop at other times?

Please let me know if you need any other details; I would appreciate any pointers to debug this further.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
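For readers following the flame graph above: the expensive pattern is that every time the max flow comes up short, the calculator bumps the endpoint-to-destination capacity by one and re-runs Ford-Fulkerson over the entire graph from scratch. The following is a toy, self-contained sketch of that pattern (plain Java with made-up numbers, not Cassandra code, and a guessed starting capacity); with a skewed range-to-source graph, the number of full max-flow passes grows with the number of ranges, which is one plausible way a large vnode cluster spends a long time in this loop.

{code:java}
import java.util.Arrays;

// Toy illustration (not Cassandra code) of the pattern in getRangeFetchMapForNonTrivialRanges():
// run a DFS-based Ford-Fulkerson pass, and if not every range vertex could be matched to a
// source endpoint, bump the endpoint->sink capacity by 1 and recompute the whole max flow.
public class IncrementalMaxFlowToy
{
    static int n;        // 0 = source, 1..R = ranges, R+1..R+E = endpoints, n-1 = sink
    static int[][] cap;

    public static void main(String[] args)
    {
        int ranges = 6, endpoints = 2;
        n = ranges + endpoints + 2;
        cap = new int[n][n];
        int sink = n - 1;
        for (int r = 1; r <= ranges; r++)
        {
            cap[0][r] = 1;            // source -> range
            cap[r][ranges + 1] = 1;   // skewed on purpose: every range only has endpoint 0 as a candidate
        }
        int initial = (ranges + endpoints - 1) / endpoints;  // toy starting capacity; the real getDestinationLinkCapacity() may differ
        for (int e = 0; e < endpoints; e++)
            cap[ranges + 1 + e][sink] = initial;

        int flow = 0, passes = 0;
        while (flow < ranges)
        {
            if (flow > 0)             // previous pass could not place every range: bump and start over
                for (int e = 0; e < endpoints; e++)
                    cap[ranges + 1 + e][sink]++;
            flow = maxFlow(0, sink);
            passes++;
        }
        System.out.println("placed " + flow + " ranges after " + passes + " full max-flow passes");
    }

    static int maxFlow(int s, int t)
    {
        int[][] residual = new int[n][];
        for (int i = 0; i < n; i++)
            residual[i] = Arrays.copyOf(cap[i], n);

        int total = 0;
        int pushed;
        while ((pushed = dfs(residual, s, t, Integer.MAX_VALUE, new boolean[n])) > 0)
            total += pushed;
        return total;
    }

    static int dfs(int[][] residual, int u, int t, int limit, boolean[] seen)
    {
        if (u == t)
            return limit;
        seen[u] = true;
        for (int v = 0; v < n; v++)
        {
            if (seen[v] || residual[u][v] <= 0)
                continue;
            int pushed = dfs(residual, v, t, Math.min(limit, residual[u][v]), seen);
            if (pushed > 0)
            {
                residual[u][v] -= pushed;
                residual[v][u] += pushed;
                return pushed;
            }
        }
        return 0;
    }
}
{code}

With the skewed toy graph above, the loop needs one extra full max-flow pass for every capacity bump, so the printed pass count is ranges minus the initial capacity plus one.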
[jira] [Commented] (CASSANDRA-18922) cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
[ https://issues.apache.org/jira/browse/CASSANDRA-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1523#comment-1523 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18922: --- Thanks. I am trying to identify the impact of CVE-2023-4586 on the Cassandra server and whether it affects the 3.x and 4.x versions of Cassandra. From [CASSANDRA-18812|https://issues.apache.org/jira/browse/CASSANDRA-18812?focusedCommentId=17760806&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17760806] I see that the server is impacted, but there are no further details on why only trunk and 5.0 are impacted and not 3.x and 4.x.
> cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
> -
> Key: CASSANDRA-18922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18922
> Project: Cassandra
> Issue Type: Bug
> Components: Dependencies
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
> This is failing OWASP: https://nvd.nist.gov/vuln/detail/CVE-2023-4586 but appears to be a false positive.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18922) cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
[ https://issues.apache.org/jira/browse/CASSANDRA-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1490#comment-1490 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18922: --- Thanks [~brandon.williams], can you provide some more details as to why only 5.0 is impacted? Reading through the comments it looks like we need to enable hostname verification by default. Is this derived from the `server_encryption_options` setting `require_endpoint_verification`?
> cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
> -
> Key: CASSANDRA-18922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18922
> Project: Cassandra
> Issue Type: Bug
> Components: Dependencies
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
> This is failing OWASP: https://nvd.nist.gov/vuln/detail/CVE-2023-4586 but appears to be a false positive.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
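For context on the hostname-verification question above: CVE-2023-4586 concerns TLS clients that never enable an endpoint identification algorithm, so a certificate issued for any host is accepted. The sketch below shows what enabling that check looks like with the plain JDK SSLEngine; it is illustrative only, not Cassandra or Netty internals, and the host and port are placeholders.

{code:java}
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Illustration only (plain JDK TLS): what "hostname verification" means at the TLS layer.
public final class HostnameVerificationExample
{
    public static SSLEngine clientEngine(SSLContext context, String host, int port)
    {
        SSLEngine engine = context.createSSLEngine(host, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS"); // verify the peer certificate against 'host'
        engine.setSSLParameters(params);
        return engine;
    }
}
{code}

Without the setEndpointIdentificationAlgorithm call, the handshake only checks that the certificate chains to a trusted CA, which is the gap the CVE describes.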
[jira] [Commented] (CASSANDRA-18922) cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
[ https://issues.apache.org/jira/browse/CASSANDRA-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1463#comment-1463 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18922: --- [~brandon.williams] Thanks for reporting the vulnerability. I see the Fix version is marked as 5.x, do we have any timelines to back port the hostname validation to the 3.x or 4.x branches? > cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586 > - > > Key: CASSANDRA-18922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18922 > Project: Cassandra > Issue Type: Bug > Components: Dependencies >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 5.0.x, 5.x > > > This is failing OWASP: https://nvd.nist.gov/vuln/detail/CVE-2023-4586 > but appears to be a false positive. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18875) Upgrade the snakeyaml library version
[ https://issues.apache.org/jira/browse/CASSANDRA-18875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768069#comment-17768069 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18875: --- Thanks [~brandon.williams]. The version in active use across most of the industry is 4.x; can we update the dependency in a future 4.x release? I can send a PR for it.
> Upgrade the snakeyaml library version
> -
> Key: CASSANDRA-18875
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18875
> Project: Cassandra
> Issue Type: Task
> Components: Local/Config
> Reporter: Jai Bheemsen Rao Dhanwada
> Priority: Normal
> Fix For: 5.x
>
> Apache Cassandra uses version 1.26 of the snakeyaml dependency, and there are several [vulnerabilities|https://mvnrepository.com/artifact/org.yaml/snakeyaml/1.26#] in this version that can be fixed by upgrading to the 2.x line. I understand that this is not a security issue, as Cassandra already uses SafeConstructor and it is not flagged as a vulnerability under OWASP, so there are no plans to fix it as per CASSANDRA-18122.
> Cassandra is open source, used and distributed by many enterprise customers, and when Cassandra is downloaded as a tar and used directly, external scanners are not aware of the SafeConstructor implementation and have no idea whether it is vulnerable or not.
> Can we consider upgrading the version to 2.x in the next releases, as snakeyaml does not have large dependency changes between major and minor versions? I am happy to open a PR for this. Please let me know your thoughts on this.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-18875) Upgrade the snakeyaml library version
Jai Bheemsen Rao Dhanwada created CASSANDRA-18875:
-
             Summary: Upgrade the snakeyaml library version
                 Key: CASSANDRA-18875
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18875
             Project: Cassandra
          Issue Type: Task
            Reporter: Jai Bheemsen Rao Dhanwada

Apache Cassandra uses version 1.26 of the snakeyaml dependency, and there are several [vulnerabilities|https://mvnrepository.com/artifact/org.yaml/snakeyaml/1.26#] in this version that can be fixed by upgrading to the 2.x line. I understand that this is not a security issue, as Cassandra already uses SafeConstructor and it is not flagged as a vulnerability under OWASP, so there are no plans to fix it as per CASSANDRA-18122.

Cassandra is open source, used and distributed by many enterprise customers, and when Cassandra is downloaded as a tar and used directly, external scanners are not aware of the SafeConstructor implementation and have no idea whether it is vulnerable or not.

Can we consider upgrading the version to 2.x in the next releases, as snakeyaml does not have large dependency changes between major and minor versions? I am happy to open a PR for this. Please let me know your thoughts on this.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
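For context on the SafeConstructor point above: the snakeyaml CVEs concern untrusted YAML triggering arbitrary object construction, which SafeConstructor rules out by only building standard Java types. The sketch below is illustrative only and is not Cassandra's actual configuration loader; the constructor signature shown is the snakeyaml 2.x API that the proposed upgrade would move to, and the file path is a placeholder.

{code:java}
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;

import org.yaml.snakeyaml.LoaderOptions;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.constructor.SafeConstructor;

// Illustration only: SafeConstructor restricts parsing to standard Java types,
// so untrusted YAML cannot trigger arbitrary object construction.
public final class SafeYamlLoad
{
    public static Map<String, Object> load(String path) throws Exception
    {
        Yaml yaml = new Yaml(new SafeConstructor(new LoaderOptions())); // snakeyaml 2.x signature
        try (InputStream in = new FileInputStream(path))
        {
            return yaml.load(in);
        }
    }
}
{code}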
[jira] [Commented] (CASSANDRA-18122) when will slf4j be upgraded for CVE-2018-8088
[ https://issues.apache.org/jira/browse/CASSANDRA-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648267#comment-17648267 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18122: --- Thanks [~brandon.williams]. I believe the scanner detects the vulnerability because C* is using slf4j version 1.7.25: [https://github.com/apache/cassandra/blob/cassandra-4.1.0/build.xml#L534-L536] Are there any plans to upgrade this version in the next release?
> when will slf4j be upgraded for CVE-2018-8088
> -
> Key: CASSANDRA-18122
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18122
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jai Bheemsen Rao Dhanwada
> Priority: Normal
>
> Hello Team,
> I see Cassandra 4.1 GA'ed on 12/13/2022 and still uses slf4j 1.7.25, and the vulnerability [https://nvd.nist.gov/vuln/detail/CVE-2018-8088] is fixed only in version 1.7.26. Do we have any details on when slf4j will be upgraded?
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-18122) when will slf4j be upgraded for CVE-2018-8088
Jai Bheemsen Rao Dhanwada created CASSANDRA-18122:
-
             Summary: when will slf4j be upgraded for CVE-2018-8088
                 Key: CASSANDRA-18122
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18122
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jai Bheemsen Rao Dhanwada

Hello Team,

I see Cassandra 4.1 GA'ed on 12/13/2022 and still uses slf4j 1.7.25, and the vulnerability [https://nvd.nist.gov/vuln/detail/CVE-2018-8088] is fixed only in version 1.7.26. Do we have any details on when slf4j will be upgraded?

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16670) Flaky ViewComplexTest, ViewFilteringTest and InsertUpdateIfConditionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599183#comment-17599183 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16670: --- thanks for your response. Yes I figured out the issue by looking at the error message but my only confusion was why it started all of a sudden. The server was upgraded almost 10 days back but the error started only 2 days ago. > Flaky ViewComplexTest, ViewFilteringTest and InsertUpdateIfConditionTest > > > Key: CASSANDRA-16670 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16670 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc2, 4.0, 4.1-alpha1, 4.1 > > > *ViewComplexTest* > Flaky > [test|https://ci-cassandra.apache.org/job/Cassandra-4.0/43/testReport/junit/org.apache.cassandra.cql3/ViewComplexTest/testPartialDeleteSelectedColumnWithoutFlush_3_/] > and move back away from 'long' section. > *InsertUpdateIfConditionTest* (CASSANDRA-16676) > Fails > [here|https://ci-cassandra.apache.org/job/Cassandra-4.0/46/testReport/junit/org.apache.cassandra.cql3.validation.operations/InsertUpdateIfConditionTest/testListItem_2__clusterMinVersion_4_0_0_rc2_SNAPSHOT_/] > with a timeout. We can see in the history it takes quite a while in > [CI|https://ci-cassandra.apache.org/job/Cassandra-4.0/46/testReport/junit/org.apache.cassandra.cql3.validation.operations/InsertUpdateIfConditionTest/history/] > _but_ it takes just 1m locally. Probably due to constrained resources. > Looking at the > [individual|https://ci-cassandra.apache.org/job/Cassandra-4.0/46/testReport/junit/org.apache.cassandra.cql3.validation.operations/InsertUpdateIfConditionTest/] > test cases, for compression i.e., we can see 378 at an average of 1s each it > can easily go over the timeout of 240s. Recommendation is to either move to > 'long' section of to raise the timeout for the class for CI. > *ViewFilteringTest* > Move back from 'long' section -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16670) Flaky ViewComplexTest, ViewFilteringTest and InsertUpdateIfConditionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598748#comment-17598748 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16670: --- [~e.dimitrova] Were you able to find out why the errors that you mentioned in the [comment|https://issues.apache.org/jira/browse/CASSANDRA-16670?focusedCommentId=17355084&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17355084] happened? We see similar errors where the driver version is 4.13.0 and the server version is 4.0.5.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584057#comment-17584057 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-14715 at 8/24/22 4:15 PM: [~stefan.miklosovic] do you have any estimate if this will be released anytime sooner? thank you was (Author: jaid): [~stefan.miklosovic] do you have any estimate if this will be released anything sooner? thank you > Read repairs can result in bogus timeout errors to the client > - > > Key: CASSANDRA-14715 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14715 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Cameron Zemek >Priority: Low > > In RepairMergeListener:close() it does the following: > > {code:java} > try > { > FBUtilities.waitOnFutures(repairResults, > DatabaseDescriptor.getWriteRpcTimeout()); > } > catch (TimeoutException ex) > { > // We got all responses, but timed out while repairing > int blockFor = consistency.blockFor(keyspace); > if (Tracing.isTracing()) > Tracing.trace("Timed out while read-repairing after receiving all {} > data and digest responses", blockFor); > else > logger.debug("Timeout while read-repairing after receiving all {} > data and digest responses", blockFor); > throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true); > } > {code} > This propagates up and gets sent to the client and we have customers get > confused cause they see timeouts for CL ALL requiring ALL replicas even > though they have read_repair_chance = 0 and using a LOCAL_* CL. > At minimum I suggest instead of using the consistency level of DataResolver > (which is always ALL with read repairs) for the timeout it instead use > repairResults.size(). That is blockFor = repairResults.size() . But saying it > received _blockFor - 1_ is bogus still. Fixing that would require more > changes. I was thinking maybe like so: > > {code:java} > public static void waitOnFutures(List results, long ms, > MutableInt counter) throws TimeoutException > { > for (AsyncOneResponse result : results) > { > result.get(ms, TimeUnit.MILLISECONDS); > counter.increment(); > } > } > {code} > > > > Likewise in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says > _blockFor - 1_ for how many were received, which is also bogus. > > Steps used to reproduce was modify RepairMergeListener:close() to always > throw timeout exception. With schema: > {noformat} > CREATE KEYSPACE weather WITH replication = {'class': > 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true; > CREATE TABLE weather.city ( > cityid int PRIMARY KEY, > name text > ) WITH bloom_filter_fp_chance = 0.01 > AND dclocal_read_repair_chance = 0.0 > AND read_repair_chance = 0.0 > AND speculative_retry = 'NONE'; > {noformat} > Then using the following steps: > # ccm node1 cqlsh > # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra'); > # exit; > # ccm node1 flush > # ccm node1 stop > # rm -rf > ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-* > # remove the sstable with the insert > # ccm node1 start > # ccm node1 cqlsh > # CONSISTENCY LOCAL_QUORUM; > # select * from weather.city where cityid = 1; > You get result of: > {noformat} > ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting > for replica nodes' responses] message="Operation timed out - received only 5 > responses." 
info={'received_responses': 5, 'required_responses': 6, > 'consistency': 'ALL'}{noformat} > But was expecting: > {noformat} > ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting > for replica nodes' responses] message="Operation timed out - received only 1 > responses." info={'received_responses': 1, 'required_responses': 2, > 'consistency': 'LOCAL_QUORUM'}{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
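To make the proposal above concrete: the idea is to count how many repair acknowledgements actually completed before the timeout, so the ReadTimeoutException can report an accurate received count instead of the hard-coded blockFor - 1. The sketch below is a standalone illustration using plain java.util.concurrent types, with an AtomicInteger standing in for the MutableInt in the proposal; it is not Cassandra code.

{code:java}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Standalone sketch of the counting idea: track how many acks completed before the
// timeout so the caller can report an accurate "received" figure after the exception.
public final class CountingWait
{
    private CountingWait() {}

    public static void awaitAll(List<CompletableFuture<Void>> acks, long timeoutMillis, AtomicInteger received)
            throws InterruptedException, ExecutionException, TimeoutException
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (CompletableFuture<Void> ack : acks)
        {
            long remainingMillis = Math.max(TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime()), 0);
            // get() throws TimeoutException once the budget is exhausted; 'received' keeps the count so far
            ack.get(remainingMillis, TimeUnit.MILLISECONDS);
            received.incrementAndGet();
        }
    }
}
{code}

If a timeout is thrown, the caller reads the AtomicInteger to fill in the received count of the error message, which is exactly what the MutableInt out-parameter in the suggested waitOnFutures change is for.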
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584057#comment-17584057 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14715: --- [~stefan.miklosovic] do you have any estimate if this will be released anything sooner? thank you
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542210#comment-17542210 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14715: --- Any plans to fix this in the upcoming versions, or at least in a 4.0.x version? The error message is quite misleading.
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16940) Confusing ProtocolException msg Invalid or unsupported protocol version (4)
[ https://issues.apache.org/jira/browse/CASSANDRA-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537251#comment-17537251 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16940: --- thanks [~brandon.williams] I checked all the comments and it looks like this entry with null version happens if Cassandra nodes are running with [separate IP addresses for listen and broadcast address.|https://issues.apache.org/jira/browse/CASSANDRA-16518?focusedCommentId=17425372&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17425372]. In my case I am running my cassandra cluster in Kubernetes and I am only setting listen_address as the Pod IP address and broadcast address is same as listen. Not sure if I am running into [CASSANDRA-16518|https://issues.apache.org/jira/browse/CASSANDRA-16518]. > Confusing ProtocolException msg Invalid or unsupported protocol version (4) > --- > > Key: CASSANDRA-16940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16940 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Brad Schoening >Priority: Normal > > The following warning was seen frequently after upgrading from 3.0.15 to > 3.11.11 in the cassandra.log: > {noformat} > ProtocolException: Invalid or unsupported protocol version (4); supported > versions are (3/v3, 4/v4, 5/v5-beta){noformat} > It is at best unclear, or maybe a bug in the code throwing this exception > stating version '4' not supported but 4/v4 is. > from org/apache/cassandra/transport/ProtocolVersion.java > public static String invalidVersionMessage(int version) > { return String.format("Invalid or unsupported protocol version (%d); > supported versions are (%s)", version, String.join(", ", > ProtocolVersion.supportedVersions())); } > We later found invalid IP addresses in the system.peers table and once > removed, this exception went away. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16940) Confusing ProtocolException msg Invalid or unsupported protocol version (4)
[ https://issues.apache.org/jira/browse/CASSANDRA-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536804#comment-17536804 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-16940 at 5/13/22 6:07 PM: I had a similar issue and when I started looking at code to see why it is happening. {code:java} boolean enforceV3Cap = SystemKeyspace.loadPeerVersions() .values() .stream() .anyMatch(v -> v.compareTo(MIN_VERSION_FOR_V4) < 0); {code} Inspecting the system.peers table, one of the node in the cluster has a null entry, which is causing the specific node to cap the max negotiable version to V3. {code:java} > select peer, release_version from system.peers; peer | release_version --+ 10.41.128.35 | 3.11.9 10.41.128.228 | null 10.41.128.99 | 3.11.9 (3 rows) {code} However, I am not sure why there is a null entry in the peers table. Also I checked the {noformat} nodetool status{noformat} and {noformat} nodetool describecluster{noformat} and I don't see this specific IP present. Not sure if there is a bug that is causing this? was (Author: jaid): I had a similar issue and when I started looking at code to see why it is happening. ``` boolean enforceV3Cap = SystemKeyspace.loadPeerVersions() .values() .stream() .anyMatch(v -> v.compareTo(MIN_VERSION_FOR_V4) < 0); ``` Inspecting the system.peers table, one of the node in the cluster has a null entry, which is causing the specific node to cap the max negotiable version to V3. ``` > select peer, release_version from system.peers; peer | release_version ---+- 10.41.128.35 | 3.11.9 10.41.128.228 | null 10.41.128.99 | 3.11.9 (3 rows) ``` However, I am not sure why there is a null entry in the peers table. Also I checked the `nodetool status` and `nodetool describecluster` and I don't see this specific IP present. Not sure if there is a bug that is causing this? > Confusing ProtocolException msg Invalid or unsupported protocol version (4) > --- > > Key: CASSANDRA-16940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16940 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Brad Schoening >Priority: Normal > > The following warning was seen frequently after upgrading from 3.0.15 to > 3.11.11 in the cassandra.log: > {noformat} > ProtocolException: Invalid or unsupported protocol version (4); supported > versions are (3/v3, 4/v4, 5/v5-beta){noformat} > It is at best unclear, or maybe a bug in the code throwing this exception > stating version '4' not supported but 4/v4 is. > from org/apache/cassandra/transport/ProtocolVersion.java > public static String invalidVersionMessage(int version) > { return String.format("Invalid or unsupported protocol version (%d); > supported versions are (%s)", version, String.join(", ", > ProtocolVersion.supportedVersions())); } > We later found invalid IP addresses in the system.peers table and once > removed, this exception went away. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16940) Confusing ProtocolException msg Invalid or unsupported protocol version (4)
[ https://issues.apache.org/jira/browse/CASSANDRA-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536804#comment-17536804 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16940: --- I had a similar issue and when I started looking at code to see why it is happening. ``` boolean enforceV3Cap = SystemKeyspace.loadPeerVersions() .values() .stream() .anyMatch(v -> v.compareTo(MIN_VERSION_FOR_V4) < 0); ``` Inspecting the system.peers table, one of the node in the cluster has a null entry, which is causing the specific node to cap the max negotiable version to V3. ``` > select peer, release_version from system.peers; peer | release_version ---+- 10.41.128.35 | 3.11.9 10.41.128.228 | null 10.41.128.99 | 3.11.9 (3 rows) ``` However, I am not sure why there is a null entry in the peers table. Also I checked the `nodetool status` and `nodetool describecluster` and I don't see this specific IP present. Not sure if there is a bug that is causing this? > Confusing ProtocolException msg Invalid or unsupported protocol version (4) > --- > > Key: CASSANDRA-16940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16940 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Brad Schoening >Priority: Normal > > The following warning was seen frequently after upgrading from 3.0.15 to > 3.11.11 in the cassandra.log: > {noformat} > ProtocolException: Invalid or unsupported protocol version (4); supported > versions are (3/v3, 4/v4, 5/v5-beta){noformat} > It is at best unclear, or maybe a bug in the code throwing this exception > stating version '4' not supported but 4/v4 is. > from org/apache/cassandra/transport/ProtocolVersion.java > public static String invalidVersionMessage(int version) > { return String.format("Invalid or unsupported protocol version (%d); > supported versions are (%s)", version, String.join(", ", > ProtocolVersion.supportedVersions())); } > We later found invalid IP addresses in the system.peers table and once > removed, this exception went away. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
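To illustrate how a single null release_version entry can force the V3 cap described above, here is a small, self-contained sketch that mirrors the quoted anyMatch check with plain strings; treating null as "unknown, assume pre-V4" and the lexical string comparison are simplifying assumptions for the sketch, not necessarily what SystemKeyspace.loadPeerVersions() and the real check do:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class PeerVersionCapSketch
{
    // Hypothetical stand-in for the lowest release that speaks native protocol V4.
    static final String MIN_VERSION_FOR_V4 = "3.0.0";

    public static void main(String[] args)
    {
        // Values as reported by "select peer, release_version from system.peers" above.
        Map<String, String> peerVersions = new HashMap<>();
        peerVersions.put("10.41.128.35", "3.11.9");
        peerVersions.put("10.41.128.228", null);   // the stale entry with no release_version
        peerVersions.put("10.41.128.99", "3.11.9");

        // One unknown (null) or pre-3.0 peer is enough to cap this node at V3.
        // (Comparing raw strings is a simplification; Cassandra compares parsed versions.)
        boolean enforceV3Cap = peerVersions.values().stream()
                .anyMatch(v -> v == null || v.compareTo(MIN_VERSION_FOR_V4) < 0);

        System.out.println("enforceV3Cap = " + enforceV3Cap);  // true because of 10.41.128.228
    }
}
{code}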
[jira] [Comment Edited] (CASSANDRA-16764) Compaction repeatedly fails validateReallocation exception
[ https://issues.apache.org/jira/browse/CASSANDRA-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513546#comment-17513546 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-16764 at 3/28/22, 6:08 PM: - I am running into this issue as well and have couple of questions. # {{I understand that Index file is {*}SSTable Index which maps row keys to their respective offsets in the Data file{*}, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? was (Author: jaid): I am running into this issue as well and have couple of questions. # {{I understand that Index file is SSTable Index which maps row keys to their respective offsets in the Data file, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? > Compaction repeatedly fails validateReallocation exception > -- > > Key: CASSANDRA-16764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16764 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction/LCS >Reporter: Richard Hesse >Priority: Normal > > I have a few nodes in my ring that are stuck repeatedly trying to compact the > same tables over and over again. I've run through the usual trick of rolling > restarts, and it doesn't seem to help. 
This exception is logged on the nodes: > {code} > ERROR [CompactionExecutor:6] 2021-06-25 20:28:30,001 CassandraDaemon.java:244 > - Exception in thread Thread[CompactionExecutor:6,1,main] > java.lang.RuntimeException: null > at > org.apache.cassandra.io.util.DataOutputBuffer.validateReallocation(DataOutputBuffer.java:134) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.calculateNewSize(DataOutputBuffer.java:152) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.expandToFit(DataOutputBuffer.java:159) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:119) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:426) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serializeValuesWithoutSize(ClusteringPrefix.java:323) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:131) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serialize(ClusteringPrefix.java:266) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:167) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:154) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:103) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:82) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.addIndexBlock(ColumnIndex.java:216) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at org.apache.cassandra.db.ColumnIndex.add(ColumnIndex.java:264) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:111) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:173)
[jira] [Comment Edited] (CASSANDRA-16764) Compaction repeatedly fails validateReallocation exception
[ https://issues.apache.org/jira/browse/CASSANDRA-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513546#comment-17513546 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-16764 at 3/28/22, 6:08 PM: - I am running into this issue as well and have couple of questions. # {{I understand that Index file is SSTable Index which maps row keys to their respective offsets in the Data file, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? was (Author: jaid): I am running into this issue as well and have couple of questions. # {{I understand that Index file is SSTable Index which maps row keys to their respective offsets in the Data file, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? > Compaction repeatedly fails validateReallocation exception > -- > > Key: CASSANDRA-16764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16764 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction/LCS >Reporter: Richard Hesse >Priority: Normal > > I have a few nodes in my ring that are stuck repeatedly trying to compact the > same tables over and over again. I've run through the usual trick of rolling > restarts, and it doesn't seem to help. 
This exception is logged on the nodes: > {code} > ERROR [CompactionExecutor:6] 2021-06-25 20:28:30,001 CassandraDaemon.java:244 > - Exception in thread Thread[CompactionExecutor:6,1,main] > java.lang.RuntimeException: null > at > org.apache.cassandra.io.util.DataOutputBuffer.validateReallocation(DataOutputBuffer.java:134) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.calculateNewSize(DataOutputBuffer.java:152) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.expandToFit(DataOutputBuffer.java:159) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:119) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:426) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serializeValuesWithoutSize(ClusteringPrefix.java:323) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:131) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serialize(ClusteringPrefix.java:266) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:167) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:154) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:103) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:82) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.addIndexBlock(ColumnIndex.java:216) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at org.apache.cassandra.db.ColumnIndex.add(ColumnIndex.java:264) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:111) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:173) > ~[a
[jira] [Commented] (CASSANDRA-16764) Compaction repeatedly fails validateReallocation exception
[ https://issues.apache.org/jira/browse/CASSANDRA-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513546#comment-17513546 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16764: --- I am running into this issue as well and have couple of questions. # I understand that Index file is `SSTable Index which maps row keys to their respective offsets in the Data file`, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ? # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? > Compaction repeatedly fails validateReallocation exception > -- > > Key: CASSANDRA-16764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16764 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction/LCS >Reporter: Richard Hesse >Priority: Normal > > I have a few nodes in my ring that are stuck repeatedly trying to compact the > same tables over and over again. I've run through the usual trick of rolling > restarts, and it doesn't seem to help. This exception is logged on the nodes: > {code} > ERROR [CompactionExecutor:6] 2021-06-25 20:28:30,001 CassandraDaemon.java:244 > - Exception in thread Thread[CompactionExecutor:6,1,main] > java.lang.RuntimeException: null > at > org.apache.cassandra.io.util.DataOutputBuffer.validateReallocation(DataOutputBuffer.java:134) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.calculateNewSize(DataOutputBuffer.java:152) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.expandToFit(DataOutputBuffer.java:159) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:119) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:426) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serializeValuesWithoutSize(ClusteringPrefix.java:323) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:131) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serialize(ClusteringPrefix.java:266) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:167) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:154) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:103) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:82) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > 
org.apache.cassandra.db.ColumnIndex.addIndexBlock(ColumnIndex.java:216) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at org.apache.cassandra.db.ColumnIndex.add(ColumnIndex.java:264) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:111) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:173) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:136) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:98) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:143) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:204) > ~[apa
[jira] [Comment Edited] (CASSANDRA-16764) Compaction repeatedly fails validateReallocation exception
[ https://issues.apache.org/jira/browse/CASSANDRA-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513546#comment-17513546 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-16764 at 3/28/22, 6:07 PM: - I am running into this issue as well and have couple of questions. # {{I understand that Index file is SSTable Index which maps row keys to their respective offsets in the Data file, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? was (Author: jaid): I am running into this issue as well and have couple of questions. # I understand that Index file is `SSTable Index which maps row keys to their respective offsets in the Data file`, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ? # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? > Compaction repeatedly fails validateReallocation exception > -- > > Key: CASSANDRA-16764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16764 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction/LCS >Reporter: Richard Hesse >Priority: Normal > > I have a few nodes in my ring that are stuck repeatedly trying to compact the > same tables over and over again. I've run through the usual trick of rolling > restarts, and it doesn't seem to help. 
This exception is logged on the nodes: > {code} > ERROR [CompactionExecutor:6] 2021-06-25 20:28:30,001 CassandraDaemon.java:244 > - Exception in thread Thread[CompactionExecutor:6,1,main] > java.lang.RuntimeException: null > at > org.apache.cassandra.io.util.DataOutputBuffer.validateReallocation(DataOutputBuffer.java:134) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.calculateNewSize(DataOutputBuffer.java:152) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.expandToFit(DataOutputBuffer.java:159) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:119) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:426) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serializeValuesWithoutSize(ClusteringPrefix.java:323) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:131) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serialize(ClusteringPrefix.java:266) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:167) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:154) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:103) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:82) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.addIndexBlock(ColumnIndex.java:216) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at org.apache.cassandra.db.ColumnIndex.add(ColumnIndex.java:264) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:111) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:173) > ~[apa
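Regarding question 2 above, the ~2GB ceiling appears to come from Java's int-indexed arrays and ByteBuffers rather than from a configuration knob, so a DataOutputBuffer-style growable buffer cannot simply be raised past it. The sketch below uses hypothetical names and numbers, not Cassandra's actual implementation, and only shows the doubling-and-validation shape and why it fails once the required size crosses Integer.MAX_VALUE:

{code:java}
public class BufferGrowthSketch
{
    // Rough shape of a growable on-heap output buffer: keep doubling until the
    // bytes we need to append fit. Java arrays and ByteBuffers are addressed by
    // int, so no single buffer can ever hold more than Integer.MAX_VALUE bytes.
    static long calculateNewSize(long currentCapacity, long additionalBytes)
    {
        long needed = currentCapacity + additionalBytes;
        long newSize = currentCapacity * 2;
        while (newSize < needed)
            newSize *= 2;
        return newSize;
    }

    static void validateReallocation(long newSize)
    {
        if (newSize <= 0 || newSize > Integer.MAX_VALUE)
            throw new RuntimeException("cannot grow buffer past the 2GB addressable limit: " + newSize);
    }

    public static void main(String[] args)
    {
        long capacity = 1L << 30;                                  // ~1GB already buffered
        long newSize = calculateNewSize(capacity, (1L << 30) + 1); // need a bit over 1GB more
        validateReallocation(newSize);                             // throws: 4GB > Integer.MAX_VALUE
    }
}
{code}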
[jira] [Updated] (CASSANDRA-17355) Performance degradation when the data size grows
[ https://issues.apache.org/jira/browse/CASSANDRA-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-17355: -- Summary: Performance degradation when the data size grows (was: Performance degradation with Counter tables when the data size grows) > Performance degradation when the data size grows > > > Key: CASSANDRA-17355 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17355 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello Everyone, > I am noticing a huge perf drop (spike in latency and CPU utilization) for the > counter type tables when the data size grows. To better understand/simulate I > have done the following perf test with `cassandra-stress` instead of my > use-case and I can reproduce the performance issue consistently. When using > the counter type tables, when the datasize grows the read latency and cpu > spikes to very high number. > > *Test Setup:* > # Setup a cluster with 3 nodes. > # Run a test with cassandra-stress and I see the latency and CPU are okay > without much spike. > # Send a lot of counter traffic using `cassandra-stress` tool (Replication > Factory = 3) > # Now the data size on the cluster is ~300G. > # Now run another test with cassandra-stress with 3:1 read write mixed > workload. > # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the > latency reaches ~1 seconds (which earlier was < 5ms). > # Another interesting observation is the disk reads goes to a higher number > and it keeps going higher with the increase in the disk size. > # It pretty much looked like a disk bottleneck issue but the same result > shows very low disk reads, cpu, latency with less amount of data. > # Below is the configuration I have used for testing this. > > {quote}C* Version: 3.11.9 > CPU: 16 > Memory: 64G > Heap: 16G > GC: G1GC > Disk: 500G GCP Persistent disk > > {quote} > I understand that, with growth in disk the number of lookup grows high, but > this looked to be a big performance drop. > Please let me know if you need more details. Also let me know this is known > limitation with the counter type and if there is a work around. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17355) Performance degradation with Counter tables when the data size grows
[ https://issues.apache.org/jira/browse/CASSANDRA-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488484#comment-17488484 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-17355: --- I will reach out to the community user group but the issue i see here is the performance drops as soon as the data size grows. Is that expected? > Performance degradation with Counter tables when the data size grows > > > Key: CASSANDRA-17355 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17355 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello Everyone, > I am noticing a huge perf drop (spike in latency and CPU utilization) for the > counter type tables when the data size grows. To better understand/simulate I > have done the following perf test with `cassandra-stress` instead of my > use-case and I can reproduce the performance issue consistently. When using > the counter type tables, when the datasize grows the read latency and cpu > spikes to very high number. > > *Test Setup:* > # Setup a cluster with 3 nodes. > # Run a test with cassandra-stress and I see the latency and CPU are okay > without much spike. > # Send a lot of counter traffic using `cassandra-stress` tool (Replication > Factory = 3) > # Now the data size on the cluster is ~300G. > # Now run another test with cassandra-stress with 3:1 read write mixed > workload. > # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the > latency reaches ~1 seconds (which earlier was < 5ms). > # Another interesting observation is the disk reads goes to a higher number > and it keeps going higher with the increase in the disk size. > # It pretty much looked like a disk bottleneck issue but the same result > shows very low disk reads, cpu, latency with less amount of data. > # Below is the configuration I have used for testing this. > > {quote}C* Version: 3.11.9 > CPU: 16 > Memory: 64G > Heap: 16G > GC: G1GC > Disk: 500G GCP Persistent disk > > {quote} > I understand that, with growth in disk the number of lookup grows high, but > this looked to be a big performance drop. > Please let me know if you need more details. Also let me know this is known > limitation with the counter type and if there is a work around. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17355) Performance degradation with Counter tables when the data size grows
[ https://issues.apache.org/jira/browse/CASSANDRA-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-17355: -- Description: Hello Everyone, I am noticing a huge perf drop (spike in latency and CPU utilization) for the counter type tables when the data size grows. To better understand/simulate I have done the following perf test with `cassandra-stress` instead of my use-case and I can reproduce the performance issue consistently. When using the counter type tables, when the datasize grows the read latency and cpu spikes to very high number. *Test Setup:* # Setup a cluster with 3 nodes. # Run a test with cassandra-stress and I see the latency and CPU are okay without much spike. # Send a lot of counter traffic using `cassandra-stress` tool (Replication Factory = 3) # Now the data size on the cluster is ~300G. # Now run another test with cassandra-stress with 3:1 read write mixed workload. # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the latency reaches ~1 seconds (which earlier was < 5ms). # Another interesting observation is the disk reads goes to a higher number and it keeps going higher with the increase in the disk size. # It pretty much looked like a disk bottleneck issue but the same result shows very low disk reads, cpu, latency with less amount of data. # Below is the configuration I have used for testing this. {quote}C* Version: 3.11.9 CPU: 16 Memory: 64G Heap: 16G GC: G1GC Disk: 500G GCP Persistent disk {quote} I understand that, with growth in disk the number of lookup grows high, but this looked to be a big performance drop. Please let me know if you need more details. Also let me know this is known limitation with the counter type and if there is a work around. was: Hello Everyone, I am noticing a huge perf drop (spike in latency and CPU utilization) for the counter type tables when the data size grows. To better understand/simulate I have done the following perf test with `cassandra-stress` instead of my use-case and I can reproduce the performance issue consistently. When using the counter type tables, when the datasize grows the read latency and cpu spikes to very high number. *Test Setup:* # Setup a cluster with 3 nodes. # Run a test with cassandra-stress and I see the latency and CPU are okay without much spike. # Send a lot of counter traffic using `cassandra-stress` tool (Replication Factory = 3) # Now the data size on the cluster is ~300G. # Now run another test with cassandra-stress with 3:1 read write mixed workload. # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the latency reaches ~1 seconds (which earlier was < 5ms). # Another interesting observation is the disk reads goes to a higher number and it keeps going higher with the increase in the disk size. # It pretty much looked like a disk bottleneck issue but the same result shows very low disk reads, cpu, latency with less amount of data. # Below is the configuration I have used for testing this. ``` C* Version: 3.11.9 CPU: 16 Memory: 64G Heap: 16G GC: G1GC Disk: 500G GCP Persistent disk ``` I understand that, with growth in disk the number of lookup grows high, but this looked to be a big performance drop. Please let me know if you need more details. Also let me know this is known limitation with the counter type and if there is a work around. 
> Performance degradation with Counter tables when the data size grows > > > Key: CASSANDRA-17355 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17355 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello Everyone, > I am noticing a huge perf drop (spike in latency and CPU utilization) for the > counter type tables when the data size grows. To better understand/simulate I > have done the following perf test with `cassandra-stress` instead of my > use-case and I can reproduce the performance issue consistently. When using > the counter type tables, when the datasize grows the read latency and cpu > spikes to very high number. > > *Test Setup:* > # Setup a cluster with 3 nodes. > # Run a test with cassandra-stress and I see the latency and CPU are okay > without much spike. > # Send a lot of counter traffic using `cassandra-stress` tool (Replication > Factory = 3) > # Now the data size on the cluster is ~300G. > # Now run another test with cassandra-stress with 3:1 read write mixed > workload. > # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the > latency reaches ~1 seconds (which earlier was < 5ms). > # Another interesting observation is the disk reads goes to a higher numbe
[jira] [Created] (CASSANDRA-17355) Performance degradation with Counter tables when the data size grows
Jai Bheemsen Rao Dhanwada created CASSANDRA-17355: - Summary: Performance degradation with Counter tables when the data size grows Key: CASSANDRA-17355 URL: https://issues.apache.org/jira/browse/CASSANDRA-17355 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada Hello Everyone, I am noticing a huge perf drop (spike in latency and CPU utilization) for the counter type tables when the data size grows. To better understand/simulate I have done the following perf test with `cassandra-stress` instead of my use-case and I can reproduce the performance issue consistently. When using the counter type tables, when the datasize grows the read latency and cpu spikes to very high number. *Test Setup:* # Setup a cluster with 3 nodes. # Run a test with cassandra-stress and I see the latency and CPU are okay without much spike. # Send a lot of counter traffic using `cassandra-stress` tool (Replication Factory = 3) # Now the data size on the cluster is ~300G. # Now run another test with cassandra-stress with 3:1 read write mixed workload. # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the latency reaches ~1 seconds (which earlier was < 5ms). # Another interesting observation is the disk reads goes to a higher number and it keeps going higher with the increase in the disk size. # It pretty much looked like a disk bottleneck issue but the same result shows very low disk reads, cpu, latency with less amount of data. # Below is the configuration I have used for testing this. ``` C* Version: 3.11.9 CPU: 16 Memory: 64G Heap: 16G GC: G1GC Disk: 500G GCP Persistent disk ``` I understand that, with growth in disk the number of lookup grows high, but this looked to be a big performance drop. Please let me know if you need more details. Also let me know this is known limitation with the counter type and if there is a work around. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14328) Invalid metadata has been detected for role
[ https://issues.apache.org/jira/browse/CASSANDRA-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486617#comment-17486617 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14328: --- [~tania.en...@quest.com] [~prnvjndl] were you able to figure out why the node got into this state? it looks like this is a side effect of bootstrapping process not streaming all the data (https://issues.apache.org/jira/browse/CASSANDRA-14006). I see a similar issue and just wondering if you have any more details and workaround on this issue? > Invalid metadata has been detected for role > --- > > Key: CASSANDRA-14328 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14328 > Project: Cassandra > Issue Type: Bug > Components: Legacy/CQL >Reporter: Pranav Jindal >Priority: Normal > > Cassandra Version : 3.10 > One node was replaced and was successfully up and working but CQL-SH fails > with error. > > CQL-SH error: > > {code:java} > Connection error: ('Unable to connect to any servers', {'10.180.0.150': > AuthenticationFailed('Failed to authenticate to 10.180.0.150: Error from > server: code= [Server error] message="java.lang.RuntimeException: Invalid > metadata has been detected for role utorjwcnruzzlzafxffgyqmlvkxiqcgb"',)}) > {code} > > Cassandra server ERROR: > {code:java} > WARN [Native-Transport-Requests-1] 2018-03-20 13:37:17,894 > CassandraRoleManager.java:96 - An invalid value has been detected in the > roles table for role utorjwcnruzzlzafxffgyqmlvkxiqcgb. If you are unable to > login, you may need to disable authentication and confirm that values in that > table are accurate > ERROR [Native-Transport-Requests-1] 2018-03-20 13:37:17,895 Message.java:623 > - Unexpected exception during request; channel = [id: 0xdfc3604f, > L:/10.180.0.150:9042 - R:/10.180.0.150:51668] > java.lang.RuntimeException: Invalid metadata has been detected for role > utorjwcnruzzlzafxffgyqmlvkxiqcgb > at > org.apache.cassandra.auth.CassandraRoleManager$1.apply(CassandraRoleManager.java:99) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager$1.apply(CassandraRoleManager.java:82) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:528) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:503) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager.canLogin(CassandraRoleManager.java:310) > ~[apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.service.ClientState.login(ClientState.java:271) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:80) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517) > [apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410) > [apache-cassandra-3.10.jar:3.10] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.39.Final.jar:4.0.39.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366) > [netty-all-4.0.39.Final.jar:4.0.39.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) > [netty-all-4.0.39.Final.jar:4.0.39.Final] > at > 
io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357) > [netty-all-4.0.39.Final.jar:4.0.39.Final] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_121] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > [apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > [apache-cassandra-3.10.jar:3.10] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > Caused by: java.lang.NullPointerException: null > at > org.apache.cassandra.cql3.UntypedResultSet$Row.getBoolean(UntypedResultSet.java:273) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager$1.apply(CassandraRoleManager.java:88) > ~[apache-cassandra-3.10.jar:3.10] > ... 16 common frames omitted > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
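The stack trace above ends in UntypedResultSet$Row.getBoolean, which is consistent with the role row existing while one of its boolean columns (for example can_login) is missing after incomplete streaming. The following self-contained illustration of that failure mode rests on that assumption about the mechanism; the map standing in for the roles row and the column name are hypothetical, not the actual CassandraRoleManager code:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class RoleRowSketch
{
    // Hypothetical stand-in for one row of system_auth.roles on the bad node:
    // the role name is present but the boolean columns never arrived.
    static final Map<String, Object> row = new HashMap<>();

    static boolean getBoolean(String column)
    {
        Boolean value = (Boolean) row.get(column);
        return value;   // unboxing a missing/null column throws NullPointerException
    }

    public static void main(String[] args)
    {
        row.put("role", "utorjwcnruzzlzafxffgyqmlvkxiqcgb");
        try
        {
            getBoolean("can_login");   // assumed column; never written on this node
        }
        catch (NullPointerException e)
        {
            // The NPE surfaces to the client as the error quoted above.
            System.out.println("Invalid metadata has been detected for role " + row.get("role"));
        }
    }
}
{code}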
[jira] [Commented] (CASSANDRA-15155) Bootstrapping node finishes 'successfully' before schema synced, data not streamed
[ https://issues.apache.org/jira/browse/CASSANDRA-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486218#comment-17486218 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15155: --- [~cowardlydragon] just curious, were able to narrow down this issue to see why it happened? > Bootstrapping node finishes 'successfully' before schema synced, data not > streamed > -- > > Key: CASSANDRA-15155 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15155 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Consistency/Bootstrap and > Decommission >Reporter: Constance Eustace >Priority: Normal > Attachments: debug.log.zip > > > Bootstrap of a new node to expand an existing cluster is completing the > bootstrapping "successfully", joining the cluster as UN in nodetool status, > but has no data and no active streams. Writes and reads start being served. > Environment: AWS, IPV6, three datacenters, asia / europe / us > Version: 2.2.13 > We have previously scaled the europe and us datacenters from 5 nodes to 25 > nodes (one node at a time) without incident. > Asia (tokyo) is a different story. We have seen multiple failure scenarios, > but the most troubling is a string of attempted node bootstrappings where the > bootstrap completes and the node joins the ring, but there is no data. > We were able to expand Asia by four nodes by increasing ring delay to 100 > seconds, but that has not worked anymore. > Attached Log: Our autoscaling + Ansible initial setup creates the node, but > the ansible has not run yet, so the autostarted cassandra fails to load, but > it has no security group yet so it did not communicate with any other node. > That is the 15:15:XX series log messages at the very top. > Then 15:20:XX series messages begin after ansible has completed setup of the > box, and the data dirs and commit log dirs have been scrubbed. > This same process ran for EU and US expansions without incident. > From what I can tell of the log (DEBUG was enabled): > Ring information collection begins, so some sort of gossip/cluster > communication is healthy: > INFO [main] 2019-06-12 15:20:05,468 StorageService.java:1142 - JOINING: > waiting for ring information > But almost all of those checks output: > DEBUG [GossipStage:1] 2019-06-12 15:20:07,673 MigrationManager.java:96 - Not > pulling schema because versions match or shouldPullSchemaFrom returned false > Which seems weird, as we shall see soon. After all the nodes have reported in > a similar way, most not pulling because of the above message, and a couple > that were interpreted as DOWN, it then does: > INFO [main] 2019-06-12 15:21:45,486 StorageService.java:1142 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2019-06-12 15:21:45,487 StorageService.java:1142 - JOINING: > waiting for pending range calculation > INFO [main] 2019-06-12 15:21:45,487 StorageService.java:1142 - JOINING: > calculation complete, ready to bootstrap > We then get a huge number of > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find > cfId=dd5d7fa0-1e50-11e9-a62d-0d41c97b2404 > INFO [main] 2019-06-12 15:23:25,515 StorageService.java:1142 - JOINING: > Starting to bootstrap... > INFO [main] 2019-06-12 15:23:25,525 StreamResultFuture.java:87 - [Stream > #05af9ee0-8d26-11e9-85c1-bd5476090c54] Executing streaming plan for Bootstrap > INFO [main] 2019-06-12 15:23:25,526 StorageService.java:1199 - Bootstrap > completed! for the tokens [-7314981925085449175, .. 
> 5499447097629838103] > Here are the log messages for MIgrationManager for schema gossiping: > DEBUG [main] 2019-06-12 15:20:05,423 MigrationManager.java:493 - Gossiping my > schema version 59adb24e-f3cd-3e02-97f0-5b395827453f > DEBUG [MigrationStage:1] 2019-06-12 15:23:25,694 MigrationManager.java:493 - > Gossiping my schema version 3d1a9d9e-1120-37ae-abe0-e064cd147a99 > DEBUG [MigrationStage:1] 2019-06-12 15:23:25,775 MigrationManager.java:493 - > Gossiping my schema version 0bf74f5a-4b39-3525-b217-e9ccf7a1b6cb > DEBUG [MigrationStage:1] 2019-06-12 15:23:25,905 MigrationManager.java:493 - > Gossiping my schema version b145475a-02dc-370c-8af7-a9aba2d61362 > DEBUG [InternalResponseStage:12] 2019-06-12 15:24:26,445 > MigrationManager.java:493 - Gossiping my schema version > 9c2ed14a-8db5-39b3-af48-6cdb5463c772 > the schema version ending in -6cdb5463c772 is the proper version in the other > nodes per gossipinfo. But as can be seen, the bootstrap completion message > (15:23:25,526) is logged before four or five intermediate schema versions are > created, which seem to be due to system_distributed and other keyspaces being > created. > The Bootstrap completed! message comes from
[jira] [Commented] (CASSANDRA-16617) DOC - Publish drivers compatibility matrix
[ https://issues.apache.org/jira/browse/CASSANDRA-16617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390838#comment-17390838 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16617: --- Thank you [~flightc] I was mainly interested in updating the document for Cassandra 4.0 and Java driver Versions: [https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/javaDrivers.html] Now that 4.0 is officially GA :) > DOC - Publish drivers compatibility matrix > -- > > Key: CASSANDRA-16617 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16617 > Project: Cassandra > Issue Type: Task > Components: Documentation/Website >Reporter: Erick Ramirez >Assignee: Erick Ramirez >Priority: Normal > > [~jaid] brought up [on the User > ML|https://lists.apache.org/thread.html/r41ab9448a8af2e95996577c82bd5a9ca09e308bc79917b15fad45580%40%3Cuser.cassandra.apache.org%3E] > whether a compatibility matrix exists on the Apache Cassandra website > similar to [the Java driver matrix published on the DataStax Docs > website|https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/javaDrivers.html]. > I've logged this ticket to consider including the compatibility matrix from > the following _supported_ drivers: > * Java driver > * Python driver > * Node.js driver > * C# driver driver > * C++ driver > My intention is to socialise this idea with the devs and provided there are > no objections, I'll look into implementing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364421#comment-17364421 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- Thank you [~brandon.williams]. Unfortunately, I am failing to reproduce this issue, so I am trying to see if there are any open/known issues that could cause this. Since it still happened after I moved from 3.11.6 to 3.11.9, I am not sure if there will be any success with 3.11.10. Do you have any recommendations or pointers to identify the issue? > Bootstrap completes Successfully without streaming all the data > --- > > Key: CASSANDRA-15874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > I am seeing a strange issue where, adding a new node with auto_bootstrap: > true is not streaming all the data before it joins the cluster. Don't see any > information in the logs about bootstrap failures. > Here is the sequence of logs > > {code:java} > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > waiting for pending range calculation > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > calculation complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > getting bootstrap token > INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: > Starting to bootstrap... > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId . If a table was just created, this is likely due to the schema > not being fully propagated. Please wait for schema agreement on table > creation. > INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] > All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StorageService.java:1505 - Bootstrap completed! for the tokens > {code} > Cassandra Version: 3.11.3 > I am not able to reproduce this issue all the time, but it happened couple of > times. Is there any race condition/corner case, which could cause this issue? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364387#comment-17364387 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- [~brandon.williams] I have noticed the same behavior even on the 3.11.9 version of Cassandra. Is there any other race condition that could cause this issue? > Bootstrap completes Successfully without streaming all the data > --- > > Key: CASSANDRA-15874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > I am seeing a strange issue where, adding a new node with auto_bootstrap: > true is not streaming all the data before it joins the cluster. Don't see any > information in the logs about bootstrap failures. > Here is the sequence of logs > > {code:java} > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > waiting for pending range calculation > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > calculation complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > getting bootstrap token > INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: > Starting to bootstrap... > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId . If a table was just created, this is likely due to the schema > not being fully propagated. Please wait for schema agreement on table > creation. > INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] > All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StorageService.java:1505 - Bootstrap completed! for the tokens > {code} > Cassandra Version: 3.11.3 > I am not able to reproduce this issue all the time, but it happened couple of > times. Is there any race condition/corner case, which could cause this issue? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16465) Increased Read Latency With Cassandra >= 3.11.7
[ https://issues.apache.org/jira/browse/CASSANDRA-16465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332621#comment-17332621 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16465: --- Thank you [~nvest] [~AhmedElJAMI]. I tried upgrading from 3.11.6 to 3.11.9 and didn't see any performance drop with my application use-case or with cassandra-stress. I use LCS on all my tables (the system* tables use the defaults shipped with Cassandra). Do you mind sharing the test scenarios where you see the latency, so I can make sure I am not missing anything? [~nvest], please share your findings with 3.11.10 as well; that will help me assess which version to upgrade to. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16465) Increased Read Latency With Cassandra >= 3.11.7
[ https://issues.apache.org/jira/browse/CASSANDRA-16465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331078#comment-17331078 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16465: --- [~nvest] I am planning to upgrade the Cassandra version from 3.11.6 to 3.11.9 and I don't have any tables with TWCS. have you noticed this performance drop with all the Compaction Strategies or just TWCS? > Increased Read Latency With Cassandra >= 3.11.7 > --- > > Key: CASSANDRA-16465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16465 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Ahmed ELJAMI >Priority: Normal > Fix For: 3.11.11 > > > After upgrading Cassandra from 3.11.3 to 3.11.9, Cassandra read latency 99% > increased significantly. Getting back to 3.11.3 immediately fixed the issue. > I have observed "SStable reads" increases after upgrading to 3.11.9. > The same behavior was observed by some other users: > [https://www.mail-archive.com/user@cassandra.apache.org/msg61247.html] > According to Paulo Motta's comment, this behavior may be caused by > https://issues.apache.org/jira/browse/CASSANDRA-15690 which was introduced on > 3.11.7 and removed an optimization that may cause a correctness issue when > there are partition deletions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
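In case it is useful for comparing setups in this thread, the compaction strategy of every table can be read straight out of system_schema.tables on a 3.x cluster. The sketch below uses the DataStax Java driver 3.x; the contact point is a placeholder and this is only a convenience, not part of any fix.
{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CompactionStrategyCheck
{
    public static void main(String[] args)
    {
        // Contact point is a placeholder; adjust for your cluster.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            // system_schema.tables stores the compaction options as a map<text, text>;
            // the "class" entry names the strategy (LCS, STCS, TWCS, ...).
            for (Row row : session.execute("SELECT keyspace_name, table_name, compaction FROM system_schema.tables"))
            {
                String strategy = row.getMap("compaction", String.class, String.class).get("class");
                System.out.println(row.getString("keyspace_name") + "." + row.getString("table_name") + " -> " + strategy);
            }
        }
    }
}
{code}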
[jira] [Created] (CASSANDRA-16628) Logging bug during the node replacement and token assignment
Jai Bheemsen Rao Dhanwada created CASSANDRA-16628: - Summary: Logging bug during the node replacement and token assignment Key: CASSANDRA-16628 URL: https://issues.apache.org/jira/browse/CASSANDRA-16628 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada Hello Team, I noticed a minor logging issue when a Cassandra node tries to boot up with a new IP address but the existing data directory. The IP address and token fields are interchanged. *Sample Log:* {{WARN [GossipStage:1] 2021-04-23 18:24:06,348 StorageService.java:2425 - Not updating host ID 27031833-5141-46e0-b032-bef67137ae49 for /10.24.3.9 because it's mine}} {{INFO [GossipStage:1] 2021-04-23 18:24:06,349 StorageService.java:2356 - Nodes () and /10.24.3.9 have the same token /10.24.3.10. Ignoring -1124147225848710462}} {{INFO [GossipStage:1] 2021-04-23 18:24:06,350 StorageService.java:2356 - Nodes () and /10.24.3.9 have the same token /10.24.3.10. Ignoring -1239985462983206335}} *Steps to Reproduce:* Replace a Cassandra node with a new IP address but the same data directory, and the logs should show the messages above. *Cassandra Version*: 3.11.6 Please let me know if you need more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
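For what it's worth, the garbled message is consistent with the endpoint and token arguments simply being passed to the logger in the wrong order. The sketch below is illustrative only: the method and variable names are made up and this is not the actual StorageService code; it just shows how a swapped argument list puts an address in the token slot and a token after "Ignoring".
{code:java}
import java.net.InetAddress;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SwappedLogArgumentsSketch
{
    private static final Logger logger = LoggerFactory.getLogger(SwappedLogArgumentsSketch.class);

    // Hypothetical stand-ins for the real endpoints and token in StorageService.
    static void logConflict(InetAddress existing, InetAddress replacement, long token)
    {
        // Buggy ordering: an endpoint lands in the token slot and the token lands in the
        // "Ignoring" slot, which matches the sample log above.
        logger.info("Nodes {} and {} have the same token {}. Ignoring {}",
                    existing, replacement, replacement, token);

        // Intended ordering: both endpoints, then the shared token, then the endpoint ignored.
        logger.info("Nodes {} and {} have the same token {}. Ignoring {}",
                    existing, replacement, token, existing);
    }
}
{code}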
[jira] [Commented] (CASSANDRA-15592) IllegalStateException in gossip after removing node
[ https://issues.apache.org/jira/browse/CASSANDRA-15592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152387#comment-17152387 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15592: --- Hello [~brandon.williams] I ran into the similar Exception, is there any impact of this ERROR or this is just more of logging problem? in my tests I didn't see any impact to the cluster operations. so I would like to know the impact of this before even attempting to upgrade in production > IllegalStateException in gossip after removing node > --- > > Key: CASSANDRA-15592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15592 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Marcus Olsson >Assignee: Marcus Olsson >Priority: Normal > Fix For: 3.0.21, 3.11.7, 4.0, 4.0-alpha4 > > > In one of our test environments we encountered the following exception: > {noformat} > 2020-02-02T10:50:13.276+0100 [GossipTasks:1] ERROR > o.a.c.u.NoSpamLogger$NoSpamLogStatement:97 log > java.lang.IllegalStateException: Attempting gossip state mutation from > illegal thread: GossipTasks:1 > at > org.apache.cassandra.gms.Gossiper.checkProperThreadForStateMutation(Gossiper.java:178) > at org.apache.cassandra.gms.Gossiper.evictFromMembership(Gossiper.java:465) > at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:895) > at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:78) > at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:240) > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > java.lang.IllegalStateException: Attempting gossip state mutation from > illegal thread: GossipTasks:1 > at > org.apache.cassandra.gms.Gossiper.checkProperThreadForStateMutation(Gossiper.java:178) > [apache-cassandra-3.11.5.jar:3.11.5] > at org.apache.cassandra.gms.Gossiper.evictFromMembership(Gossiper.java:465) > [apache-cassandra-3.11.5.jar:3.11.5] > at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:895) > [apache-cassandra-3.11.5.jar:3.11.5] > at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:78) > [apache-cassandra-3.11.5.jar:3.11.5] > at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:240) > [apache-cassandra-3.11.5.jar:3.11.5] > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) > [apache-cassandra-3.11.5.jar:3.11.5] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_231] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [na:1.8.0_231] > 
at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [na:1.8.0_231] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [na:1.8.0_231] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_231] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_231] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) > [apache-cassandra-3.11.5.jar:3.11.5] > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > ~[netty-all-4.1.42.Final.jar:4.1.42.Final] > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_231] > {noformat} > Since CASSANDRA-15059 we check that all state changes are performed in the > GossipStage but it seems like it was still performed in the "current" thread > [here|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/gms/Gossiper.java#L895]. > It should be as simp
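In case it helps with assessing the impact: the stack trace shows the mutation being attempted from the GossipTasks:1 timer thread, and checkProperThreadForStateMutation only asserts which thread performs the change. The usual remedy for this class of warning is to hand the mutation to the gossip stage executor; a rough sketch of that pattern against the 3.11 internals follows (the helper is hypothetical and this is not the actual patch).
{code:java}
import org.apache.cassandra.concurrent.Stage;
import org.apache.cassandra.concurrent.StageManager;

// Sketch only: the real change lives in Gossiper and is not reproduced here.
public final class GossipStageMutationSketch
{
    private GossipStageMutationSketch()
    {
    }

    static void mutateOnGossipStage(Runnable stateMutation)
    {
        // Gossip state mutations such as evictFromMembership must run on the
        // single-threaded GossipStage, not on the GossipTasks periodic timer thread.
        StageManager.getStage(Stage.GOSSIP).execute(stateMutation);
    }
}
{code}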
[jira] [Issue Comment Deleted] (CASSANDRA-8675) COPY TO/FROM broken for newline characters
[ https://issues.apache.org/jira/browse/CASSANDRA-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-8675: - Comment: was deleted (was: I tried the patch, but still running into the issue where if I look at the data with cqlsh I see a yellow '\n' after the import (literal) instead of purple '\n' (control character) ) > COPY TO/FROM broken for newline characters > -- > > Key: CASSANDRA-8675 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8675 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools > Environment: [cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native > protocol v3] > Ubuntu 14.04 64-bit >Reporter: Lex Lythius >Priority: Normal > Labels: cqlsh, remove-reopen > Fix For: 3.0.x > > Attachments: CASSANDRA-8675.patch, copytest.csv > > > Exporting/importing does not preserve contents when texts containing newline > (and possibly other) characters are involved: > {code:sql} > cqlsh:test> create table if not exists copytest (id int primary key, t text); > cqlsh:test> insert into copytest (id, t) values (1, 'This has a newline > ... character'); > cqlsh:test> insert into copytest (id, t) values (2, 'This has a quote " > character'); > cqlsh:test> insert into copytest (id, t) values (3, 'This has a fake tab \t > character (typed backslash, t)'); > cqlsh:test> select * from copytest; > id | t > +- > 1 | This has a newline\ncharacter > 2 |This has a quote " character > 3 | This has a fake tab \t character (entered slash-t text) > (3 rows) > cqlsh:test> copy copytest to '/tmp/copytest.csv'; > 3 rows exported in 0.034 seconds. > cqlsh:test> copy copytest from '/tmp/copytest.csv'; > 3 rows imported in 0.005 seconds. > cqlsh:test> select * from copytest; > id | t > +--- > 1 | This has a newlinencharacter > 2 | This has a quote " character > 3 | This has a fake tab \t character (typed backslash, t) > (3 rows) > {code} > I tried replacing \n in the CSV file with \\n, which just expands to \n in > the table; and with an actual newline character, which fails with error since > it prematurely terminates the record. > It seems backslashes are only used to take the following character as a > literal > Until this is fixed, what would be the best way to refactor an old table with > a new, incompatible structure maintaining its content and name, since we > can't rename tables? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-8675) COPY TO/FROM broken for newline characters
[ https://issues.apache.org/jira/browse/CASSANDRA-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152366#comment-17152366 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-8675: -- I tried the patch, but I am still running into the issue: if I look at the data with cqlsh after the import, I see a yellow '\n' (a literal) instead of a purple '\n' (a control character). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145722#comment-17145722 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- [~brandon.williams] Thanks for the information. Before I upgrade the cluster to 3.11.6, I would like to understand whether there are any known issues with 3.11.6. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136793#comment-17136793 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-15874 at 6/16/20, 9:03 PM: - Thanks [~brandon.williams]. Can you please describe the symptoms of this race condition? In my case, only some portion of the data was not bootstrapped, while the rest of the data bootstrapped without any issues. was (Author: jaid): thanks [~brandon.williams] can you please provide the symptoms of this race conditions? in my case I see only some portion of the data is bootstrapped but rest of the data bootstrapped without any issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136793#comment-17136793 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- thanks [~brandon.williams] can you please provide the symptoms of this race conditions? in my case I see only some portion of the data is bootstrapped but rest of the data bootstrapped without any issues. > Bootstrap completes Successfully without streaming all the data > --- > > Key: CASSANDRA-15874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > I am seeing a strange issue where, adding a new node with auto_bootstrap: > true is not streaming all the data before it joins the cluster. Don't see any > information in the logs about bootstrap failures. > Here is the sequence of logs > > {code:java} > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > waiting for pending range calculation > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > calculation complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > getting bootstrap token > INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: > Starting to bootstrap... > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId . If a table was just created, this is likely due to the schema > not being fully propagated. Please wait for schema agreement on table > creation. > INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] > All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StorageService.java:1505 - Bootstrap completed! for the tokens > {code} > Cassandra Version: 3.11.3 > I am not able to reproduce this issue all the time, but it happened couple of > times. Is there any race condition/corner case, which could cause this issue? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135298#comment-17135298 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15850: --- Any update on this? Is there some configuration that can help reduce the delay? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135289#comment-17135289 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- Thanks [~brandon.williams]. Can you please provide some details on the scenarios under which this can happen? I am trying to reproduce the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
Jai Bheemsen Rao Dhanwada created CASSANDRA-15874: - Summary: Bootstrap completes Successfully without streaming all the data Key: CASSANDRA-15874 URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 Project: Cassandra Issue Type: Bug Components: Consistency/Bootstrap and Decommission Reporter: Jai Bheemsen Rao Dhanwada I am seeing a strange issue where, adding a new node with auto_bootstrap: true is not streaming all the data before it joins the cluster. Don't see any information in the logs about bootstrap failures. Here is the sequence of logs {code:java} INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: schema complete, ready to bootstrap INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: waiting for pending range calculation INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: calculation complete, ready to bootstrap INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: getting bootstrap token INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: Starting to bootstrap... org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId . If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation. INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 StorageService.java:1505 - Bootstrap completed! for the tokens {code} Cassandra Version: 3.11.3 I am not able to reproduce this issue all the time, but it happened couple of times. Is there any race condition/corner case, which could cause this issue? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15850: -- Impacts: (was: None) > Delay between Gossip settle and CQL port opening during the startup > --- > > Key: CASSANDRA-15850 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15850 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello, > When I am bootstrapping/restarting a Cassandra Node, there is a delay between > gossip settle and CQL port opening. Can someone please explain me where this > delay is configured and can this be changed? I don't see any information in > the logs > In my case if you see there is a ~3 minutes delay and this increases if I > increase the #of tables and #of nodes and DC. > {code:java} > INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip > to settle... > INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; > proceeding > INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty > using native Epoll event loop > INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: > [netty-buffer=netty-buffer-4.0.44.Final.452812a, > netty-codec=netty-codec-4.0.44.Final.452812a, > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, > netty-codec-http=netty-codec-http-4.0.44.Final.452812a, > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, > netty-common=netty-common-4.0.44.Final.452812a, > netty-handler=netty-handler-4.0.44.Final.452812a, > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, > netty-transport=netty-transport-4.0.44.Final.452812a, > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, > netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] > INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for > CQL clients on /x.x.x.x:9042 (encrypted)... > {code} > Also during this 3-10 minutes delay, I see > {noformat} > nodetool compactionstats > {noformat} > command is hung and never respond, until the CQL port is up and running. > Can someone please help me understand the delay here? > Cassandra Version: 3.11.3 > The issue can be easily reproducible with around 300 Tables and 100 nodes in > a cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
Jai Bheemsen Rao Dhanwada created CASSANDRA-15850: - Summary: Delay between Gossip settle and CQL port opening during the startup Key: CASSANDRA-15850 URL: https://issues.apache.org/jira/browse/CASSANDRA-15850 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada Hello, When I am bootstrapping/restarting a Cassandra Node, there is a delay between gossip settle and CQL port opening. Can someone please explain me where this delay is configured and can this be changed? I don't see any information in the logs In my case if you see there is a ~3 minutes delay and this increases if I increase the #of tables and #of nodes and DC. {code:java} INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to settle... INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for CQL clients on /x.x.x.x:9042 (encrypted)... {code} Also during this 3-10 minutes delay, I see {noformat} nodetool compactionstats {noformat} command is hung and never respond, until the CQL port is up and running. Can someone please help me understand the delay here? Cassandra Version: 3.11.3 The issue can be easily reproducible with around 300 Tables and 100 nodes in a cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
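If it helps to quantify the gap across restarts and configurations, the two log lines already bracket it. Below is a small standalone sketch (not part of Cassandra) that extracts the delay from system.log; the log path and the timestamp layout are assumptions based on the default logback pattern visible in the snippet above.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class StartupGapFromLog
{
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    public static void main(String[] args) throws IOException
    {
        // Path is an assumption; pass your own system.log location as the first argument.
        String path = args.length > 0 ? args[0] : "/var/log/cassandra/system.log";
        LocalDateTime gossipSettled = null;
        LocalDateTime cqlListening = null;
        List<String> lines = Files.readAllLines(Paths.get(path));
        for (String line : lines)
        {
            if (line.contains("No gossip backlog; proceeding"))
                gossipSettled = parseTimestamp(line);
            else if (line.contains("Starting listening for CQL clients"))
                cqlListening = parseTimestamp(line);
        }
        if (gossipSettled != null && cqlListening != null)
            System.out.println("Gossip settle -> CQL listen: " + Duration.between(gossipSettled, cqlListening));
    }

    // Lines look like "INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - ...";
    // the date and time are the third and fourth whitespace-separated tokens.
    private static LocalDateTime parseTimestamp(String line)
    {
        String[] parts = line.split("\\s+");
        return LocalDateTime.parse(parts[2] + " " + parts[3], TS);
    }
}
{code}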
[jira] [Commented] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088090#comment-17088090 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15449: --- Any pointers here? Today I saw the same issue on a 3-node cluster where I had just started adding (bootstrapping) new nodes. In this case RF is 3 and the consistency level for read queries is LOCAL_QUORUM. As I mentioned initially, I don't see any exceptions or errors in the Cassandra logs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15645) Can't send schema pull request: node /A.B.C.D is down
[ https://issues.apache.org/jira/browse/CASSANDRA-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088063#comment-17088063 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15645: --- I had similar issue and looks like this is introduced in 3.11 https://fossies.org/diffs/apache-cassandra/3.10-src_vs_3.11.0-src/src/java/org/apache/cassandra/service/MigrationTask.java-diff.html Does this cause any issue to the schema or data? I am using C* version 3.11.3. > Can't send schema pull request: node /A.B.C.D is down > - > > Key: CASSANDRA-15645 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15645 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Pierre Belanger apache.org >Priority: Normal > > On a new cluster with Cassandra 3.11.5, each time a node joins the cluster > the schema pull request happens before at least 1 node is confirmed up. On > the first node it's fine but node #2 and following are all complaining with > below WARN. > > {noformat} > INFO [MigrationStage:1] 2020-03-16 16:49:32,355 ColumnFamilyStore.java:426 - > Initializing system_auth.roles > WARN [MigrationStage:1] 2020-03-16 16:49:32,368 MigrationTask.java:67 - Can't > send schema pull request: node /A.B.C.D is down. > WARN [MigrationStage:1] 2020-03-16 16:49:32,369 MigrationTask.java:67 - Can't > send schema pull request: node /A.B.C.D is down. > INFO [main] 2020-03-16 16:49:32,371 Gossiper.java:1780 - Waiting for gossip > to settle... > INFO [GossipStage:1] 2020-03-16 16:49:32,493 Gossiper.java:1089 - InetAddress > /A.B.C.D is now UP > INFO [HANDSHAKE-/10.205.45.19] 2020-03-16 16:49:32,545 > OutboundTcpConnection.java:561 - Handshaking version with /A.B.C.D > {noformat} > > It's not urgent to fix but the WARN create noise for no reason. Before > trying to pull the schema, shouldn't the process wait for gossip to have at > least 1 node "up"? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-13649: -- Status: Open (was: Resolved) > Uncaught exceptions in Netty pipeline > - > > Key: CASSANDRA-13649 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13649 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging, Legacy/Testing >Reporter: Stefan Podkowinski >Assignee: Norman Maurer >Priority: Normal > Labels: patch > Fix For: 2.2.11, 3.0.15, 3.11.1, 4.0 > > Attachments: > 0001-CASSANDRA-13649-Ensure-all-exceptions-are-correctly-.patch, > test_stdout.txt > > > I've noticed some netty related errors in trunk in [some of the dtest > results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink]. > Just want to make sure that we don't have to change anything related to the > exception handling in our pipeline and that this isn't a netty issue. > Actually if this causes flakiness but is otherwise harmless, we should do > something about it, even if it's just on the dtest side. > {noformat} > WARN [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 > - An exceptionCaught() event was fired, and it reached at the tail of the > pipeline. It usually means the last handler in the pipeline did not handle > the exception. > io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: > Connection reset by peer > at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown > Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > {noformat} > And again in another test: > {noformat} > WARN [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 > - An exceptionCaught() event was fired, and it reached at the tail of the > pipeline. It usually means the last handler in the pipeline did not handle > the exception. > io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: > Connection reset by peer > at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown > Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > {noformat} > Edit: > The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() > failed}} error also causes tests to fail for 3.0 and 3.11. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035671#comment-17035671 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13649: --- Hello, I am using Cassandra version 3.11.3 and I still see these exceptions happening. Can someone please take a look? Also, are these errors harmful? I don't see any errors in my application; I just want to make sure I am not ignoring a potential issue. Also, looking at the exception, it seems tied to the Netty version rather than to the Cassandra version. {code:java} INFO [epollEventLoopGroup-2-25] 2020-02-12 19:46:13,867 Message.java:623 - Unexpected exception during request; channel = [id: 0x4cea3872, L:/10.130.8.31:9042 - R:/10.131.85.41:47374] io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
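For readers unfamiliar with the original warning, it appears when no handler in the Netty pipeline consumes an exception, so it falls through to the pipeline's tail. The INFO line pasted above shows that in 3.11 the request path already catches and logs the reset, which is consistent with it being an ordinary client disconnect. Below is a generic Netty sketch, not Cassandra's code and not a proposed patch, of what a last-in-pipeline handler that treats resets as routine disconnects looks like.
{code:java}
import java.io.IOException;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Generic illustration: a tail handler that downgrades "Connection reset by peer"
// (an IOException) to a normal disconnect instead of letting it reach Netty's tail.
public class ClientDisconnectHandler extends ChannelInboundHandlerAdapter
{
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause)
    {
        if (cause instanceof IOException)
        {
            // The remote side went away mid-request; close our end and move on.
            ctx.close();
            return;
        }
        // Anything unexpected keeps propagating so it is not silently swallowed.
        ctx.fireExceptionCaught(cause);
    }
}
{code}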
[jira] [Commented] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998605#comment-16998605 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15449: --- [~dcapwell] Thanks for the response. For the manual sync, I ran a SELECT with CONSISTENCY ALL so that it triggers a read repair. I don't see any errors in the Cassandra system.log except the salted_hash exception I pasted above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
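Since the clients here use the DataStax Java driver, the same manual sync can be scripted instead of run from cqlsh. A sketch against driver 3.x follows; the contact point and credentials are placeholders, and the table names are the 2.1-era auth tables (users, credentials, permissions), so adjust them for newer versions.
{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class SystemAuthReadRepair
{
    public static void main(String[] args)
    {
        // Placeholder contact point and credentials for this sketch.
        try (Cluster cluster = Cluster.builder()
                                      .addContactPoint("127.0.0.1")
                                      .withCredentials("cassandra", "cassandra")
                                      .build();
             Session session = cluster.connect())
        {
            // Reading every row at ALL touches all replicas and triggers read repair on any
            // mismatch, the same effect as CONSISTENCY ALL plus SELECT in cqlsh.
            for (String table : new String[]{ "system_auth.users", "system_auth.credentials", "system_auth.permissions" })
            {
                Statement select = new SimpleStatement("SELECT * FROM " + table)
                                       .setConsistencyLevel(ConsistencyLevel.ALL);
                session.execute(select).all();
            }
        }
    }
}
{code}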
[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449: -- Impacts: Clients (was: None) > Credentials out of sync after replacing the nodes > - > > Key: CASSANDRA-15449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15449 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png > > > Hello, > We are seeing a strange issue where, after replacing multiple C* nodes from > the clusters intermittently we see an issue where few nodes doesn't have any > credentials and the client queries fail. > Here are the sequence of steps > 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes > in one DC. > 2. The approach we took to replace the nodes is kill one node and launch a > new node with {{-Dcassandra.replace_address=}} and proceed with next node > once the node is bootstrapped and CQL is enabled. > 3. This process works fine and all of a sudden, we started seeing our > application started failing with the below errors in the logs > {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc > has no SELECT permission on or any of its parents at > com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) > at > com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) > at > {quote} > 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while > rest of the nodes are serving ~100 requests. (attached the metrics) > 5. We suspect some credentials sync issue and manually synced the > credentials and restarted the nodes with 0 requests, which fixed the problem. > Also, one few C* nodes we see below exception immediately after the bootstrap > is completed and the process dies. is this contributing to the credentials > issue? > NOTE: The C* nodes with zero traffic and the nodes with the below exception > are not the same. 
> {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - > Exception encountered during startup > java.lang.AssertionError: > org.apache.cassandra.exceptions.InvalidRequestException: Undefined name > salted_hash in selection clause > at > org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at org.apache.cassandra.auth.Auth.setup(Auth.java:144) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) > [apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) > [apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) > [apache-cassandra-2.1.16.jar:2.1.16] > Caused by: org.apache.cassandra.exceptions.InvalidRequestException: > Undefined name salted_hash in selection clause > at > org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) > ~[apache-cassandra-2.1.16.jar:2.1.16] > ... 7 common frames omitted > {quote} > Not sure why this is happening, is this a potential bug or any other pointers > to fix the problem. > C* Version: 2.1.16 > Client: Datastax Java Driver. > system_auth RF: 3, dc-1:3 and dc-2:3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449: -- Description: Hello, We are seeing a strange issue where, after replacing multiple C* nodes from the clusters intermittently we see an issue where few nodes doesn't have any credentials and the client queries fail. Here are the sequence of steps 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes in one DC. 2. The approach we took to replace the nodes is kill one node and launch a new node with {{-Dcassandra.replace_address=}} and proceed with next node once the node is bootstrapped and CQL is enabled. 3. This process works fine and all of a sudden, we started seeing our application started failing with the below errors in the logs {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has no SELECT permission on or any of its parents at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) at {quote} 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while rest of the nodes are serving ~100 requests. (attached the metrics) 5. We suspect some credentials sync issue and manually synced the credentials and restarted the nodes with 0 requests, which fixed the problem. Also, one few C* nodes we see below exception immediately after the bootstrap is completed and the process dies. is this contributing to the credentials issue? NOTE: The C* nodes with zero traffic and the nodes with the below exception are not the same. {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - Exception encountered during startup java.lang.AssertionError: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.Auth.setup(Auth.java:144) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.16.jar:2.1.16] Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) ~[apache-cassandra-2.1.16.jar:2.1.16] ... 7 common frames omitted {quote} Not sure why this is happening, is this a potential bug or any other pointers to fix the problem. 
C* Version: 2.1.16 Client: Datastax Java Driver. system_auth RF: 3, dc-1:3 and dc-2:3
[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449: -- Attachment: Screen Shot 2019-12-12 at 11.13.52 AM.png > Credentials out of sync after replacing the nodes > - > > Key: CASSANDRA-15449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15449 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png > > > Hello, > We are seeing a strange issue where, after replacing multiple C* nodes from > the clusters intermittently we see an issue where few nodes doesn't have any > credentials and the client queries fail. > Here are the sequence of steps > 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes > in one DC. > 2. The approach we took to replace the nodes is kill one node and launch a > new node with {{-Dcassandra.replace_address=}} and proceed with next node > once the node is bootstrapped and CQL is enabled. > 3. This process works fine and all of a sudden, we started seeing our > application started failing with the below errors in the logs > {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc > has no SELECT permission on or any of its parents at > com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) > at > com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) > at > {quote} > 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while > rest of the nodes are serving ~100 requests. (attached the metrics) > !Screen Shot 2019-12-12 at 11.13.52 AM.png! > 5. We suspect some credentials sync issue and manually synced the > credentials and restarted the nodes with 0 requests, which fixed the problem. > Also, one few C* nodes we see below exception immediately after the bootstrap > is completed and the process dies. is this contributing to the credentials > issue? > NOTE: The C* nodes with zero traffic and the nodes with the below exception > are not the same. 
> {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - > Exception encountered during startup > java.lang.AssertionError: > org.apache.cassandra.exceptions.InvalidRequestException: Undefined name > salted_hash in selection clause > at > org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at org.apache.cassandra.auth.Auth.setup(Auth.java:144) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) > [apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) > [apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) > [apache-cassandra-2.1.16.jar:2.1.16] > Caused by: org.apache.cassandra.exceptions.InvalidRequestException: > Undefined name salted_hash in selection clause > at > org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) > ~[apache-cassandra-2.1.16.jar:2.1.16] > ... 7 common frames omitted > {quote} > Not sure why this is happening, is this a potential bug or any other pointers > to fix the problem. > C* Version: 2.1.16 > Client: Datastax Java Driver. > system_auth RF: 3, dc-1:3 and dc-2:3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449: -- Description: Hello, We are seeing a strange issue where, after replacing multiple C* nodes from the clusters intermittently we see an issue where few nodes doesn't have any credentials and the client queries fail. Here are the sequence of steps 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes in one DC. 2. The approach we took to replace the nodes is kill one node and launch a new node with {{-Dcassandra.replace_address=}} and proceed with next node once the node is bootstrapped and CQL is enabled. 3. This process works fine and all of a sudden, we started seeing our application started failing with the below errors in the logs {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has no SELECT permission on or any of its parents at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) at {quote} 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while rest of the nodes are serving ~100 requests. (attached the metrics) !Screen Shot 2019-12-12 at 11.13.52 AM.png! 5. We suspect some credentials sync issue and manually synced the credentials and restarted the nodes with 0 requests, which fixed the problem. Also, one few C* nodes we see below exception immediately after the bootstrap is completed and the process dies. is this contributing to the credentials issue? NOTE: The C* nodes with zero traffic and the nodes with the below exception are not the same. {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - Exception encountered during startup java.lang.AssertionError: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.Auth.setup(Auth.java:144) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.16.jar:2.1.16] Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) ~[apache-cassandra-2.1.16.jar:2.1.16] ... 
7 common frames omitted {quote} Not sure why this is happening, is this a potential bug or any other pointers to fix the problem. C* Version: 2.1.16 Client: Datastax Java Driver. system_auth RF: 3, dc-1:3 and dc-2:3
[jira] [Created] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
Jai Bheemsen Rao Dhanwada created CASSANDRA-15449: - Summary: Credentials out of sync after replacing the nodes Key: CASSANDRA-15449 URL: https://issues.apache.org/jira/browse/CASSANDRA-15449 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png Hello, We are seeing a strange issue where, after replacing multiple C* nodes from the clusters intermittently we see an issue where few nodes doesn't have any credentials and the client queries fail. Here are the sequence of steps 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes in one DC. 2. The approach we took to replace the nodes is kill one node and launch a new node with {{-Dcassandra.replace_address=}} and proceed with next node once the node is bootstrapped and CQL is enabled. 3. This process works fine and all of a sudden, we started seeing our application started failing with the below errors in the logs {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has no SELECT permission on or any of its parents at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) at {quote} 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while rest of the nodes are serving ~100 requests. (attached the metrics) !Screen Shot 2019-12-12 at 11.13.52 AM.png! 5. We suspect some credentials sync issue and manually synced the credentials and restarted the nodes with 0 requests, which fixed the problem. Also, one few C* nodes we see below exception immediately after the bootstrap is completed and the process dies. is this contributing to the credentials issue? NOTE: The C* nodes with zero traffic and the nodes with the below exception are not the same. 
{quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - Exception encountered during startup java.lang.AssertionError: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.Auth.setup(Auth.java:144) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.16.jar:2.1.16] Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) ~[apache-cassandra-2.1.16.jar:2.1.16] ... 7 common frames omitted {quote} Not sure why this is happening, is this a potential bug or any other pointers to fix the problem. C* Version: 2.1.16 Client: Datastax Java Driver. system_auth RF: 3, dc-1:3 and dc-2:3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
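A practical way to narrow down which nodes are serving stale or missing auth data after a replace is to authenticate against each node individually with the application account and run a trivial read, mirroring what the failing clients do. The sketch below is not from the ticket; it assumes the DataStax Java driver 3.x API (Cluster, Session, WhiteListPolicy), and the host list, keyspace, table and credentials are placeholders.
{code:java}
import java.net.InetSocketAddress;
import java.util.Collections;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class PerNodeAuthCheck
{
    public static void main(String[] args)
    {
        // Hypothetical node list and application credentials.
        String[] hosts = { "10.0.0.1", "10.0.0.2", "10.0.0.3" };
        for (String host : hosts)
        {
            // WhiteListPolicy pins all requests to this single node, so both the
            // authentication exchange and the test query are coordinated by it.
            try (Cluster cluster = Cluster.builder()
                                          .addContactPoint(host)
                                          .withCredentials("abc", "abc-password")
                                          .withLoadBalancingPolicy(new WhiteListPolicy(
                                              new RoundRobinPolicy(),
                                              Collections.singletonList(new InetSocketAddress(host, 9042))))
                                          .build();
                 Session session = cluster.connect())
            {
                // Placeholder keyspace/table; use whatever the application actually reads.
                session.execute("SELECT key FROM my_keyspace.my_table LIMIT 1");
                System.out.println(host + ": auth + SELECT OK");
            }
            catch (Exception e)
            {
                // An AuthenticationException or UnauthorizedException here points at a
                // node whose system_auth data has not caught up after the replace.
                System.out.println(host + ": FAILED - " + e);
            }
        }
    }
}
{code}
Once the lagging nodes are identified, repairing the system_auth keyspace on them (or re-issuing the relevant CREATE USER / GRANT statements) is the usual way to bring them back in line, which matches the manual sync described above.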
[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781997#comment-16781997 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15038: --- [~slebresne] Thank you, yes I agree with the security concerns, we can add warnings and enable this, so that the truststore check can be disabled. It would be great if this can be implemented. > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780708#comment-16780708 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15038: --- correct, ignore the truststore check > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780223#comment-16780223 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15038: --- In a way yes, but consider another use-case where I trying to setup SSL to encrypt the messages in flight but I trust the members who try to join the cluster. Agree, there are several ways to do it, but ask was why not make use of cassandra configuration to do it when it's already present. (in this case it's not working as expected) > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
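For context on what the requested option would amount to: in JSSE terms it is an SSLContext whose trust manager skips certificate-chain validation, so internode traffic is still encrypted but peer identity is never checked. The following is a generic, illustrative sketch only; it is not Cassandra's SSLFactory code, the class and parameter names are made up, and skipping validation leaves the cluster open to man-in-the-middle peers.
{code:java}
import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;

import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class EncryptOnlySslContext
{
    // Builds an SSLContext that presents this node's keystore but accepts any peer
    // certificate. Traffic is still encrypted; peer identity is NOT verified, which
    // is exactly the trade-off discussed in the comments above.
    public static SSLContext build(String keystorePath, char[] password) throws Exception
    {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream(keystorePath))
        {
            ks.load(in, password);
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(ks, password);

        TrustManager acceptAll = new X509TrustManager()
        {
            public void checkClientTrusted(X509Certificate[] chain, String authType) { /* no CA check */ }
            public void checkServerTrusted(X509Certificate[] chain, String authType) { /* no CA check */ }
            public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        };

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), new TrustManager[]{ acceptAll }, new SecureRandom());
        return ctx;
    }
}
{code}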
[jira] [Comment Edited] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780218#comment-16780218 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-15038 at 2/28/19 7:51 AM: Yes, the basic idea if every cassandra node will have a self signed cert, since the CA is different for each node they don't join the cluster as the other members don't know about the CA. So, there should be a way to disable a CA check. The current require_client_auth: false doesn't seem to be working. I tried to uncomment the property and set to false, even that didn't make much difference. was (Author: jaid): Yes, the basic idea if every cassandra node will have a self signed cert, since the CA is different for each node they don't join the cluster as the other members don't know about the CA. So, there should be a way to disable a CA check. The current require_client_auth: false doesn't seem to be working. > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 
8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780218#comment-16780218 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15038: --- Yes, the basic idea if every cassandra node will have a self signed cert, since the CA is different for each node they don't join the cluster as the other members don't know about the CA. So, there should be a way to disable a CA check. The current require_client_auth: false doesn't seem to be working. > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15038: -- Description: Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. `conf/.truststore` {code:java} server_encryption_options: internode_encryption: all keystore: /etc/cassandra/keystore.jks keystore_password: mykeypass truststore: /etc/cassandra/truststore.jks truststore_password: truststorepass # More advanced defaults below: # protocol: TLS # algorithm: SunX509 # store_type: JKS # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] # require_client_auth: false # require_endpoint_verification: false{code} {noformat} Caused by: java.io.IOException: Error creating the initializing the SSL Context at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 8 common frames omitted Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 10 common frames omitted{noformat} Cassandra Version: 3.11.3 was: Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. 
[jira] [Updated] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15038: -- Description: Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. though the require_client_auth: is set to false, cassandra fails to startup if we disable truststore and truststore_password as it look for default truststore under `conf/.truststore` {code:java} server_encryption_options: internode_encryption: all keystore: /etc/cassandra/keystore.jks keystore_password: mykeypass truststore: /etc/cassandra/truststore.jks truststore_password: truststorepass # More advanced defaults below: # protocol: TLS # algorithm: SunX509 # store_type: JKS # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] # require_client_auth: false # require_endpoint_verification: false{code} {noformat} Caused by: java.io.IOException: Error creating the initializing the SSL Context at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 8 common frames omitted Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 10 common frames omitted{noformat} Cassandra Version: 3.11.3 was: Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. 
[jira] [Updated] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15038: -- Issue Type: Bug (was: Improvement) > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
Jai Bheemsen Rao Dhanwada created CASSANDRA-15038: - Summary: Provide an option to Disable Truststore CA check for internode_encryption Key: CASSANDRA-15038 URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 Project: Cassandra Issue Type: Improvement Components: Feature/Encryption Reporter: Jai Bheemsen Rao Dhanwada Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. `conf/.truststore` {code:java} server_encryption_options: internode_encryption: all keystore: /etc/cassandra/keystore.jks keystore_password: mykeypass truststore: /etc/cassandra/truststore.jks truststore_password: truststorepass # More advanced defaults below: # protocol: TLS # algorithm: SunX509 # store_type: JKS # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] # require_client_auth: false # require_endpoint_verification: false{code} {noformat} Caused by: java.io.IOException: Error creating the initializing the SSL Context at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 8 common frames omitted Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 10 common frames omitted{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665755#comment-16665755 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-11748: --- Any proposed fix for this available yet? or any work around? > Schema version mismatch may leads to Casandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. > CentOS 6.6 > Occurred in different C* node of different scale of deployment (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times when a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the simple guideline of our rolling upgrade process > 1. Update schema on a node, and wait until all nodes to be in schema version > agreemnt - via nodetool describeclulster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the the restarted node has different > schema version. > 4. All nodes in cluster start to rapidly exchange schema information, and any > of node could run into OOM. > The following is the system.log that occur in one of our 2-node cluster test > bed > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > The node2 keeps submitting the migration task over 100+ times to the other > node. > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ... ( over 100+ times) > -- > On the otherhand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migrationTask afterwards: > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line > 978) InetAddress /192.168.88.34 is now UP > ... > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 > MigrationRequestVerbHandler.java (line 41) Received migration request from > /192.168.88.34. > …… ( over 100+ times) > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line > 127) submitting migration task for /192.168.88.34 > . (over 50+ times) > On the side note, we have over 200+ column families defined in Cassandra > database, which may related to this amount of rpc traffic. > P.S.2 The over requested schema migration task will eventually have > InternalResponseStage performing schema merge operation. Since this operation > requires a compaction for each merge and is much slower to consume. Thus, the > back-pressure of incoming schema migration content objects consumes all of > the heap space and ultimately ends up OOM! 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
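Step 1 of the rolling-upgrade guideline quoted above (wait until every node gossips the same schema version before restarting the next one) can also be verified from the client side rather than by eyeballing nodetool describecluster. A minimal sketch, assuming a DataStax Java driver recent enough to expose Metadata.checkSchemaAgreement() and a placeholder contact point:
{code:java}
import com.datastax.driver.core.Cluster;

public class WaitForSchemaAgreement
{
    public static void main(String[] args) throws InterruptedException
    {
        // Hypothetical contact point; this is the programmatic equivalent of polling
        // "nodetool describecluster" until only one schema version is reported.
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build())
        {
            cluster.init();
            while (!cluster.getMetadata().checkSchemaAgreement())
            {
                System.out.println("Schema versions still diverge; waiting before restarting the next node...");
                Thread.sleep(5000);
            }
            System.out.println("All nodes agree on the schema version.");
        }
    }
}
{code}
The same check is worth repeating after each node restart in step 2, since the OOM pattern described here begins with a single restarted node disagreeing on the schema version.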
[jira] [Reopened] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada reopened CASSANDRA-14840: --- [~jjirsa] is this fixed any of the new versions of Cassandra? or are there any workarounds to overcome the issue? given that # I am already using off heap buffers # Adding iptables for every node addition in a production environment is not an option for me. > Bootstrap of new node fails with OOM in a large cluster > --- > > Key: CASSANDRA-14840 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14840 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Critical > > We are seeing new node addition fails with OOM during bootstrap in a cluster > of more than 80 nodes and 3000 CF without any data in those CFs. > > Steps to reproduce: > # Launch a 3 node cluster > # Create 3000 CF in the cluster > # Start adding nodes to the cluster one by one > # After adding 75-80 nodes, the new node bootstrap fails with OOM. > {code:java} > ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 > JVMStabilityInspector.java:78 - Exiting due to error while processing commit > log during initialization. > java.lang.OutOfMemoryError: Java heap space > at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151] > at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151] > at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151] > at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151] > at java.lang.String.format(String.java:2940) ~[na:1.8.0_151] > at > org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code} > Cassandra Version: 2.1.16 > OS: CentOS7 > num_tokens: 256 on each node. > > This behavior is blocking us from adding extra capacity when needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663076#comment-16663076 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14840: ---
[~jjirsa] This is a production cluster and all the CFs are in use, so I can't delete any of them.
# I am already using off-heap memtables and am still getting OOM. Current heap settings are `-Xms8192M -Xmx8192M -Xmn1200M` with the CMS collector. I tried increasing the heap size to 16G, and after adding 120 nodes I still see OOM on the newly bootstrapping node. Any other suggestions here?
# That sounds like a very hands-on approach; I am not sure I can time it well.
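For context, here is roughly where the settings being discussed above live on a 2.1-era install. The file paths and values are only an illustration of the knobs mentioned (fixed CMS heap, off-heap memtables), not a recommendation, and package layouts differ between distributions.
{code:bash}
# Illustrative only: where the heap and memtable settings discussed above
# are typically configured on a Cassandra 2.1 package install.

# conf/cassandra-env.sh -- fixed heap sizing instead of the auto-calculated defaults
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="1200M"

# conf/cassandra.yaml -- move memtable cell data off heap
# memtable_allocation_type: offheap_objects

# Quick check of what a running node actually picked up:
ps -ef | grep -oE -- '-Xm[sxn][0-9]+[MG]' | sort -u
grep -E '^(memtable_allocation_type|memtable_(heap|offheap)_space_in_mb)' /etc/cassandra/conf/cassandra.yaml
{code}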
[jira] [Created] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
Jai Bheemsen Rao Dhanwada created CASSANDRA-14840: - Summary: Bootstrap of new node fails with OOM in a large cluster Key: CASSANDRA-14840 URL: https://issues.apache.org/jira/browse/CASSANDRA-14840 Project: Cassandra Issue Type: Bug Components: Streaming and Messaging Reporter: Jai Bheemsen Rao Dhanwada
We are seeing new node additions fail with OOM during bootstrap in a cluster of more than 80 nodes and 3000 CFs, without any data in those CFs.
Steps to reproduce:
# Launch a 3 node cluster
# Create 3000 CF in the cluster
# Start adding nodes to the cluster one by one
# After adding 75-80 nodes, the new node bootstrap fails with OOM.
{code:java}
ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 JVMStabilityInspector.java:78 - Exiting due to error while processing commit log during initialization.
java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151]
at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151]
at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151]
at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151]
at java.lang.String.format(String.java:2940) ~[na:1.8.0_151]
at org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105) ~[apache-cassandra-2.1.16.jar:2.1.16]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code}
Cassandra Version: 2.1.16
OS: CentOS7
num_tokens: 256 on each node.
This behavior is blocking us from adding extra capacity when needed.
[jira] [Commented] (CASSANDRA-13538) Cassandra tasks permanently block after the following assertion occurs during compaction: "java.lang.AssertionError: Interval min > max "
[ https://issues.apache.org/jira/browse/CASSANDRA-13538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501322#comment-16501322 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13538: --- Noticed the similar issue in one of the environments, has anyone have any workaround? > Cassandra tasks permanently block after the following assertion occurs during > compaction: "java.lang.AssertionError: Interval min > max " > - > > Key: CASSANDRA-13538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13538 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: This happens on a 7 node system with 2 data centers. > We're using Cassandra version 2.1.15. I upgraded to 2.1.17 and it still > occurs. >Reporter: Andy Klages >Priority: Major > Fix For: 2.1.x > > Attachments: cassandra.yaml, jstack.out, schema.cql3, system.log, > tpstats.out > > > We noticed this problem because the commitlogs proliferate to the point that > we eventually run out of disk space. nodetool tpstats shows several of the > tasks backed up: > {code} > Pool NameActive Pending Completed Blocked All > time blocked > MutationStage 0 0 134335315 0 > 0 > ReadStage 0 0 643986790 0 > 0 > RequestResponseStage 0 0 114298 0 > 0 > ReadRepairStage 0 0 36 0 > 0 > CounterMutationStage 0 0 0 0 > 0 > MiscStage 0 0 0 0 > 0 > AntiEntropySessions 1 1 79357 0 > 0 > HintedHandoff 0 0 90 0 > 0 > GossipStage 0 06595098 0 > 0 > CacheCleanupExecutor 0 0 0 0 > 0 > InternalResponseStage 0 01638369 0 > 0 > CommitLogArchiver 0 0 0 0 > 0 > CompactionExecutor2 1752922542 0 > 0 > ValidationExecutor0 01465374 0 > 0 > MigrationStage176600 0 > 0 > AntiEntropyStage 1 9238291098 0 > 0 > PendingRangeCalculator0 0 20 0 > 0 > Sampler 0 0 0 0 > 0 > MemtableFlushWriter 0 0 53017 0 > 0 > MemtablePostFlush 1 45841545141 0 > 0 > MemtableReclaimMemory 0 0 70639 0 > 0 > Native-Transport-Requests 0 0 352559 0 > 0 > {code} > This all starts after the following exception is raised in Cassandra: > {code} > ERROR [MemtableFlushWriter:2437] 2017-05-15 01:53:23,380 > CassandraDaemon.java:231 - Exception in thread > Thread[MemtableFlushWriter:2437,5,main] > java.lang.AssertionError: Interval min > max > at > org.apache.cassandra.utils.IntervalTree$IntervalNode.(IntervalTree.java:249) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at org.apache.cassandra.utils.IntervalTree.(IntervalTree.java:72) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker$SSTableIntervalTree.(DataTracker.java:603) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker$SSTableIntervalTree.(DataTracker.java:597) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:578) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker$View.replaceFlushed(DataTracker.java:740) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:172) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.ColumnFamilyStore.r
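A rough way to spot the condition described in this ticket (flush pipeline wedged after the IntervalTree assertion, commit log segments piling up) before the disk fills. The data and log paths below assume a typical package install and are not part of the ticket itself.
{code:bash}
# Rough health check for the symptom described above: blocked flush stages
# and an ever-growing commit log directory. Paths are assumptions.
nodetool tpstats | grep -E 'MemtablePostFlush|MemtableFlushWriter|CompactionExecutor'

# Commit log size; in this bug it grows without bound because segments are
# never recycled once the flush path is wedged.
du -sh /var/lib/cassandra/commitlog

# The triggering assertion, if present, shows up in the system log:
grep -n 'AssertionError: Interval min > max' /var/log/cassandra/system.log | tail -5
{code}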
[jira] [Commented] (CASSANDRA-13235) All thread blocked and writes pending.
[ https://issues.apache.org/jira/browse/CASSANDRA-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133721#comment-16133721 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13235: --- We are also seeing similar issue and we see the huge write latency around 10s {code:java} "SharedPool-Worker-11" #355 daemon prio=5 os_prio=0 tid=0x7f9458e33800 nid=0x9e50 waiting for monitor entry [0x7f943cbf2000] java.lang.Thread.State: BLOCKED (on object monitor) at sun.misc.Unsafe.monitorEnter(Native Method) at org.apache.cassandra.utils.concurrent.Locks.monitorEnterUnsafe(Locks.java:46) at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:202) at org.apache.cassandra.db.Memtable.put(Memtable.java:210) at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1263) at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) at java.lang.Thread.run(Thread.java:745) {code} > All thread blocked and writes pending. > -- > > Key: CASSANDRA-13235 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13235 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: jdk8 > cassandra 2.1.15 >Reporter: zhaoyan > > I found cassandra many pending MutationStage task > {code} > NFO [Service Thread] 2017-02-17 16:00:14,440 StatusLogger.java:51 - Pool > NameActive Pending Completed Blocked All Time > Blocked > INFO [Service Thread] 2017-02-17 16:00:14,440 StatusLogger.java:66 - > MutationStage 384 4553 4294213082 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > RequestResponseStage 0 0 2172612382 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > ReadRepairStage 0 05378852 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > CounterMutationStage 0 0 0 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > ReadStage 5 0 577242284 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > MiscStage 0 0 0 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > HintedHandoff 0 0 1480 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > GossipStage 0 09342250 0 > 0 > {code} > And I found there are many blocked thread with jstack > {code} > "SharedPool-Worker-28" #416 daemon prio=5 os_prio=0 tid=0x01fb8000 > nid=0x7459 waiting for monitor entry [0x7fdd83ca] >java.lang.Thread.State: BLOCKED (on object monitor) > at sun.misc.Unsafe.monitorEnter(Native Method) > at > org.apache.cassandra.utils.concurrent.Locks.monitorEnterUnsafe(Locks.java:46) > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:202) > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1244) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > at 
org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) > at > org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.ja
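A quick, hedged way to quantify the contention pattern shown in the stack traces above: dump the threads and count how many SharedPool workers are parked on the same partition-level monitor. The PID discovery command and temp file path are illustrative assumptions.
{code:bash}
# Count mutation threads blocked while appending to an AtomicBTreeColumns partition,
# mirroring the jstack excerpts quoted above.
pid=$(pgrep -f CassandraDaemon | head -1)
jstack "$pid" > /tmp/cassandra-threads.txt

# Threads currently in the hot append path:
grep -c 'AtomicBTreeColumns.addAllWithSizeDelta' /tmp/cassandra-threads.txt

# How many of them are actually BLOCKED on the monitor right now:
grep -B2 'Locks.monitorEnterUnsafe' /tmp/cassandra-threads.txt | grep -c 'BLOCKED'
{code}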
[jira] [Commented] (CASSANDRA-13526) nodetool cleanup on KS with no replicas should remove old data, not silently complete
[ https://issues.apache.org/jira/browse/CASSANDRA-13526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007319#comment-16007319 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13526: ---
I am seeing this issue on a C* cluster with the below setup:
Cassandra version: 2.1.16
Datacenters: 4 DCs
RF: NetworkTopologyStrategy with RF 3 in each DC
Keyspaces: 50 keyspaces, a few replicating to one DC and a few replicating to multiple DCs
> nodetool cleanup on KS with no replicas should remove old data, not silently complete
> -
>
> Key: CASSANDRA-13526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13526
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Reporter: Jeff Jirsa
>
> From the user list: https://lists.apache.org/thread.html/5d49cc6bbc6fd2e5f8b12f2308a3e24212a55afbb441af5cb8cd4167@%3Cuser.cassandra.apache.org%3E
> If you have a multi-dc cluster, but some keyspaces are not replicated to a given DC, you'll be unable to run cleanup on those keyspaces in that DC, because [the cleanup code will see no ranges and exit early|https://github.com/apache/cassandra/blob/4cfaf85/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L427-L441]
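A small sketch of how the behaviour described in this ticket shows up in practice: for a keyspace that is no longer replicated to the local DC, cleanup returns quickly and the on-disk data does not shrink. The keyspace name and data path below are placeholders, not values from the ticket.
{code:bash}
# Illustration only: cleanup on a keyspace the local DC owns no ranges for.
KS=user_prod
du -sh /var/lib/cassandra/data/"$KS"
nodetool cleanup "$KS"
du -sh /var/lib/cassandra/data/"$KS"   # unchanged when the node owns no ranges for $KS
{code}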
[jira] [Reopened] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada reopened CASSANDRA-12816: ---
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15658407#comment-15658407 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
Agree, but this forces someone to have all the keyspaces expanded to all the regions. If I have a use case where keyspaces belong to different regions, I can't make use of it.
[jira] [Reopened] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada reopened CASSANDRA-12816: --- Reproduced In: 2.1.x Since Version: 2.1.16
[~jjordan] I have encountered this issue again, and now it is complaining about a non-system keyspace.
{code:java}
[jaibheemsen@node01 ~]$ nodetool rebuild us-east
nodetool: Unable to find sufficient sources for streaming range (1773952483933901933,1774688434180951054] in keyspace user_prod
See 'nodetool help' or 'nodetool help '.
[jaibheemsen@node01 ~]$
{code}
C* version: 2.1.16
The user_prod keyspace is present in us-west-2 but not in us-east. I am doing a nodetool rebuild in us-west-2 to stream data from us-east.
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605806#comment-15605806 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
[~jjordan] Agree, the workaround is to add -Dcassandra.consistent.rangemovement=false, but can you please help me understand: is this a bug or expected behavior? If it is the expected behavior, what action is needed when expanding the cluster to a new DC? Do I always need to use NetworkTopologyStrategy for all non-LocalStrategy keyspaces (system_distributed, system_traces)?
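For reference, a JVM system property like the one mentioned above is usually passed at node startup. The snippet below is a sketch assuming a package install where cassandra-env.sh appends to JVM_OPTS; it does not assert that this flag is the right fix, which is exactly the open question in this thread.
{code:bash}
# Sketch only: two common ways to pass a cassandra.* system property at startup.
# Persistently, via conf/cassandra-env.sh:
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"' >> /etc/cassandra/conf/cassandra-env.sh

# Or for a single foreground start:
cassandra -Dcassandra.consistent.rangemovement=false
{code}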
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603565#comment-15603565 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
This isn't correct. For example, if I have 10 keyspaces in a cluster and I want 5 keyspaces in DC-1 only and the other 5 keyspaces in both DC-1 and DC-2, then the rebuild fails and I can't replicate the data to the newly added DC.
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603306#comment-15603306 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
system_traces and system_distributed are using SimpleStrategy. If I change them to NetworkTopologyStrategy, the rebuild operation works. Any idea what the implications are of changing them to NetworkTopologyStrategy?
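To make the workaround being discussed concrete, this is the general shape of the change. The DC names and replication factors below are placeholders, not recommendations, and the question above about implications still stands.
{code:bash}
# Sketch of the workaround discussed in this thread: switch the affected
# keyspaces to NetworkTopologyStrategy before running rebuild in the new DC.
cqlsh -e "ALTER KEYSPACE system_distributed WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
cqlsh -e "ALTER KEYSPACE system_traces WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"

# Then, on each node in the new datacenter:
nodetool rebuild -- DC1
{code}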
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603265#comment-15603265 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
[~jeromatron] Correct, but I am concerned about altering system-level keyspaces (system, system_traces, system_distributed). Do you know what the impact (if any) is of changing the system keyspaces to NetworkTopologyStrategy?
[jira] [Updated] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-12816: -- Priority: Critical (was: Major)
[jira] [Created] (CASSANDRA-12816) Rebuild failing while adding new datacenter
Jai Bheemsen Rao Dhanwada created CASSANDRA-12816: - Summary: Rebuild failing while adding new datacenter Key: CASSANDRA-12816 URL: https://issues.apache.org/jira/browse/CASSANDRA-12816 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada
Hello All,
I have a single datacenter with 3 C* nodes and we are trying to expand the cluster to another region/DC. I am seeing the below error while doing a "nodetool rebuild -- name_of_existing_data_center".
{code:java}
[user@machine ~]$ nodetool rebuild DC1
nodetool: Unable to find sufficient sources for streaming range (-402178150752044282,-396707578307430827] in keyspace system_distributed
See 'nodetool help' or 'nodetool help '.
[user@machine ~]$
{code}
{code:java}
user@cqlsh> SELECT * from system_schema.keyspaces where keyspace_name='system_distributed';
keyspace_name | durable_writes | replication
---++-
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
(1 rows)
{code}
To overcome this I have updated the system_distributed keyspace to DC1:3 and DC2:3 with NetworkTopologyStrategy.
C* Version - 3.0.8
Is this a bug introduced in Cassandra version 3.0.8? I haven't seen this issue with the older versions.