[jira] [Commented] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846027#comment-17846027 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-19633: --- This looks to be due to the optimization done in [CASSANDRA-4650|https://issues.apache.org/jira/browse/CASSANDRA-4650]
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Bug Category: Parent values: Degradation(12984) Discovered By: User Report Since Version: 4.0
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Labels: Bootstrap (was: )
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Description: edited.
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Description: edited.
[jira] [Updated] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-19633: -- Description: edited.
[jira] [Created] (CASSANDRA-19633) Replaced node is stuck in a loop calculating ranges
Jai Bheemsen Rao Dhanwada created CASSANDRA-19633:
-
             Summary: Replaced node is stuck in a loop calculating ranges
                 Key: CASSANDRA-19633
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19633
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jai Bheemsen Rao Dhanwada
         Attachments: result1.html

Hello,

I am running into an issue wherein a node that is replacing a dead (non-seed) node is stuck calculating ranges forever. It eventually succeeds; however, the time taken to calculate the ranges is not constant, and I sometimes see it take 24 hours to calculate ranges for each keyspace. Attached is the flame graph of the Cassandra process during this time, which points to the code below.

```
Multimap<InetAddressAndPort, Range<Token>> getRangeFetchMapForNonTrivialRanges()
{
    //Get the graph with edges between ranges and their source endpoints
    MutableCapacityGraph<Vertex, Integer> graph = getGraph();

    //Add source and destination vertex and edges
    addSourceAndDestination(graph, getDestinationLinkCapacity(graph));

    int flow = 0;
    MaximumFlowAlgorithmResult<Integer, CapacityEdge<Vertex, Integer>> result = null;

    //We might not be working on all ranges
    while (flow < getTotalRangeVertices(graph))
    {
        if (flow > 0)
        {
            //We could not find a path with previous graph. Bump the capacity b/w endpoint vertices and destination by 1
            incrementCapacity(graph, 1);
        }

        MaximumFlowAlgorithm fordFulkerson = FordFulkersonAlgorithm.getInstance(DFSPathFinder.getInstance());
        result = fordFulkerson.calc(graph, sourceVertex, destinationVertex, IntegerNumberSystem.getInstance());

        int newFlow = result.calcTotalFlow();
        assert newFlow > flow;   //We are not making progress which should not happen
        flow = newFlow;
    }

    return getRangeFetchMapFromGraphResult(graph, result);
}
```

Digging through the logs, I see the following log line for the keyspace `system_auth`:

```
INFO [main] 2024-05-10 17:35:02,489 RangeStreamer.java:330 - Bootstrap: range Full(/10.135.56.214:7000,(5080189126057290696,5081324396311791613]) exists on Full(/10.135.56.157:7000,(5080189126057290696,5081324396311791613]) for keyspace system_auth
```

The corresponding code:

```
for (Map.Entry entry : fetchMap.flattenEntries())
    logger.info("{}: range {} exists on {} for keyspace {}", description, entry.getKey(), entry.getValue(), keyspaceName);
```

but I do NOT see the following line for the same keyspace:

```
RangeStreamer.java:606 - Output from RangeFetchMapCalculator for keyspace
```

This means the code is stuck in `getRangeFetchMap()`:

```
Multimap<InetAddressAndPort, Range<Token>> rangeFetchMapMap = calculator.getRangeFetchMap();
logger.info("Output from RangeFetchMapCalculator for keyspace {}", keyspace);
```

Here is the cluster topology:
* Cassandra version: 4.0.12
* # of nodes: 190
* Tokens (vnodes): 128

The initial hypothesis was that the graph calculation was taking longer due to the combination of nodes + tokens + tables, but in the same cluster I see one of the nodes joined without any issues. I am wondering whether I am hitting a bug that causes this to work sometimes but get into an effectively infinite loop at other times?

Please let me know if you need any other details; I would appreciate any pointers to debug this further.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
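For readers following the flame graph above: the expensive pattern is that every time the max flow comes up short, the calculator bumps the endpoint-to-destination capacity by one and re-runs Ford-Fulkerson over the entire graph from scratch. The following is a toy, self-contained sketch of that pattern (plain Java with made-up numbers, not Cassandra code, and a guessed starting capacity); with a skewed range-to-source graph, the number of full max-flow passes grows with the number of ranges, which is one plausible way a large vnode cluster spends a long time in this loop.

{code:java}
import java.util.Arrays;

// Toy illustration (not Cassandra code) of the pattern in getRangeFetchMapForNonTrivialRanges():
// run a DFS-based Ford-Fulkerson pass, and if not every range vertex could be matched to a
// source endpoint, bump the endpoint->sink capacity by 1 and recompute the whole max flow.
public class IncrementalMaxFlowToy
{
    static int n;        // 0 = source, 1..R = ranges, R+1..R+E = endpoints, n-1 = sink
    static int[][] cap;

    public static void main(String[] args)
    {
        int ranges = 6, endpoints = 2;
        n = ranges + endpoints + 2;
        cap = new int[n][n];
        int sink = n - 1;
        for (int r = 1; r <= ranges; r++)
        {
            cap[0][r] = 1;            // source -> range
            cap[r][ranges + 1] = 1;   // skewed on purpose: every range only has endpoint 0 as a candidate
        }
        int initial = (ranges + endpoints - 1) / endpoints;  // toy starting capacity; the real getDestinationLinkCapacity() may differ
        for (int e = 0; e < endpoints; e++)
            cap[ranges + 1 + e][sink] = initial;

        int flow = 0, passes = 0;
        while (flow < ranges)
        {
            if (flow > 0)             // previous pass could not place every range: bump and start over
                for (int e = 0; e < endpoints; e++)
                    cap[ranges + 1 + e][sink]++;
            flow = maxFlow(0, sink);
            passes++;
        }
        System.out.println("placed " + flow + " ranges after " + passes + " full max-flow passes");
    }

    static int maxFlow(int s, int t)
    {
        int[][] residual = new int[n][];
        for (int i = 0; i < n; i++)
            residual[i] = Arrays.copyOf(cap[i], n);

        int total = 0;
        int pushed;
        while ((pushed = dfs(residual, s, t, Integer.MAX_VALUE, new boolean[n])) > 0)
            total += pushed;
        return total;
    }

    static int dfs(int[][] residual, int u, int t, int limit, boolean[] seen)
    {
        if (u == t)
            return limit;
        seen[u] = true;
        for (int v = 0; v < n; v++)
        {
            if (seen[v] || residual[u][v] <= 0)
                continue;
            int pushed = dfs(residual, v, t, Math.min(limit, residual[u][v]), seen);
            if (pushed > 0)
            {
                residual[u][v] -= pushed;
                residual[v][u] += pushed;
                return pushed;
            }
        }
        return 0;
    }
}
{code}

With the skewed toy graph above, the loop needs one extra full max-flow pass for every capacity bump, so the printed pass count is ranges minus the initial capacity plus one.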
[jira] [Commented] (CASSANDRA-18922) cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
[ https://issues.apache.org/jira/browse/CASSANDRA-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1523#comment-1523 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18922: --- Thanks. I am trying to identify the impact of CVE-2023-4586 on the Cassandra server and whether it affects the 3.x and 4.x versions of Cassandra. From [CASSANDRA-18812|https://issues.apache.org/jira/browse/CASSANDRA-18812?focusedCommentId=17760806&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17760806] I see that the server is impacted, but there are no further details on why only trunk and 5.0 are impacted and not 3.x and 4.x.
> cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
> -
> Key: CASSANDRA-18922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18922
> Project: Cassandra
> Issue Type: Bug
> Components: Dependencies
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
> This is failing OWASP: https://nvd.nist.gov/vuln/detail/CVE-2023-4586 but appears to be a false positive.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18922) cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
[ https://issues.apache.org/jira/browse/CASSANDRA-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1490#comment-1490 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18922: --- Thanks [~brandon.williams], can you provide some more details as to why only 5.0 is impacted? Reading through the comments it looks like we need to enable hostname verification by default. Is this derived from the `server_encryption_options` setting `require_endpoint_verification`?
> cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
> -
> Key: CASSANDRA-18922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18922
> Project: Cassandra
> Issue Type: Bug
> Components: Dependencies
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
> This is failing OWASP: https://nvd.nist.gov/vuln/detail/CVE-2023-4586 but appears to be a false positive.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
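For context on the hostname-verification question above: CVE-2023-4586 concerns TLS clients that never enable an endpoint identification algorithm, so a certificate issued for any host is accepted. The sketch below shows what enabling that check looks like with the plain JDK SSLEngine; it is illustrative only, not Cassandra or Netty internals, and the host and port are placeholders.

{code:java}
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Illustration only (plain JDK TLS): what "hostname verification" means at the TLS layer.
public final class HostnameVerificationExample
{
    public static SSLEngine clientEngine(SSLContext context, String host, int port)
    {
        SSLEngine engine = context.createSSLEngine(host, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS"); // verify the peer certificate against 'host'
        engine.setSSLParameters(params);
        return engine;
    }
}
{code}

Without the setEndpointIdentificationAlgorithm call, the handshake only checks that the certificate chains to a trusted CA, which is the gap the CVE describes.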
[jira] [Commented] (CASSANDRA-18922) cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586
[ https://issues.apache.org/jira/browse/CASSANDRA-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1463#comment-1463 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18922: --- [~brandon.williams] Thanks for reporting the vulnerability. I see the Fix version is marked as 5.x, do we have any timelines to back port the hostname validation to the 3.x or 4.x branches? > cassandra-driver-core-3.11.5 vulnerability: CVE-2023-4586 > - > > Key: CASSANDRA-18922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18922 > Project: Cassandra > Issue Type: Bug > Components: Dependencies >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 5.0.x, 5.x > > > This is failing OWASP: https://nvd.nist.gov/vuln/detail/CVE-2023-4586 > but appears to be a false positive. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18875) Upgrade the snakeyaml library version
[ https://issues.apache.org/jira/browse/CASSANDRA-18875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768069#comment-17768069 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18875: --- Thanks [~brandon.williams]. The version in active use across most of the industry is 4.x; can we update the dependency in a future 4.x release? I can send a PR for it.
> Upgrade the snakeyaml library version
> -
> Key: CASSANDRA-18875
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18875
> Project: Cassandra
> Issue Type: Task
> Components: Local/Config
> Reporter: Jai Bheemsen Rao Dhanwada
> Priority: Normal
> Fix For: 5.x
>
> Apache Cassandra uses version 1.26 of the snakeyaml dependency, and there are several [vulnerabilities|https://mvnrepository.com/artifact/org.yaml/snakeyaml/1.26#] in this version that can be fixed by upgrading to the 2.x line. I understand that this is not a security issue, as Cassandra already uses SafeConstructor and it is not flagged as a vulnerability under OWASP, so there are no plans to fix it as per CASSANDRA-18122.
> Cassandra is open source, used and distributed by many enterprise customers, and when Cassandra is downloaded as a tar and used directly, external scanners are not aware of the SafeConstructor implementation and have no idea whether it is vulnerable or not.
> Can we consider upgrading the version to 2.x in the next releases, as snakeyaml does not have large dependency changes between major and minor versions? I am happy to open a PR for this. Please let me know your thoughts on this.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-18875) Upgrade the snakeyaml library version
Jai Bheemsen Rao Dhanwada created CASSANDRA-18875:
-
             Summary: Upgrade the snakeyaml library version
                 Key: CASSANDRA-18875
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18875
             Project: Cassandra
          Issue Type: Task
            Reporter: Jai Bheemsen Rao Dhanwada

Apache Cassandra uses version 1.26 of the snakeyaml dependency, and there are several [vulnerabilities|https://mvnrepository.com/artifact/org.yaml/snakeyaml/1.26#] in this version that can be fixed by upgrading to the 2.x line. I understand that this is not a security issue, as Cassandra already uses SafeConstructor and it is not flagged as a vulnerability under OWASP, so there are no plans to fix it as per CASSANDRA-18122.

Cassandra is open source, used and distributed by many enterprise customers, and when Cassandra is downloaded as a tar and used directly, external scanners are not aware of the SafeConstructor implementation and have no idea whether it is vulnerable or not.

Can we consider upgrading the version to 2.x in the next releases, as snakeyaml does not have large dependency changes between major and minor versions? I am happy to open a PR for this. Please let me know your thoughts on this.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
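For context on the SafeConstructor point above: the snakeyaml CVEs concern untrusted YAML triggering arbitrary object construction, which SafeConstructor rules out by only building standard Java types. The sketch below is illustrative only and is not Cassandra's actual configuration loader; the constructor signature shown is the snakeyaml 2.x API that the proposed upgrade would move to, and the file path is a placeholder.

{code:java}
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;

import org.yaml.snakeyaml.LoaderOptions;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.constructor.SafeConstructor;

// Illustration only: SafeConstructor restricts parsing to standard Java types,
// so untrusted YAML cannot trigger arbitrary object construction.
public final class SafeYamlLoad
{
    public static Map<String, Object> load(String path) throws Exception
    {
        Yaml yaml = new Yaml(new SafeConstructor(new LoaderOptions())); // snakeyaml 2.x signature
        try (InputStream in = new FileInputStream(path))
        {
            return yaml.load(in);
        }
    }
}
{code}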
[jira] [Commented] (CASSANDRA-18122) when will slf4j be upgraded for CVE-2018-8088
[ https://issues.apache.org/jira/browse/CASSANDRA-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648267#comment-17648267 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-18122: --- Thanks [~brandon.williams]. I believe the scanner detects the vulnerability because C* is using slf4j version 1.7.25: [https://github.com/apache/cassandra/blob/cassandra-4.1.0/build.xml#L534-L536] Are there any plans to upgrade this version in the next release?
> when will slf4j be upgraded for CVE-2018-8088
> -
> Key: CASSANDRA-18122
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18122
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jai Bheemsen Rao Dhanwada
> Priority: Normal
>
> Hello Team,
> I see Cassandra 4.1 GA'ed on 12/13/2022 and still uses slf4j 1.7.25, and the vulnerability [https://nvd.nist.gov/vuln/detail/CVE-2018-8088] is fixed only in version 1.7.26. Do we have any details on when slf4j will be upgraded?
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-18122) when will slf4j be upgraded for CVE-2018-8088
Jai Bheemsen Rao Dhanwada created CASSANDRA-18122:
-
             Summary: when will slf4j be upgraded for CVE-2018-8088
                 Key: CASSANDRA-18122
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18122
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jai Bheemsen Rao Dhanwada

Hello Team,

I see Cassandra 4.1 GA'ed on 12/13/2022 and still uses slf4j 1.7.25, and the vulnerability [https://nvd.nist.gov/vuln/detail/CVE-2018-8088] is fixed only in version 1.7.26. Do we have any details on when slf4j will be upgraded?

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16670) Flaky ViewComplexTest, ViewFilteringTest and InsertUpdateIfConditionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599183#comment-17599183 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16670: --- thanks for your response. Yes I figured out the issue by looking at the error message but my only confusion was why it started all of a sudden. The server was upgraded almost 10 days back but the error started only 2 days ago. > Flaky ViewComplexTest, ViewFilteringTest and InsertUpdateIfConditionTest > > > Key: CASSANDRA-16670 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16670 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc2, 4.0, 4.1-alpha1, 4.1 > > > *ViewComplexTest* > Flaky > [test|https://ci-cassandra.apache.org/job/Cassandra-4.0/43/testReport/junit/org.apache.cassandra.cql3/ViewComplexTest/testPartialDeleteSelectedColumnWithoutFlush_3_/] > and move back away from 'long' section. > *InsertUpdateIfConditionTest* (CASSANDRA-16676) > Fails > [here|https://ci-cassandra.apache.org/job/Cassandra-4.0/46/testReport/junit/org.apache.cassandra.cql3.validation.operations/InsertUpdateIfConditionTest/testListItem_2__clusterMinVersion_4_0_0_rc2_SNAPSHOT_/] > with a timeout. We can see in the history it takes quite a while in > [CI|https://ci-cassandra.apache.org/job/Cassandra-4.0/46/testReport/junit/org.apache.cassandra.cql3.validation.operations/InsertUpdateIfConditionTest/history/] > _but_ it takes just 1m locally. Probably due to constrained resources. > Looking at the > [individual|https://ci-cassandra.apache.org/job/Cassandra-4.0/46/testReport/junit/org.apache.cassandra.cql3.validation.operations/InsertUpdateIfConditionTest/] > test cases, for compression i.e., we can see 378 at an average of 1s each it > can easily go over the timeout of 240s. Recommendation is to either move to > 'long' section of to raise the timeout for the class for CI. > *ViewFilteringTest* > Move back from 'long' section -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16670) Flaky ViewComplexTest, ViewFilteringTest and InsertUpdateIfConditionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598748#comment-17598748 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16670: --- [~e.dimitrova] Were you able to find out why the errors that you mentioned in the [comment|https://issues.apache.org/jira/browse/CASSANDRA-16670?focusedCommentId=17355084&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17355084] happened? We see similar errors where the driver version is 4.13.0 and the server version is 4.0.5.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584057#comment-17584057 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-14715 at 8/24/22 4:15 PM: [~stefan.miklosovic] do you have any estimate if this will be released anytime sooner? thank you was (Author: jaid): [~stefan.miklosovic] do you have any estimate if this will be released anything sooner? thank you > Read repairs can result in bogus timeout errors to the client > - > > Key: CASSANDRA-14715 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14715 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Cameron Zemek >Priority: Low > > In RepairMergeListener:close() it does the following: > > {code:java} > try > { > FBUtilities.waitOnFutures(repairResults, > DatabaseDescriptor.getWriteRpcTimeout()); > } > catch (TimeoutException ex) > { > // We got all responses, but timed out while repairing > int blockFor = consistency.blockFor(keyspace); > if (Tracing.isTracing()) > Tracing.trace("Timed out while read-repairing after receiving all {} > data and digest responses", blockFor); > else > logger.debug("Timeout while read-repairing after receiving all {} > data and digest responses", blockFor); > throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true); > } > {code} > This propagates up and gets sent to the client and we have customers get > confused cause they see timeouts for CL ALL requiring ALL replicas even > though they have read_repair_chance = 0 and using a LOCAL_* CL. > At minimum I suggest instead of using the consistency level of DataResolver > (which is always ALL with read repairs) for the timeout it instead use > repairResults.size(). That is blockFor = repairResults.size() . But saying it > received _blockFor - 1_ is bogus still. Fixing that would require more > changes. I was thinking maybe like so: > > {code:java} > public static void waitOnFutures(List results, long ms, > MutableInt counter) throws TimeoutException > { > for (AsyncOneResponse result : results) > { > result.get(ms, TimeUnit.MILLISECONDS); > counter.increment(); > } > } > {code} > > > > Likewise in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says > _blockFor - 1_ for how many were received, which is also bogus. > > Steps used to reproduce was modify RepairMergeListener:close() to always > throw timeout exception. With schema: > {noformat} > CREATE KEYSPACE weather WITH replication = {'class': > 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true; > CREATE TABLE weather.city ( > cityid int PRIMARY KEY, > name text > ) WITH bloom_filter_fp_chance = 0.01 > AND dclocal_read_repair_chance = 0.0 > AND read_repair_chance = 0.0 > AND speculative_retry = 'NONE'; > {noformat} > Then using the following steps: > # ccm node1 cqlsh > # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra'); > # exit; > # ccm node1 flush > # ccm node1 stop > # rm -rf > ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-* > # remove the sstable with the insert > # ccm node1 start > # ccm node1 cqlsh > # CONSISTENCY LOCAL_QUORUM; > # select * from weather.city where cityid = 1; > You get result of: > {noformat} > ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting > for replica nodes' responses] message="Operation timed out - received only 5 > responses." 
info={'received_responses': 5, 'required_responses': 6, > 'consistency': 'ALL'}{noformat} > But was expecting: > {noformat} > ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting > for replica nodes' responses] message="Operation timed out - received only 1 > responses." info={'received_responses': 1, 'required_responses': 2, > 'consistency': 'LOCAL_QUORUM'}{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
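To make the proposal above concrete: the idea is to count how many repair acknowledgements actually completed before the timeout, so the ReadTimeoutException can report an accurate received count instead of the hard-coded blockFor - 1. The sketch below is a standalone illustration using plain java.util.concurrent types, with an AtomicInteger standing in for the MutableInt in the proposal; it is not Cassandra code.

{code:java}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Standalone sketch of the counting idea: track how many acks completed before the
// timeout so the caller can report an accurate "received" figure after the exception.
public final class CountingWait
{
    private CountingWait() {}

    public static void awaitAll(List<CompletableFuture<Void>> acks, long timeoutMillis, AtomicInteger received)
            throws InterruptedException, ExecutionException, TimeoutException
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (CompletableFuture<Void> ack : acks)
        {
            long remainingMillis = Math.max(TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime()), 0);
            // get() throws TimeoutException once the budget is exhausted; 'received' keeps the count so far
            ack.get(remainingMillis, TimeUnit.MILLISECONDS);
            received.incrementAndGet();
        }
    }
}
{code}

If a timeout is thrown, the caller reads the AtomicInteger to fill in the received count of the error message, which is exactly what the MutableInt out-parameter in the suggested waitOnFutures change is for.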
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584057#comment-17584057 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14715: --- [~stefan.miklosovic] do you have any estimate if this will be released anything sooner? thank you
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542210#comment-17542210 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14715: --- Any plans to fix this in the upcoming versions, or at least in a 4.0.x version? The error message is quite misleading.
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16940) Confusing ProtocolException msg Invalid or unsupported protocol version (4)
[ https://issues.apache.org/jira/browse/CASSANDRA-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537251#comment-17537251 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16940: --- thanks [~brandon.williams] I checked all the comments and it looks like this entry with null version happens if Cassandra nodes are running with [separate IP addresses for listen and broadcast address.|https://issues.apache.org/jira/browse/CASSANDRA-16518?focusedCommentId=17425372&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17425372]. In my case I am running my cassandra cluster in Kubernetes and I am only setting listen_address as the Pod IP address and broadcast address is same as listen. Not sure if I am running into [CASSANDRA-16518|https://issues.apache.org/jira/browse/CASSANDRA-16518]. > Confusing ProtocolException msg Invalid or unsupported protocol version (4) > --- > > Key: CASSANDRA-16940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16940 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Brad Schoening >Priority: Normal > > The following warning was seen frequently after upgrading from 3.0.15 to > 3.11.11 in the cassandra.log: > {noformat} > ProtocolException: Invalid or unsupported protocol version (4); supported > versions are (3/v3, 4/v4, 5/v5-beta){noformat} > It is at best unclear, or maybe a bug in the code throwing this exception > stating version '4' not supported but 4/v4 is. > from org/apache/cassandra/transport/ProtocolVersion.java > public static String invalidVersionMessage(int version) > { return String.format("Invalid or unsupported protocol version (%d); > supported versions are (%s)", version, String.join(", ", > ProtocolVersion.supportedVersions())); } > We later found invalid IP addresses in the system.peers table and once > removed, this exception went away. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16940) Confusing ProtocolException msg Invalid or unsupported protocol version (4)
[ https://issues.apache.org/jira/browse/CASSANDRA-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536804#comment-17536804 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-16940 at 5/13/22 6:07 PM: I had a similar issue and when I started looking at code to see why it is happening. {code:java} boolean enforceV3Cap = SystemKeyspace.loadPeerVersions() .values() .stream() .anyMatch(v -> v.compareTo(MIN_VERSION_FOR_V4) < 0); {code} Inspecting the system.peers table, one of the node in the cluster has a null entry, which is causing the specific node to cap the max negotiable version to V3. {code:java} > select peer, release_version from system.peers; peer | release_version --+ 10.41.128.35 | 3.11.9 10.41.128.228 | null 10.41.128.99 | 3.11.9 (3 rows) {code} However, I am not sure why there is a null entry in the peers table. Also I checked the {noformat} nodetool status{noformat} and {noformat} nodetool describecluster{noformat} and I don't see this specific IP present. Not sure if there is a bug that is causing this? was (Author: jaid): I had a similar issue and when I started looking at code to see why it is happening. ``` boolean enforceV3Cap = SystemKeyspace.loadPeerVersions() .values() .stream() .anyMatch(v -> v.compareTo(MIN_VERSION_FOR_V4) < 0); ``` Inspecting the system.peers table, one of the node in the cluster has a null entry, which is causing the specific node to cap the max negotiable version to V3. ``` > select peer, release_version from system.peers; peer | release_version ---+- 10.41.128.35 | 3.11.9 10.41.128.228 | null 10.41.128.99 | 3.11.9 (3 rows) ``` However, I am not sure why there is a null entry in the peers table. Also I checked the `nodetool status` and `nodetool describecluster` and I don't see this specific IP present. Not sure if there is a bug that is causing this? > Confusing ProtocolException msg Invalid or unsupported protocol version (4) > --- > > Key: CASSANDRA-16940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16940 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Brad Schoening >Priority: Normal > > The following warning was seen frequently after upgrading from 3.0.15 to > 3.11.11 in the cassandra.log: > {noformat} > ProtocolException: Invalid or unsupported protocol version (4); supported > versions are (3/v3, 4/v4, 5/v5-beta){noformat} > It is at best unclear, or maybe a bug in the code throwing this exception > stating version '4' not supported but 4/v4 is. > from org/apache/cassandra/transport/ProtocolVersion.java > public static String invalidVersionMessage(int version) > { return String.format("Invalid or unsupported protocol version (%d); > supported versions are (%s)", version, String.join(", ", > ProtocolVersion.supportedVersions())); } > We later found invalid IP addresses in the system.peers table and once > removed, this exception went away. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16940) Confusing ProtocolException msg Invalid or unsupported protocol version (4)
[ https://issues.apache.org/jira/browse/CASSANDRA-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536804#comment-17536804 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16940: --- I had a similar issue and when I started looking at code to see why it is happening. ``` boolean enforceV3Cap = SystemKeyspace.loadPeerVersions() .values() .stream() .anyMatch(v -> v.compareTo(MIN_VERSION_FOR_V4) < 0); ``` Inspecting the system.peers table, one of the node in the cluster has a null entry, which is causing the specific node to cap the max negotiable version to V3. ``` > select peer, release_version from system.peers; peer | release_version ---+- 10.41.128.35 | 3.11.9 10.41.128.228 | null 10.41.128.99 | 3.11.9 (3 rows) ``` However, I am not sure why there is a null entry in the peers table. Also I checked the `nodetool status` and `nodetool describecluster` and I don't see this specific IP present. Not sure if there is a bug that is causing this? > Confusing ProtocolException msg Invalid or unsupported protocol version (4) > --- > > Key: CASSANDRA-16940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16940 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Brad Schoening >Priority: Normal > > The following warning was seen frequently after upgrading from 3.0.15 to > 3.11.11 in the cassandra.log: > {noformat} > ProtocolException: Invalid or unsupported protocol version (4); supported > versions are (3/v3, 4/v4, 5/v5-beta){noformat} > It is at best unclear, or maybe a bug in the code throwing this exception > stating version '4' not supported but 4/v4 is. > from org/apache/cassandra/transport/ProtocolVersion.java > public static String invalidVersionMessage(int version) > { return String.format("Invalid or unsupported protocol version (%d); > supported versions are (%s)", version, String.join(", ", > ProtocolVersion.supportedVersions())); } > We later found invalid IP addresses in the system.peers table and once > removed, this exception went away. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
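To illustrate how a single null release_version entry can force the V3 cap described above, here is a small, self-contained sketch that mirrors the quoted anyMatch check with plain strings; treating null as "unknown, assume pre-V4" and the lexical string comparison are simplifying assumptions for the sketch, not necessarily what SystemKeyspace.loadPeerVersions() and the real check do:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class PeerVersionCapSketch
{
    // Hypothetical stand-in for the lowest release that speaks native protocol V4.
    static final String MIN_VERSION_FOR_V4 = "3.0.0";

    public static void main(String[] args)
    {
        // Values as reported by "select peer, release_version from system.peers" above.
        Map<String, String> peerVersions = new HashMap<>();
        peerVersions.put("10.41.128.35", "3.11.9");
        peerVersions.put("10.41.128.228", null);   // the stale entry with no release_version
        peerVersions.put("10.41.128.99", "3.11.9");

        // One unknown (null) or pre-3.0 peer is enough to cap this node at V3.
        // (Comparing raw strings is a simplification; Cassandra compares parsed versions.)
        boolean enforceV3Cap = peerVersions.values().stream()
                .anyMatch(v -> v == null || v.compareTo(MIN_VERSION_FOR_V4) < 0);

        System.out.println("enforceV3Cap = " + enforceV3Cap);  // true because of 10.41.128.228
    }
}
{code}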
[jira] [Comment Edited] (CASSANDRA-16764) Compaction repeatedly fails validateReallocation exception
[ https://issues.apache.org/jira/browse/CASSANDRA-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513546#comment-17513546 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-16764 at 3/28/22, 6:08 PM: - I am running into this issue as well and have couple of questions. # {{I understand that Index file is {*}SSTable Index which maps row keys to their respective offsets in the Data file{*}, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? was (Author: jaid): I am running into this issue as well and have couple of questions. # {{I understand that Index file is SSTable Index which maps row keys to their respective offsets in the Data file, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? > Compaction repeatedly fails validateReallocation exception > -- > > Key: CASSANDRA-16764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16764 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction/LCS >Reporter: Richard Hesse >Priority: Normal > > I have a few nodes in my ring that are stuck repeatedly trying to compact the > same tables over and over again. I've run through the usual trick of rolling > restarts, and it doesn't seem to help. 
This exception is logged on the nodes: > {code} > ERROR [CompactionExecutor:6] 2021-06-25 20:28:30,001 CassandraDaemon.java:244 > - Exception in thread Thread[CompactionExecutor:6,1,main] > java.lang.RuntimeException: null > at > org.apache.cassandra.io.util.DataOutputBuffer.validateReallocation(DataOutputBuffer.java:134) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.calculateNewSize(DataOutputBuffer.java:152) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.expandToFit(DataOutputBuffer.java:159) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:119) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:426) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serializeValuesWithoutSize(ClusteringPrefix.java:323) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:131) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serialize(ClusteringPrefix.java:266) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:167) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:154) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:103) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:82) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.addIndexBlock(ColumnIndex.java:216) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at org.apache.cassandra.db.ColumnIndex.add(ColumnIndex.java:264) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:111) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:173)
[jira] [Comment Edited] (CASSANDRA-16764) Compaction repeatedly fails validateReallocation exception
[ https://issues.apache.org/jira/browse/CASSANDRA-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513546#comment-17513546 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-16764 at 3/28/22, 6:08 PM: - I am running into this issue as well and have couple of questions. # {{I understand that Index file is SSTable Index which maps row keys to their respective offsets in the Data file, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? was (Author: jaid): I am running into this issue as well and have couple of questions. # {{I understand that Index file is SSTable Index which maps row keys to their respective offsets in the Data file, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? > Compaction repeatedly fails validateReallocation exception > -- > > Key: CASSANDRA-16764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16764 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction/LCS >Reporter: Richard Hesse >Priority: Normal > > I have a few nodes in my ring that are stuck repeatedly trying to compact the > same tables over and over again. I've run through the usual trick of rolling > restarts, and it doesn't seem to help. 
This exception is logged on the nodes: > {code} > ERROR [CompactionExecutor:6] 2021-06-25 20:28:30,001 CassandraDaemon.java:244 > - Exception in thread Thread[CompactionExecutor:6,1,main] > java.lang.RuntimeException: null > at > org.apache.cassandra.io.util.DataOutputBuffer.validateReallocation(DataOutputBuffer.java:134) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.calculateNewSize(DataOutputBuffer.java:152) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.expandToFit(DataOutputBuffer.java:159) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:119) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:426) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serializeValuesWithoutSize(ClusteringPrefix.java:323) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:131) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serialize(ClusteringPrefix.java:266) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:167) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:154) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:103) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:82) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.addIndexBlock(ColumnIndex.java:216) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at org.apache.cassandra.db.ColumnIndex.add(ColumnIndex.java:264) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:111) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:173) > ~[a
[jira] [Commented] (CASSANDRA-16764) Compaction repeatedly fails validateReallocation exception
[ https://issues.apache.org/jira/browse/CASSANDRA-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513546#comment-17513546 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16764: --- I am running into this issue as well and have couple of questions. # I understand that Index file is `SSTable Index which maps row keys to their respective offsets in the Data file`, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ? # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? > Compaction repeatedly fails validateReallocation exception > -- > > Key: CASSANDRA-16764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16764 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction/LCS >Reporter: Richard Hesse >Priority: Normal > > I have a few nodes in my ring that are stuck repeatedly trying to compact the > same tables over and over again. I've run through the usual trick of rolling > restarts, and it doesn't seem to help. This exception is logged on the nodes: > {code} > ERROR [CompactionExecutor:6] 2021-06-25 20:28:30,001 CassandraDaemon.java:244 > - Exception in thread Thread[CompactionExecutor:6,1,main] > java.lang.RuntimeException: null > at > org.apache.cassandra.io.util.DataOutputBuffer.validateReallocation(DataOutputBuffer.java:134) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.calculateNewSize(DataOutputBuffer.java:152) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.expandToFit(DataOutputBuffer.java:159) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:119) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:426) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serializeValuesWithoutSize(ClusteringPrefix.java:323) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:131) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serialize(ClusteringPrefix.java:266) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:167) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:154) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:103) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:82) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > 
org.apache.cassandra.db.ColumnIndex.addIndexBlock(ColumnIndex.java:216) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at org.apache.cassandra.db.ColumnIndex.add(ColumnIndex.java:264) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:111) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:173) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:136) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:98) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:143) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:204) > ~[apa
[jira] [Comment Edited] (CASSANDRA-16764) Compaction repeatedly fails validateReallocation exception
[ https://issues.apache.org/jira/browse/CASSANDRA-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513546#comment-17513546 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-16764 at 3/28/22, 6:07 PM: - I am running into this issue as well and have couple of questions. # {{I understand that Index file is SSTable Index which maps row keys to their respective offsets in the Data file, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ?}} # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? was (Author: jaid): I am running into this issue as well and have couple of questions. # I understand that Index file is `SSTable Index which maps row keys to their respective offsets in the Data file`, but does this holds the actual Row Key and since the compaction is failing, even the tombstoned rows are not getting cleared from the Index files and the actual data files (Data.db) ? # Is there a configuration/limit where we can increase this limit from 2GB to a higher value? # In worst case, what is the harm or loss when we delete these files? > Compaction repeatedly fails validateReallocation exception > -- > > Key: CASSANDRA-16764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16764 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction/LCS >Reporter: Richard Hesse >Priority: Normal > > I have a few nodes in my ring that are stuck repeatedly trying to compact the > same tables over and over again. I've run through the usual trick of rolling > restarts, and it doesn't seem to help. 
This exception is logged on the nodes: > {code} > ERROR [CompactionExecutor:6] 2021-06-25 20:28:30,001 CassandraDaemon.java:244 > - Exception in thread Thread[CompactionExecutor:6,1,main] > java.lang.RuntimeException: null > at > org.apache.cassandra.io.util.DataOutputBuffer.validateReallocation(DataOutputBuffer.java:134) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.calculateNewSize(DataOutputBuffer.java:152) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.expandToFit(DataOutputBuffer.java:159) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:119) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:426) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serializeValuesWithoutSize(ClusteringPrefix.java:323) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:131) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.serialize(ClusteringPrefix.java:266) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:167) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.Serializers$NewFormatSerializer.serialize(Serializers.java:154) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:103) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.IndexInfo$Serializer.serialize(IndexInfo.java:82) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.addIndexBlock(ColumnIndex.java:216) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at org.apache.cassandra.db.ColumnIndex.add(ColumnIndex.java:264) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:111) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:173) > ~[apa
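Regarding question 2 above, the ~2GB ceiling appears to come from Java's int-indexed arrays and ByteBuffers rather than from a configuration knob, so a DataOutputBuffer-style growable buffer cannot simply be raised past it. The sketch below uses hypothetical names and numbers, not Cassandra's actual implementation, and only shows the doubling-and-validation shape and why it fails once the required size crosses Integer.MAX_VALUE:

{code:java}
public class BufferGrowthSketch
{
    // Rough shape of a growable on-heap output buffer: keep doubling until the
    // bytes we need to append fit. Java arrays and ByteBuffers are addressed by
    // int, so no single buffer can ever hold more than Integer.MAX_VALUE bytes.
    static long calculateNewSize(long currentCapacity, long additionalBytes)
    {
        long needed = currentCapacity + additionalBytes;
        long newSize = currentCapacity * 2;
        while (newSize < needed)
            newSize *= 2;
        return newSize;
    }

    static void validateReallocation(long newSize)
    {
        if (newSize <= 0 || newSize > Integer.MAX_VALUE)
            throw new RuntimeException("cannot grow buffer past the 2GB addressable limit: " + newSize);
    }

    public static void main(String[] args)
    {
        long capacity = 1L << 30;                                  // ~1GB already buffered
        long newSize = calculateNewSize(capacity, (1L << 30) + 1); // need a bit over 1GB more
        validateReallocation(newSize);                             // throws: 4GB > Integer.MAX_VALUE
    }
}
{code}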
[jira] [Updated] (CASSANDRA-17355) Performance degradation when the data size grows
[ https://issues.apache.org/jira/browse/CASSANDRA-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-17355: -- Summary: Performance degradation when the data size grows (was: Performance degradation with Counter tables when the data size grows) > Performance degradation when the data size grows > > > Key: CASSANDRA-17355 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17355 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello Everyone, > I am noticing a huge perf drop (spike in latency and CPU utilization) for the > counter type tables when the data size grows. To better understand/simulate I > have done the following perf test with `cassandra-stress` instead of my > use-case and I can reproduce the performance issue consistently. When using > the counter type tables, when the datasize grows the read latency and cpu > spikes to very high number. > > *Test Setup:* > # Setup a cluster with 3 nodes. > # Run a test with cassandra-stress and I see the latency and CPU are okay > without much spike. > # Send a lot of counter traffic using `cassandra-stress` tool (Replication > Factory = 3) > # Now the data size on the cluster is ~300G. > # Now run another test with cassandra-stress with 3:1 read write mixed > workload. > # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the > latency reaches ~1 seconds (which earlier was < 5ms). > # Another interesting observation is the disk reads goes to a higher number > and it keeps going higher with the increase in the disk size. > # It pretty much looked like a disk bottleneck issue but the same result > shows very low disk reads, cpu, latency with less amount of data. > # Below is the configuration I have used for testing this. > > {quote}C* Version: 3.11.9 > CPU: 16 > Memory: 64G > Heap: 16G > GC: G1GC > Disk: 500G GCP Persistent disk > > {quote} > I understand that, with growth in disk the number of lookup grows high, but > this looked to be a big performance drop. > Please let me know if you need more details. Also let me know this is known > limitation with the counter type and if there is a work around. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17355) Performance degradation with Counter tables when the data size grows
[ https://issues.apache.org/jira/browse/CASSANDRA-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488484#comment-17488484 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-17355: --- I will reach out to the community user group but the issue i see here is the performance drops as soon as the data size grows. Is that expected? > Performance degradation with Counter tables when the data size grows > > > Key: CASSANDRA-17355 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17355 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello Everyone, > I am noticing a huge perf drop (spike in latency and CPU utilization) for the > counter type tables when the data size grows. To better understand/simulate I > have done the following perf test with `cassandra-stress` instead of my > use-case and I can reproduce the performance issue consistently. When using > the counter type tables, when the datasize grows the read latency and cpu > spikes to very high number. > > *Test Setup:* > # Setup a cluster with 3 nodes. > # Run a test with cassandra-stress and I see the latency and CPU are okay > without much spike. > # Send a lot of counter traffic using `cassandra-stress` tool (Replication > Factory = 3) > # Now the data size on the cluster is ~300G. > # Now run another test with cassandra-stress with 3:1 read write mixed > workload. > # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the > latency reaches ~1 seconds (which earlier was < 5ms). > # Another interesting observation is the disk reads goes to a higher number > and it keeps going higher with the increase in the disk size. > # It pretty much looked like a disk bottleneck issue but the same result > shows very low disk reads, cpu, latency with less amount of data. > # Below is the configuration I have used for testing this. > > {quote}C* Version: 3.11.9 > CPU: 16 > Memory: 64G > Heap: 16G > GC: G1GC > Disk: 500G GCP Persistent disk > > {quote} > I understand that, with growth in disk the number of lookup grows high, but > this looked to be a big performance drop. > Please let me know if you need more details. Also let me know this is known > limitation with the counter type and if there is a work around. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17355) Performance degradation with Counter tables when the data size grows
[ https://issues.apache.org/jira/browse/CASSANDRA-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-17355: -- Description: Hello Everyone, I am noticing a huge perf drop (spike in latency and CPU utilization) for the counter type tables when the data size grows. To better understand/simulate I have done the following perf test with `cassandra-stress` instead of my use-case and I can reproduce the performance issue consistently. When using the counter type tables, when the datasize grows the read latency and cpu spikes to very high number. *Test Setup:* # Setup a cluster with 3 nodes. # Run a test with cassandra-stress and I see the latency and CPU are okay without much spike. # Send a lot of counter traffic using `cassandra-stress` tool (Replication Factory = 3) # Now the data size on the cluster is ~300G. # Now run another test with cassandra-stress with 3:1 read write mixed workload. # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the latency reaches ~1 seconds (which earlier was < 5ms). # Another interesting observation is the disk reads goes to a higher number and it keeps going higher with the increase in the disk size. # It pretty much looked like a disk bottleneck issue but the same result shows very low disk reads, cpu, latency with less amount of data. # Below is the configuration I have used for testing this. {quote}C* Version: 3.11.9 CPU: 16 Memory: 64G Heap: 16G GC: G1GC Disk: 500G GCP Persistent disk {quote} I understand that, with growth in disk the number of lookup grows high, but this looked to be a big performance drop. Please let me know if you need more details. Also let me know this is known limitation with the counter type and if there is a work around. was: Hello Everyone, I am noticing a huge perf drop (spike in latency and CPU utilization) for the counter type tables when the data size grows. To better understand/simulate I have done the following perf test with `cassandra-stress` instead of my use-case and I can reproduce the performance issue consistently. When using the counter type tables, when the datasize grows the read latency and cpu spikes to very high number. *Test Setup:* # Setup a cluster with 3 nodes. # Run a test with cassandra-stress and I see the latency and CPU are okay without much spike. # Send a lot of counter traffic using `cassandra-stress` tool (Replication Factory = 3) # Now the data size on the cluster is ~300G. # Now run another test with cassandra-stress with 3:1 read write mixed workload. # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the latency reaches ~1 seconds (which earlier was < 5ms). # Another interesting observation is the disk reads goes to a higher number and it keeps going higher with the increase in the disk size. # It pretty much looked like a disk bottleneck issue but the same result shows very low disk reads, cpu, latency with less amount of data. # Below is the configuration I have used for testing this. ``` C* Version: 3.11.9 CPU: 16 Memory: 64G Heap: 16G GC: G1GC Disk: 500G GCP Persistent disk ``` I understand that, with growth in disk the number of lookup grows high, but this looked to be a big performance drop. Please let me know if you need more details. Also let me know this is known limitation with the counter type and if there is a work around. 
> Performance degradation with Counter tables when the data size grows > > > Key: CASSANDRA-17355 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17355 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello Everyone, > I am noticing a huge perf drop (spike in latency and CPU utilization) for the > counter type tables when the data size grows. To better understand/simulate I > have done the following perf test with `cassandra-stress` instead of my > use-case and I can reproduce the performance issue consistently. When using > the counter type tables, when the datasize grows the read latency and cpu > spikes to very high number. > > *Test Setup:* > # Setup a cluster with 3 nodes. > # Run a test with cassandra-stress and I see the latency and CPU are okay > without much spike. > # Send a lot of counter traffic using `cassandra-stress` tool (Replication > Factory = 3) > # Now the data size on the cluster is ~300G. > # Now run another test with cassandra-stress with 3:1 read write mixed > workload. > # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the > latency reaches ~1 seconds (which earlier was < 5ms). > # Another interesting observation is the disk reads goes to a higher numbe
[jira] [Created] (CASSANDRA-17355) Performance degradation with Counter tables when the data size grows
Jai Bheemsen Rao Dhanwada created CASSANDRA-17355: - Summary: Performance degradation with Counter tables when the data size grows Key: CASSANDRA-17355 URL: https://issues.apache.org/jira/browse/CASSANDRA-17355 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada Hello Everyone, I am noticing a huge perf drop (spike in latency and CPU utilization) for the counter type tables when the data size grows. To better understand/simulate I have done the following perf test with `cassandra-stress` instead of my use-case and I can reproduce the performance issue consistently. When using the counter type tables, when the datasize grows the read latency and cpu spikes to very high number. *Test Setup:* # Setup a cluster with 3 nodes. # Run a test with cassandra-stress and I see the latency and CPU are okay without much spike. # Send a lot of counter traffic using `cassandra-stress` tool (Replication Factory = 3) # Now the data size on the cluster is ~300G. # Now run another test with cassandra-stress with 3:1 read write mixed workload. # At this point I see the CPU spikes to double (32 on a 16 core CPU) and the latency reaches ~1 seconds (which earlier was < 5ms). # Another interesting observation is the disk reads goes to a higher number and it keeps going higher with the increase in the disk size. # It pretty much looked like a disk bottleneck issue but the same result shows very low disk reads, cpu, latency with less amount of data. # Below is the configuration I have used for testing this. ``` C* Version: 3.11.9 CPU: 16 Memory: 64G Heap: 16G GC: G1GC Disk: 500G GCP Persistent disk ``` I understand that, with growth in disk the number of lookup grows high, but this looked to be a big performance drop. Please let me know if you need more details. Also let me know this is known limitation with the counter type and if there is a work around. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14328) Invalid metadata has been detected for role
[ https://issues.apache.org/jira/browse/CASSANDRA-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486617#comment-17486617 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14328: --- [~tania.en...@quest.com] [~prnvjndl] were you able to figure out why the node got into this state? it looks like this is a side effect of bootstrapping process not streaming all the data (https://issues.apache.org/jira/browse/CASSANDRA-14006). I see a similar issue and just wondering if you have any more details and workaround on this issue? > Invalid metadata has been detected for role > --- > > Key: CASSANDRA-14328 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14328 > Project: Cassandra > Issue Type: Bug > Components: Legacy/CQL >Reporter: Pranav Jindal >Priority: Normal > > Cassandra Version : 3.10 > One node was replaced and was successfully up and working but CQL-SH fails > with error. > > CQL-SH error: > > {code:java} > Connection error: ('Unable to connect to any servers', {'10.180.0.150': > AuthenticationFailed('Failed to authenticate to 10.180.0.150: Error from > server: code= [Server error] message="java.lang.RuntimeException: Invalid > metadata has been detected for role utorjwcnruzzlzafxffgyqmlvkxiqcgb"',)}) > {code} > > Cassandra server ERROR: > {code:java} > WARN [Native-Transport-Requests-1] 2018-03-20 13:37:17,894 > CassandraRoleManager.java:96 - An invalid value has been detected in the > roles table for role utorjwcnruzzlzafxffgyqmlvkxiqcgb. If you are unable to > login, you may need to disable authentication and confirm that values in that > table are accurate > ERROR [Native-Transport-Requests-1] 2018-03-20 13:37:17,895 Message.java:623 > - Unexpected exception during request; channel = [id: 0xdfc3604f, > L:/10.180.0.150:9042 - R:/10.180.0.150:51668] > java.lang.RuntimeException: Invalid metadata has been detected for role > utorjwcnruzzlzafxffgyqmlvkxiqcgb > at > org.apache.cassandra.auth.CassandraRoleManager$1.apply(CassandraRoleManager.java:99) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager$1.apply(CassandraRoleManager.java:82) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:528) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:503) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager.canLogin(CassandraRoleManager.java:310) > ~[apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.service.ClientState.login(ClientState.java:271) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:80) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517) > [apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410) > [apache-cassandra-3.10.jar:3.10] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.39.Final.jar:4.0.39.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366) > [netty-all-4.0.39.Final.jar:4.0.39.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) > [netty-all-4.0.39.Final.jar:4.0.39.Final] > at > 
io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357) > [netty-all-4.0.39.Final.jar:4.0.39.Final] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_121] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > [apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > [apache-cassandra-3.10.jar:3.10] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > Caused by: java.lang.NullPointerException: null > at > org.apache.cassandra.cql3.UntypedResultSet$Row.getBoolean(UntypedResultSet.java:273) > ~[apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.auth.CassandraRoleManager$1.apply(CassandraRoleManager.java:88) > ~[apache-cassandra-3.10.jar:3.10] > ... 16 common frames omitted > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
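The stack trace above ends in UntypedResultSet$Row.getBoolean, which is consistent with the role row existing while one of its boolean columns (for example can_login) is missing after incomplete streaming. The following self-contained illustration of that failure mode rests on that assumption about the mechanism; the map standing in for the roles row and the column name are hypothetical, not the actual CassandraRoleManager code:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class RoleRowSketch
{
    // Hypothetical stand-in for one row of system_auth.roles on the bad node:
    // the role name is present but the boolean columns never arrived.
    static final Map<String, Object> row = new HashMap<>();

    static boolean getBoolean(String column)
    {
        Boolean value = (Boolean) row.get(column);
        return value;   // unboxing a missing/null column throws NullPointerException
    }

    public static void main(String[] args)
    {
        row.put("role", "utorjwcnruzzlzafxffgyqmlvkxiqcgb");
        try
        {
            getBoolean("can_login");   // assumed column; never written on this node
        }
        catch (NullPointerException e)
        {
            // The NPE surfaces to the client as the error quoted above.
            System.out.println("Invalid metadata has been detected for role " + row.get("role"));
        }
    }
}
{code}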
[jira] [Commented] (CASSANDRA-15155) Bootstrapping node finishes 'successfully' before schema synced, data not streamed
[ https://issues.apache.org/jira/browse/CASSANDRA-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486218#comment-17486218 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15155: --- [~cowardlydragon] just curious, were able to narrow down this issue to see why it happened? > Bootstrapping node finishes 'successfully' before schema synced, data not > streamed > -- > > Key: CASSANDRA-15155 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15155 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Consistency/Bootstrap and > Decommission >Reporter: Constance Eustace >Priority: Normal > Attachments: debug.log.zip > > > Bootstrap of a new node to expand an existing cluster is completing the > bootstrapping "successfully", joining the cluster as UN in nodetool status, > but has no data and no active streams. Writes and reads start being served. > Environment: AWS, IPV6, three datacenters, asia / europe / us > Version: 2.2.13 > We have previously scaled the europe and us datacenters from 5 nodes to 25 > nodes (one node at a time) without incident. > Asia (tokyo) is a different story. We have seen multiple failure scenarios, > but the most troubling is a string of attempted node bootstrappings where the > bootstrap completes and the node joins the ring, but there is no data. > We were able to expand Asia by four nodes by increasing ring delay to 100 > seconds, but that has not worked anymore. > Attached Log: Our autoscaling + Ansible initial setup creates the node, but > the ansible has not run yet, so the autostarted cassandra fails to load, but > it has no security group yet so it did not communicate with any other node. > That is the 15:15:XX series log messages at the very top. > Then 15:20:XX series messages begin after ansible has completed setup of the > box, and the data dirs and commit log dirs have been scrubbed. > This same process ran for EU and US expansions without incident. > From what I can tell of the log (DEBUG was enabled): > Ring information collection begins, so some sort of gossip/cluster > communication is healthy: > INFO [main] 2019-06-12 15:20:05,468 StorageService.java:1142 - JOINING: > waiting for ring information > But almost all of those checks output: > DEBUG [GossipStage:1] 2019-06-12 15:20:07,673 MigrationManager.java:96 - Not > pulling schema because versions match or shouldPullSchemaFrom returned false > Which seems weird, as we shall see soon. After all the nodes have reported in > a similar way, most not pulling because of the above message, and a couple > that were interpreted as DOWN, it then does: > INFO [main] 2019-06-12 15:21:45,486 StorageService.java:1142 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2019-06-12 15:21:45,487 StorageService.java:1142 - JOINING: > waiting for pending range calculation > INFO [main] 2019-06-12 15:21:45,487 StorageService.java:1142 - JOINING: > calculation complete, ready to bootstrap > We then get a huge number of > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find > cfId=dd5d7fa0-1e50-11e9-a62d-0d41c97b2404 > INFO [main] 2019-06-12 15:23:25,515 StorageService.java:1142 - JOINING: > Starting to bootstrap... > INFO [main] 2019-06-12 15:23:25,525 StreamResultFuture.java:87 - [Stream > #05af9ee0-8d26-11e9-85c1-bd5476090c54] Executing streaming plan for Bootstrap > INFO [main] 2019-06-12 15:23:25,526 StorageService.java:1199 - Bootstrap > completed! for the tokens [-7314981925085449175, .. 
> 5499447097629838103] > Here are the log messages for MIgrationManager for schema gossiping: > DEBUG [main] 2019-06-12 15:20:05,423 MigrationManager.java:493 - Gossiping my > schema version 59adb24e-f3cd-3e02-97f0-5b395827453f > DEBUG [MigrationStage:1] 2019-06-12 15:23:25,694 MigrationManager.java:493 - > Gossiping my schema version 3d1a9d9e-1120-37ae-abe0-e064cd147a99 > DEBUG [MigrationStage:1] 2019-06-12 15:23:25,775 MigrationManager.java:493 - > Gossiping my schema version 0bf74f5a-4b39-3525-b217-e9ccf7a1b6cb > DEBUG [MigrationStage:1] 2019-06-12 15:23:25,905 MigrationManager.java:493 - > Gossiping my schema version b145475a-02dc-370c-8af7-a9aba2d61362 > DEBUG [InternalResponseStage:12] 2019-06-12 15:24:26,445 > MigrationManager.java:493 - Gossiping my schema version > 9c2ed14a-8db5-39b3-af48-6cdb5463c772 > the schema version ending in -6cdb5463c772 is the proper version in the other > nodes per gossipinfo. But as can be seen, the bootstrap completion message > (15:23:25,526) is logged before four or five intermediate schema versions are > created, which seem to be due to system_distributed and other keyspaces being > created. > The Bootstrap completed! message comes from
[jira] [Commented] (CASSANDRA-16617) DOC - Publish drivers compatibility matrix
[ https://issues.apache.org/jira/browse/CASSANDRA-16617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390838#comment-17390838 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16617: --- Thank you [~flightc] I was mainly interested in updating the document for Cassandra 4.0 and Java driver Versions: [https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/javaDrivers.html] Now that 4.0 is officially GA :) > DOC - Publish drivers compatibility matrix > -- > > Key: CASSANDRA-16617 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16617 > Project: Cassandra > Issue Type: Task > Components: Documentation/Website >Reporter: Erick Ramirez >Assignee: Erick Ramirez >Priority: Normal > > [~jaid] brought up [on the User > ML|https://lists.apache.org/thread.html/r41ab9448a8af2e95996577c82bd5a9ca09e308bc79917b15fad45580%40%3Cuser.cassandra.apache.org%3E] > whether a compatibility matrix exists on the Apache Cassandra website > similar to [the Java driver matrix published on the DataStax Docs > website|https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/javaDrivers.html]. > I've logged this ticket to consider including the compatibility matrix from > the following _supported_ drivers: > * Java driver > * Python driver > * Node.js driver > * C# driver driver > * C++ driver > My intention is to socialise this idea with the devs and provided there are > no objections, I'll look into implementing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364421#comment-17364421 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- Thank you [~brandon.williams]. Unfortunately, I am failing to reproduce this issue, so I am trying to see if there are any open/known issues that could cause this. Since it still happened after I moved from 3.11.6 to 3.11.9, I am not sure if there will be any success with 3.11.10. Do you have any recommendations or pointers to identify the issue? > Bootstrap completes Successfully without streaming all the data > --- > > Key: CASSANDRA-15874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > I am seeing a strange issue where, adding a new node with auto_bootstrap: > true is not streaming all the data before it joins the cluster. Don't see any > information in the logs about bootstrap failures. > Here is the sequence of logs > > {code:java} > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > waiting for pending range calculation > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > calculation complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > getting bootstrap token > INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: > Starting to bootstrap... > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId . If a table was just created, this is likely due to the schema > not being fully propagated. Please wait for schema agreement on table > creation. > INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] > All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StorageService.java:1505 - Bootstrap completed! for the tokens > {code} > Cassandra Version: 3.11.3 > I am not able to reproduce this issue all the time, but it happened couple of > times. Is there any race condition/corner case, which could cause this issue? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364387#comment-17364387 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- [~brandon.williams] I have noticed the same behavior even on the 3.11.9 version of Cassandra. Is there any other race condition that could cause this issue? > Bootstrap completes Successfully without streaming all the data > --- > > Key: CASSANDRA-15874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > I am seeing a strange issue where, adding a new node with auto_bootstrap: > true is not streaming all the data before it joins the cluster. Don't see any > information in the logs about bootstrap failures. > Here is the sequence of logs > > {code:java} > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > waiting for pending range calculation > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > calculation complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > getting bootstrap token > INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: > Starting to bootstrap... > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId . If a table was just created, this is likely due to the schema > not being fully propagated. Please wait for schema agreement on table > creation. > INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] > All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StorageService.java:1505 - Bootstrap completed! for the tokens > {code} > Cassandra Version: 3.11.3 > I am not able to reproduce this issue all the time, but it happened couple of > times. Is there any race condition/corner case, which could cause this issue? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16465) Increased Read Latency With Cassandra >= 3.11.7
[ https://issues.apache.org/jira/browse/CASSANDRA-16465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332621#comment-17332621 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16465: --- Thank you [~nvest] [~AhmedElJAMI]. I tried upgrading from 3.11.6 to 3.11.9 and didn't see any performance drop with my application use-case or with cassandra-stress. I use LCS on all my tables (the system* tables use the defaults shipped with Cassandra). Do you mind sharing the test scenarios where you see the latency, so I can make sure I am not missing anything? [~nvest], please share your findings with 3.11.10 as well; that will help me assess which version to upgrade to. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16465) Increased Read Latency With Cassandra >= 3.11.7
[ https://issues.apache.org/jira/browse/CASSANDRA-16465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331078#comment-17331078 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-16465: --- [~nvest] I am planning to upgrade the Cassandra version from 3.11.6 to 3.11.9 and I don't have any tables with TWCS. have you noticed this performance drop with all the Compaction Strategies or just TWCS? > Increased Read Latency With Cassandra >= 3.11.7 > --- > > Key: CASSANDRA-16465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16465 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Ahmed ELJAMI >Priority: Normal > Fix For: 3.11.11 > > > After upgrading Cassandra from 3.11.3 to 3.11.9, Cassandra read latency 99% > increased significantly. Getting back to 3.11.3 immediately fixed the issue. > I have observed "SStable reads" increases after upgrading to 3.11.9. > The same behavior was observed by some other users: > [https://www.mail-archive.com/user@cassandra.apache.org/msg61247.html] > According to Paulo Motta's comment, this behavior may be caused by > https://issues.apache.org/jira/browse/CASSANDRA-15690 which was introduced on > 3.11.7 and removed an optimization that may cause a correctness issue when > there are partition deletions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
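In case it is useful for comparing setups in this thread, the compaction strategy of every table can be read straight out of system_schema.tables on a 3.x cluster. The sketch below uses the DataStax Java driver 3.x; the contact point is a placeholder and this is only a convenience, not part of any fix.
{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CompactionStrategyCheck
{
    public static void main(String[] args)
    {
        // Contact point is a placeholder; adjust for your cluster.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            // system_schema.tables stores the compaction options as a map<text, text>;
            // the "class" entry names the strategy (LCS, STCS, TWCS, ...).
            for (Row row : session.execute("SELECT keyspace_name, table_name, compaction FROM system_schema.tables"))
            {
                String strategy = row.getMap("compaction", String.class, String.class).get("class");
                System.out.println(row.getString("keyspace_name") + "." + row.getString("table_name") + " -> " + strategy);
            }
        }
    }
}
{code}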
[jira] [Created] (CASSANDRA-16628) Logging bug during the node replacement and token assignment
Jai Bheemsen Rao Dhanwada created CASSANDRA-16628: - Summary: Logging bug during the node replacement and token assignment Key: CASSANDRA-16628 URL: https://issues.apache.org/jira/browse/CASSANDRA-16628 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada Hello Team, I noticed a minor logging issue when a Cassandra node tries to boot up with a new IP address but the existing data directory. The IP address and token fields are interchanged. *Sample Log:* {{WARN [GossipStage:1] 2021-04-23 18:24:06,348 StorageService.java:2425 - Not updating host ID 27031833-5141-46e0-b032-bef67137ae49 for /10.24.3.9 because it's mine}} {{INFO [GossipStage:1] 2021-04-23 18:24:06,349 StorageService.java:2356 - Nodes () and /10.24.3.9 have the same token /10.24.3.10. Ignoring -1124147225848710462}} {{INFO [GossipStage:1] 2021-04-23 18:24:06,350 StorageService.java:2356 - Nodes () and /10.24.3.9 have the same token /10.24.3.10. Ignoring -1239985462983206335}} *Steps to Reproduce:* Replace a Cassandra node with a new IP address but the same data directory, and the logs should show the messages above. *Cassandra Version*: 3.11.6 Please let me know if you need more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
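For what it's worth, the garbled message is consistent with the endpoint and token arguments simply being passed to the logger in the wrong order. The sketch below is illustrative only: the method and variable names are made up and this is not the actual StorageService code; it just shows how a swapped argument list puts an address in the token slot and a token after "Ignoring".
{code:java}
import java.net.InetAddress;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SwappedLogArgumentsSketch
{
    private static final Logger logger = LoggerFactory.getLogger(SwappedLogArgumentsSketch.class);

    // Hypothetical stand-ins for the real endpoints and token in StorageService.
    static void logConflict(InetAddress existing, InetAddress replacement, long token)
    {
        // Buggy ordering: an endpoint lands in the token slot and the token lands in the
        // "Ignoring" slot, which matches the sample log above.
        logger.info("Nodes {} and {} have the same token {}. Ignoring {}",
                    existing, replacement, replacement, token);

        // Intended ordering: both endpoints, then the shared token, then the endpoint ignored.
        logger.info("Nodes {} and {} have the same token {}. Ignoring {}",
                    existing, replacement, token, existing);
    }
}
{code}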
[jira] [Commented] (CASSANDRA-15592) IllegalStateException in gossip after removing node
[ https://issues.apache.org/jira/browse/CASSANDRA-15592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152387#comment-17152387 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15592: --- Hello [~brandon.williams] I ran into the similar Exception, is there any impact of this ERROR or this is just more of logging problem? in my tests I didn't see any impact to the cluster operations. so I would like to know the impact of this before even attempting to upgrade in production > IllegalStateException in gossip after removing node > --- > > Key: CASSANDRA-15592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15592 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Marcus Olsson >Assignee: Marcus Olsson >Priority: Normal > Fix For: 3.0.21, 3.11.7, 4.0, 4.0-alpha4 > > > In one of our test environments we encountered the following exception: > {noformat} > 2020-02-02T10:50:13.276+0100 [GossipTasks:1] ERROR > o.a.c.u.NoSpamLogger$NoSpamLogStatement:97 log > java.lang.IllegalStateException: Attempting gossip state mutation from > illegal thread: GossipTasks:1 > at > org.apache.cassandra.gms.Gossiper.checkProperThreadForStateMutation(Gossiper.java:178) > at org.apache.cassandra.gms.Gossiper.evictFromMembership(Gossiper.java:465) > at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:895) > at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:78) > at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:240) > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > java.lang.IllegalStateException: Attempting gossip state mutation from > illegal thread: GossipTasks:1 > at > org.apache.cassandra.gms.Gossiper.checkProperThreadForStateMutation(Gossiper.java:178) > [apache-cassandra-3.11.5.jar:3.11.5] > at org.apache.cassandra.gms.Gossiper.evictFromMembership(Gossiper.java:465) > [apache-cassandra-3.11.5.jar:3.11.5] > at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:895) > [apache-cassandra-3.11.5.jar:3.11.5] > at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:78) > [apache-cassandra-3.11.5.jar:3.11.5] > at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:240) > [apache-cassandra-3.11.5.jar:3.11.5] > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) > [apache-cassandra-3.11.5.jar:3.11.5] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_231] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [na:1.8.0_231] > 
at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [na:1.8.0_231] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [na:1.8.0_231] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_231] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_231] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) > [apache-cassandra-3.11.5.jar:3.11.5] > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > ~[netty-all-4.1.42.Final.jar:4.1.42.Final] > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_231] > {noformat} > Since CASSANDRA-15059 we check that all state changes are performed in the > GossipStage but it seems like it was still performed in the "current" thread > [here|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/gms/Gossiper.java#L895]. > It should be as simp
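In case it helps with assessing the impact: the stack trace shows the mutation being attempted from the GossipTasks:1 timer thread, and checkProperThreadForStateMutation only asserts which thread performs the change. The usual remedy for this class of warning is to hand the mutation to the gossip stage executor; a rough sketch of that pattern against the 3.11 internals follows (the helper is hypothetical and this is not the actual patch).
{code:java}
import org.apache.cassandra.concurrent.Stage;
import org.apache.cassandra.concurrent.StageManager;

// Sketch only: the real change lives in Gossiper and is not reproduced here.
public final class GossipStageMutationSketch
{
    private GossipStageMutationSketch()
    {
    }

    static void mutateOnGossipStage(Runnable stateMutation)
    {
        // Gossip state mutations such as evictFromMembership must run on the
        // single-threaded GossipStage, not on the GossipTasks periodic timer thread.
        StageManager.getStage(Stage.GOSSIP).execute(stateMutation);
    }
}
{code}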
[jira] [Issue Comment Deleted] (CASSANDRA-8675) COPY TO/FROM broken for newline characters
[ https://issues.apache.org/jira/browse/CASSANDRA-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-8675: - Comment: was deleted (was: I tried the patch, but still running into the issue where if I look at the data with cqlsh I see a yellow '\n' after the import (literal) instead of purple '\n' (control character) ) > COPY TO/FROM broken for newline characters > -- > > Key: CASSANDRA-8675 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8675 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools > Environment: [cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native > protocol v3] > Ubuntu 14.04 64-bit >Reporter: Lex Lythius >Priority: Normal > Labels: cqlsh, remove-reopen > Fix For: 3.0.x > > Attachments: CASSANDRA-8675.patch, copytest.csv > > > Exporting/importing does not preserve contents when texts containing newline > (and possibly other) characters are involved: > {code:sql} > cqlsh:test> create table if not exists copytest (id int primary key, t text); > cqlsh:test> insert into copytest (id, t) values (1, 'This has a newline > ... character'); > cqlsh:test> insert into copytest (id, t) values (2, 'This has a quote " > character'); > cqlsh:test> insert into copytest (id, t) values (3, 'This has a fake tab \t > character (typed backslash, t)'); > cqlsh:test> select * from copytest; > id | t > +- > 1 | This has a newline\ncharacter > 2 |This has a quote " character > 3 | This has a fake tab \t character (entered slash-t text) > (3 rows) > cqlsh:test> copy copytest to '/tmp/copytest.csv'; > 3 rows exported in 0.034 seconds. > cqlsh:test> copy copytest from '/tmp/copytest.csv'; > 3 rows imported in 0.005 seconds. > cqlsh:test> select * from copytest; > id | t > +--- > 1 | This has a newlinencharacter > 2 | This has a quote " character > 3 | This has a fake tab \t character (typed backslash, t) > (3 rows) > {code} > I tried replacing \n in the CSV file with \\n, which just expands to \n in > the table; and with an actual newline character, which fails with error since > it prematurely terminates the record. > It seems backslashes are only used to take the following character as a > literal > Until this is fixed, what would be the best way to refactor an old table with > a new, incompatible structure maintaining its content and name, since we > can't rename tables? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-8675) COPY TO/FROM broken for newline characters
[ https://issues.apache.org/jira/browse/CASSANDRA-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152366#comment-17152366 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-8675: -- I tried the patch, but I am still running into the issue: if I look at the data with cqlsh after the import, I see a yellow '\n' (a literal) instead of a purple '\n' (a control character). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145722#comment-17145722 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- [~brandon.williams] Thanks for the information. Before I upgrade the cluster to 3.11.6, I would like to understand whether there are any known issues with 3.11.6. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136793#comment-17136793 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-15874 at 6/16/20, 9:03 PM: - Thanks [~brandon.williams]. Can you please describe the symptoms of this race condition? In my case, only some portion of the data was not bootstrapped, while the rest of the data bootstrapped without any issues. was (Author: jaid): thanks [~brandon.williams] can you please provide the symptoms of this race conditions? in my case I see only some portion of the data is bootstrapped but rest of the data bootstrapped without any issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136793#comment-17136793 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- thanks [~brandon.williams] can you please provide the symptoms of this race conditions? in my case I see only some portion of the data is bootstrapped but rest of the data bootstrapped without any issues. > Bootstrap completes Successfully without streaming all the data > --- > > Key: CASSANDRA-15874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > I am seeing a strange issue where, adding a new node with auto_bootstrap: > true is not streaming all the data before it joins the cluster. Don't see any > information in the logs about bootstrap failures. > Here is the sequence of logs > > {code:java} > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > waiting for pending range calculation > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > calculation complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > getting bootstrap token > INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: > Starting to bootstrap... > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId . If a table was just created, this is likely due to the schema > not being fully propagated. Please wait for schema agreement on table > creation. > INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] > All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StorageService.java:1505 - Bootstrap completed! for the tokens > {code} > Cassandra Version: 3.11.3 > I am not able to reproduce this issue all the time, but it happened couple of > times. Is there any race condition/corner case, which could cause this issue? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135298#comment-17135298 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15850: --- Any update on this? Is there some configuration that can help reduce the delay? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135289#comment-17135289 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- Thanks [~brandon.williams]. Can you please provide some details on the scenarios under which this can happen? I am trying to reproduce the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
Jai Bheemsen Rao Dhanwada created CASSANDRA-15874: - Summary: Bootstrap completes Successfully without streaming all the data Key: CASSANDRA-15874 URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 Project: Cassandra Issue Type: Bug Components: Consistency/Bootstrap and Decommission Reporter: Jai Bheemsen Rao Dhanwada I am seeing a strange issue where, adding a new node with auto_bootstrap: true is not streaming all the data before it joins the cluster. Don't see any information in the logs about bootstrap failures. Here is the sequence of logs {code:java} INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: schema complete, ready to bootstrap INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: waiting for pending range calculation INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: calculation complete, ready to bootstrap INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: getting bootstrap token INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: Starting to bootstrap... org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId . If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation. INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 StorageService.java:1505 - Bootstrap completed! for the tokens {code} Cassandra Version: 3.11.3 I am not able to reproduce this issue all the time, but it happened couple of times. Is there any race condition/corner case, which could cause this issue? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15850: -- Impacts: (was: None) > Delay between Gossip settle and CQL port opening during the startup > --- > > Key: CASSANDRA-15850 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15850 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello, > When I am bootstrapping/restarting a Cassandra Node, there is a delay between > gossip settle and CQL port opening. Can someone please explain me where this > delay is configured and can this be changed? I don't see any information in > the logs > In my case if you see there is a ~3 minutes delay and this increases if I > increase the #of tables and #of nodes and DC. > {code:java} > INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip > to settle... > INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; > proceeding > INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty > using native Epoll event loop > INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: > [netty-buffer=netty-buffer-4.0.44.Final.452812a, > netty-codec=netty-codec-4.0.44.Final.452812a, > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, > netty-codec-http=netty-codec-http-4.0.44.Final.452812a, > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, > netty-common=netty-common-4.0.44.Final.452812a, > netty-handler=netty-handler-4.0.44.Final.452812a, > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, > netty-transport=netty-transport-4.0.44.Final.452812a, > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, > netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] > INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for > CQL clients on /x.x.x.x:9042 (encrypted)... > {code} > Also during this 3-10 minutes delay, I see > {noformat} > nodetool compactionstats > {noformat} > command is hung and never respond, until the CQL port is up and running. > Can someone please help me understand the delay here? > Cassandra Version: 3.11.3 > The issue can be easily reproducible with around 300 Tables and 100 nodes in > a cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
Jai Bheemsen Rao Dhanwada created CASSANDRA-15850: - Summary: Delay between Gossip settle and CQL port opening during the startup Key: CASSANDRA-15850 URL: https://issues.apache.org/jira/browse/CASSANDRA-15850 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada Hello, When I am bootstrapping/restarting a Cassandra Node, there is a delay between gossip settle and CQL port opening. Can someone please explain me where this delay is configured and can this be changed? I don't see any information in the logs In my case if you see there is a ~3 minutes delay and this increases if I increase the #of tables and #of nodes and DC. {code:java} INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to settle... INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for CQL clients on /x.x.x.x:9042 (encrypted)... {code} Also during this 3-10 minutes delay, I see {noformat} nodetool compactionstats {noformat} command is hung and never respond, until the CQL port is up and running. Can someone please help me understand the delay here? Cassandra Version: 3.11.3 The issue can be easily reproducible with around 300 Tables and 100 nodes in a cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
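If it helps to quantify the gap across restarts and configurations, the two log lines already bracket it. Below is a small standalone sketch (not part of Cassandra) that extracts the delay from system.log; the log path and the timestamp layout are assumptions based on the default logback pattern visible in the snippet above.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class StartupGapFromLog
{
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    public static void main(String[] args) throws IOException
    {
        // Path is an assumption; pass your own system.log location as the first argument.
        String path = args.length > 0 ? args[0] : "/var/log/cassandra/system.log";
        LocalDateTime gossipSettled = null;
        LocalDateTime cqlListening = null;
        List<String> lines = Files.readAllLines(Paths.get(path));
        for (String line : lines)
        {
            if (line.contains("No gossip backlog; proceeding"))
                gossipSettled = parseTimestamp(line);
            else if (line.contains("Starting listening for CQL clients"))
                cqlListening = parseTimestamp(line);
        }
        if (gossipSettled != null && cqlListening != null)
            System.out.println("Gossip settle -> CQL listen: " + Duration.between(gossipSettled, cqlListening));
    }

    // Lines look like "INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - ...";
    // the date and time are the third and fourth whitespace-separated tokens.
    private static LocalDateTime parseTimestamp(String line)
    {
        String[] parts = line.split("\\s+");
        return LocalDateTime.parse(parts[2] + " " + parts[3], TS);
    }
}
{code}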
[jira] [Commented] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088090#comment-17088090 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15449: --- Any pointers here? Today I saw the same issue on a 3-node cluster where I had just started adding (bootstrapping) new nodes. In this case RF is 3 and the consistency level for read queries is LOCAL_QUORUM. As I mentioned initially, I don't see any exceptions or errors in the Cassandra logs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15645) Can't send schema pull request: node /A.B.C.D is down
[ https://issues.apache.org/jira/browse/CASSANDRA-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088063#comment-17088063 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15645: --- I had similar issue and looks like this is introduced in 3.11 https://fossies.org/diffs/apache-cassandra/3.10-src_vs_3.11.0-src/src/java/org/apache/cassandra/service/MigrationTask.java-diff.html Does this cause any issue to the schema or data? I am using C* version 3.11.3. > Can't send schema pull request: node /A.B.C.D is down > - > > Key: CASSANDRA-15645 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15645 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Pierre Belanger apache.org >Priority: Normal > > On a new cluster with Cassandra 3.11.5, each time a node joins the cluster > the schema pull request happens before at least 1 node is confirmed up. On > the first node it's fine but node #2 and following are all complaining with > below WARN. > > {noformat} > INFO [MigrationStage:1] 2020-03-16 16:49:32,355 ColumnFamilyStore.java:426 - > Initializing system_auth.roles > WARN [MigrationStage:1] 2020-03-16 16:49:32,368 MigrationTask.java:67 - Can't > send schema pull request: node /A.B.C.D is down. > WARN [MigrationStage:1] 2020-03-16 16:49:32,369 MigrationTask.java:67 - Can't > send schema pull request: node /A.B.C.D is down. > INFO [main] 2020-03-16 16:49:32,371 Gossiper.java:1780 - Waiting for gossip > to settle... > INFO [GossipStage:1] 2020-03-16 16:49:32,493 Gossiper.java:1089 - InetAddress > /A.B.C.D is now UP > INFO [HANDSHAKE-/10.205.45.19] 2020-03-16 16:49:32,545 > OutboundTcpConnection.java:561 - Handshaking version with /A.B.C.D > {noformat} > > It's not urgent to fix but the WARN create noise for no reason. Before > trying to pull the schema, shouldn't the process wait for gossip to have at > least 1 node "up"? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-13649: -- Status: Open (was: Resolved) > Uncaught exceptions in Netty pipeline > - > > Key: CASSANDRA-13649 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13649 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging, Legacy/Testing >Reporter: Stefan Podkowinski >Assignee: Norman Maurer >Priority: Normal > Labels: patch > Fix For: 2.2.11, 3.0.15, 3.11.1, 4.0 > > Attachments: > 0001-CASSANDRA-13649-Ensure-all-exceptions-are-correctly-.patch, > test_stdout.txt > > > I've noticed some netty related errors in trunk in [some of the dtest > results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink]. > Just want to make sure that we don't have to change anything related to the > exception handling in our pipeline and that this isn't a netty issue. > Actually if this causes flakiness but is otherwise harmless, we should do > something about it, even if it's just on the dtest side. > {noformat} > WARN [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 > - An exceptionCaught() event was fired, and it reached at the tail of the > pipeline. It usually means the last handler in the pipeline did not handle > the exception. > io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: > Connection reset by peer > at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown > Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > {noformat} > And again in another test: > {noformat} > WARN [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 > - An exceptionCaught() event was fired, and it reached at the tail of the > pipeline. It usually means the last handler in the pipeline did not handle > the exception. > io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: > Connection reset by peer > at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown > Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > {noformat} > Edit: > The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() > failed}} error also causes tests to fail for 3.0 and 3.11. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035671#comment-17035671 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13649: --- Hello, I am using Cassandra version 3.11.3 and I still see these exceptions happening. Can someone please take a look? Also, are these errors harmful? I don't see any errors in my application; I just want to make sure I am not ignoring a potential issue. Also, looking at the exception, it seems tied to the Netty version rather than to the Cassandra version. {code:java} INFO [epollEventLoopGroup-2-25] 2020-02-12 19:46:13,867 Message.java:623 - Unexpected exception during request; channel = [id: 0x4cea3872, L:/10.130.8.31:9042 - R:/10.131.85.41:47374] io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
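For readers unfamiliar with the original warning, it appears when no handler in the Netty pipeline consumes an exception, so it falls through to the pipeline's tail. The INFO line pasted above shows that in 3.11 the request path already catches and logs the reset, which is consistent with it being an ordinary client disconnect. Below is a generic Netty sketch, not Cassandra's code and not a proposed patch, of what a last-in-pipeline handler that treats resets as routine disconnects looks like.
{code:java}
import java.io.IOException;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Generic illustration: a tail handler that downgrades "Connection reset by peer"
// (an IOException) to a normal disconnect instead of letting it reach Netty's tail.
public class ClientDisconnectHandler extends ChannelInboundHandlerAdapter
{
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause)
    {
        if (cause instanceof IOException)
        {
            // The remote side went away mid-request; close our end and move on.
            ctx.close();
            return;
        }
        // Anything unexpected keeps propagating so it is not silently swallowed.
        ctx.fireExceptionCaught(cause);
    }
}
{code}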
[jira] [Commented] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998605#comment-16998605 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15449: --- [~dcapwell] Thanks for the response. For the manual sync, I ran a SELECT with CONSISTENCY ALL so that it triggers a read repair. I don't see any errors in the Cassandra system.log except the salted_hash exception I pasted above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
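Since the clients here use the DataStax Java driver, the same manual sync can be scripted instead of run from cqlsh. A sketch against driver 3.x follows; the contact point and credentials are placeholders, and the table names are the 2.1-era auth tables (users, credentials, permissions), so adjust them for newer versions.
{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class SystemAuthReadRepair
{
    public static void main(String[] args)
    {
        // Placeholder contact point and credentials for this sketch.
        try (Cluster cluster = Cluster.builder()
                                      .addContactPoint("127.0.0.1")
                                      .withCredentials("cassandra", "cassandra")
                                      .build();
             Session session = cluster.connect())
        {
            // Reading every row at ALL touches all replicas and triggers read repair on any
            // mismatch, the same effect as CONSISTENCY ALL plus SELECT in cqlsh.
            for (String table : new String[]{ "system_auth.users", "system_auth.credentials", "system_auth.permissions" })
            {
                Statement select = new SimpleStatement("SELECT * FROM " + table)
                                       .setConsistencyLevel(ConsistencyLevel.ALL);
                session.execute(select).all();
            }
        }
    }
}
{code}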
[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449: -- Impacts: Clients (was: None) > Credentials out of sync after replacing the nodes > - > > Key: CASSANDRA-15449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15449 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png > > > Hello, > We are seeing a strange issue where, after replacing multiple C* nodes from > the clusters intermittently we see an issue where few nodes doesn't have any > credentials and the client queries fail. > Here are the sequence of steps > 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes > in one DC. > 2. The approach we took to replace the nodes is kill one node and launch a > new node with {{-Dcassandra.replace_address=}} and proceed with next node > once the node is bootstrapped and CQL is enabled. > 3. This process works fine and all of a sudden, we started seeing our > application started failing with the below errors in the logs > {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc > has no SELECT permission on or any of its parents at > com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) > at > com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) > at > {quote} > 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while > rest of the nodes are serving ~100 requests. (attached the metrics) > 5. We suspect some credentials sync issue and manually synced the > credentials and restarted the nodes with 0 requests, which fixed the problem. > Also, one few C* nodes we see below exception immediately after the bootstrap > is completed and the process dies. is this contributing to the credentials > issue? > NOTE: The C* nodes with zero traffic and the nodes with the below exception > are not the same. 
> {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - > Exception encountered during startup > java.lang.AssertionError: > org.apache.cassandra.exceptions.InvalidRequestException: Undefined name > salted_hash in selection clause > at > org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at org.apache.cassandra.auth.Auth.setup(Auth.java:144) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) > [apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) > [apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) > [apache-cassandra-2.1.16.jar:2.1.16] > Caused by: org.apache.cassandra.exceptions.InvalidRequestException: > Undefined name salted_hash in selection clause > at > org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) > ~[apache-cassandra-2.1.16.jar:2.1.16] > ... 7 common frames omitted > {quote} > Not sure why this is happening, is this a potential bug or any other pointers > to fix the problem. > C* Version: 2.1.16 > Client: Datastax Java Driver. > system_auth RF: 3, dc-1:3 and dc-2:3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449: -- Description: Hello, We are seeing a strange issue where, after replacing multiple C* nodes from the clusters intermittently we see an issue where few nodes doesn't have any credentials and the client queries fail. Here are the sequence of steps 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes in one DC. 2. The approach we took to replace the nodes is kill one node and launch a new node with {{-Dcassandra.replace_address=}} and proceed with next node once the node is bootstrapped and CQL is enabled. 3. This process works fine and all of a sudden, we started seeing our application started failing with the below errors in the logs {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has no SELECT permission on or any of its parents at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) at {quote} 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while rest of the nodes are serving ~100 requests. (attached the metrics) 5. We suspect some credentials sync issue and manually synced the credentials and restarted the nodes with 0 requests, which fixed the problem. Also, one few C* nodes we see below exception immediately after the bootstrap is completed and the process dies. is this contributing to the credentials issue? NOTE: The C* nodes with zero traffic and the nodes with the below exception are not the same. {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - Exception encountered during startup java.lang.AssertionError: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.Auth.setup(Auth.java:144) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.16.jar:2.1.16] Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) ~[apache-cassandra-2.1.16.jar:2.1.16] ... 7 common frames omitted {quote} Not sure why this is happening, is this a potential bug or any other pointers to fix the problem. 
C* Version: 2.1.16 Client: Datastax Java Driver. system_auth RF: 3, dc-1:3 and dc-2:3
[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449: -- Attachment: Screen Shot 2019-12-12 at 11.13.52 AM.png > Credentials out of sync after replacing the nodes > - > > Key: CASSANDRA-15449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15449 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png > > > Hello, > We are seeing a strange issue where, after replacing multiple C* nodes from > the clusters intermittently we see an issue where few nodes doesn't have any > credentials and the client queries fail. > Here are the sequence of steps > 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes > in one DC. > 2. The approach we took to replace the nodes is kill one node and launch a > new node with {{-Dcassandra.replace_address=}} and proceed with next node > once the node is bootstrapped and CQL is enabled. > 3. This process works fine and all of a sudden, we started seeing our > application started failing with the below errors in the logs > {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc > has no SELECT permission on or any of its parents at > com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) > at > com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) > at > {quote} > 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while > rest of the nodes are serving ~100 requests. (attached the metrics) > !Screen Shot 2019-12-12 at 11.13.52 AM.png! > 5. We suspect some credentials sync issue and manually synced the > credentials and restarted the nodes with 0 requests, which fixed the problem. > Also, one few C* nodes we see below exception immediately after the bootstrap > is completed and the process dies. is this contributing to the credentials > issue? > NOTE: The C* nodes with zero traffic and the nodes with the below exception > are not the same. 
> {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - > Exception encountered during startup > java.lang.AssertionError: > org.apache.cassandra.exceptions.InvalidRequestException: Undefined name > salted_hash in selection clause > at > org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at org.apache.cassandra.auth.Auth.setup(Auth.java:144) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) > [apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) > [apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) > [apache-cassandra-2.1.16.jar:2.1.16] > Caused by: org.apache.cassandra.exceptions.InvalidRequestException: > Undefined name salted_hash in selection clause > at > org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at > org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) > ~[apache-cassandra-2.1.16.jar:2.1.16] > ... 7 common frames omitted > {quote} > Not sure why this is happening, is this a potential bug or any other pointers > to fix the problem. > C* Version: 2.1.16 > Client: Datastax Java Driver. > system_auth RF: 3, dc-1:3 and dc-2:3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449: -- Description: Hello, We are seeing a strange issue where, after replacing multiple C* nodes from the clusters intermittently we see an issue where few nodes doesn't have any credentials and the client queries fail. Here are the sequence of steps 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes in one DC. 2. The approach we took to replace the nodes is kill one node and launch a new node with {{-Dcassandra.replace_address=}} and proceed with next node once the node is bootstrapped and CQL is enabled. 3. This process works fine and all of a sudden, we started seeing our application started failing with the below errors in the logs {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has no SELECT permission on or any of its parents at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) at {quote} 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while rest of the nodes are serving ~100 requests. (attached the metrics) !Screen Shot 2019-12-12 at 11.13.52 AM.png! 5. We suspect some credentials sync issue and manually synced the credentials and restarted the nodes with 0 requests, which fixed the problem. Also, one few C* nodes we see below exception immediately after the bootstrap is completed and the process dies. is this contributing to the credentials issue? NOTE: The C* nodes with zero traffic and the nodes with the below exception are not the same. {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - Exception encountered during startup java.lang.AssertionError: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.Auth.setup(Auth.java:144) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.16.jar:2.1.16] Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) ~[apache-cassandra-2.1.16.jar:2.1.16] ... 
7 common frames omitted {quote} Not sure why this is happening, is this a potential bug or any other pointers to fix the problem. C* Version: 2.1.16 Client: Datastax Java Driver. system_auth RF: 3, dc-1:3 and dc-2:3
[jira] [Created] (CASSANDRA-15449) Credentials out of sync after replacing the nodes
Jai Bheemsen Rao Dhanwada created CASSANDRA-15449: - Summary: Credentials out of sync after replacing the nodes Key: CASSANDRA-15449 URL: https://issues.apache.org/jira/browse/CASSANDRA-15449 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png Hello, We are seeing a strange issue where, after replacing multiple C* nodes from the clusters intermittently we see an issue where few nodes doesn't have any credentials and the client queries fail. Here are the sequence of steps 1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes in one DC. 2. The approach we took to replace the nodes is kill one node and launch a new node with {{-Dcassandra.replace_address=}} and proceed with next node once the node is bootstrapped and CQL is enabled. 3. This process works fine and all of a sudden, we started seeing our application started failing with the below errors in the logs {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has no SELECT permission on or any of its parents at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59) at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25) at {quote} 4. At this stage we see that 3 nodes in the cluster takes zero traffic, while rest of the nodes are serving ~100 requests. (attached the metrics) !Screen Shot 2019-12-12 at 11.13.52 AM.png! 5. We suspect some credentials sync issue and manually synced the credentials and restarted the nodes with 0 requests, which fixed the problem. Also, one few C* nodes we see below exception immediately after the bootstrap is completed and the process dies. is this contributing to the credentials issue? NOTE: The C* nodes with zero traffic and the nodes with the below exception are not the same. 
{quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - Exception encountered during startup java.lang.AssertionError: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.Auth.setup(Auth.java:144) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.16.jar:2.1.16] Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined name salted_hash in selection clause at org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592) ~[apache-cassandra-2.1.16.jar:2.1.16] at org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198) ~[apache-cassandra-2.1.16.jar:2.1.16] ... 7 common frames omitted {quote} Not sure why this is happening, is this a potential bug or any other pointers to fix the problem. C* Version: 2.1.16 Client: Datastax Java Driver. system_auth RF: 3, dc-1:3 and dc-2:3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
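A practical way to narrow down which nodes are serving stale or missing auth data after a replace is to authenticate against each node individually with the application account and run a trivial read, mirroring what the failing clients do. The sketch below is not from the ticket; it assumes the DataStax Java driver 3.x API (Cluster, Session, WhiteListPolicy), and the host list, keyspace, table and credentials are placeholders.
{code:java}
import java.net.InetSocketAddress;
import java.util.Collections;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class PerNodeAuthCheck
{
    public static void main(String[] args)
    {
        // Hypothetical node list and application credentials.
        String[] hosts = { "10.0.0.1", "10.0.0.2", "10.0.0.3" };
        for (String host : hosts)
        {
            // WhiteListPolicy pins all requests to this single node, so both the
            // authentication exchange and the test query are coordinated by it.
            try (Cluster cluster = Cluster.builder()
                                          .addContactPoint(host)
                                          .withCredentials("abc", "abc-password")
                                          .withLoadBalancingPolicy(new WhiteListPolicy(
                                              new RoundRobinPolicy(),
                                              Collections.singletonList(new InetSocketAddress(host, 9042))))
                                          .build();
                 Session session = cluster.connect())
            {
                // Placeholder keyspace/table; use whatever the application actually reads.
                session.execute("SELECT key FROM my_keyspace.my_table LIMIT 1");
                System.out.println(host + ": auth + SELECT OK");
            }
            catch (Exception e)
            {
                // An AuthenticationException or UnauthorizedException here points at a
                // node whose system_auth data has not caught up after the replace.
                System.out.println(host + ": FAILED - " + e);
            }
        }
    }
}
{code}
Once the lagging nodes are identified, repairing the system_auth keyspace on them (or re-issuing the relevant CREATE USER / GRANT statements) is the usual way to bring them back in line, which matches the manual sync described above.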
[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781997#comment-16781997 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15038: --- [~slebresne] Thank you, yes I agree with the security concerns, we can add warnings and enable this, so that the truststore check can be disabled. It would be great if this can be implemented. > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780708#comment-16780708 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15038: --- correct, ignore the truststore check > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780223#comment-16780223 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15038: --- In a way yes, but consider another use-case where I trying to setup SSL to encrypt the messages in flight but I trust the members who try to join the cluster. Agree, there are several ways to do it, but ask was why not make use of cassandra configuration to do it when it's already present. (in this case it's not working as expected) > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
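For context on what the requested option would amount to: in JSSE terms it is an SSLContext whose trust manager skips certificate-chain validation, so internode traffic is still encrypted but peer identity is never checked. The following is a generic, illustrative sketch only; it is not Cassandra's SSLFactory code, the class and parameter names are made up, and skipping validation leaves the cluster open to man-in-the-middle peers.
{code:java}
import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;

import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class EncryptOnlySslContext
{
    // Builds an SSLContext that presents this node's keystore but accepts any peer
    // certificate. Traffic is still encrypted; peer identity is NOT verified, which
    // is exactly the trade-off discussed in the comments above.
    public static SSLContext build(String keystorePath, char[] password) throws Exception
    {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream(keystorePath))
        {
            ks.load(in, password);
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(ks, password);

        TrustManager acceptAll = new X509TrustManager()
        {
            public void checkClientTrusted(X509Certificate[] chain, String authType) { /* no CA check */ }
            public void checkServerTrusted(X509Certificate[] chain, String authType) { /* no CA check */ }
            public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        };

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), new TrustManager[]{ acceptAll }, new SecureRandom());
        return ctx;
    }
}
{code}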
[jira] [Comment Edited] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780218#comment-16780218 ] Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-15038 at 2/28/19 7:51 AM: Yes, the basic idea if every cassandra node will have a self signed cert, since the CA is different for each node they don't join the cluster as the other members don't know about the CA. So, there should be a way to disable a CA check. The current require_client_auth: false doesn't seem to be working. I tried to uncomment the property and set to false, even that didn't make much difference. was (Author: jaid): Yes, the basic idea if every cassandra node will have a self signed cert, since the CA is different for each node they don't join the cluster as the other members don't know about the CA. So, there should be a way to disable a CA check. The current require_client_auth: false doesn't seem to be working. > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 
8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780218#comment-16780218 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15038: --- Yes, the basic idea if every cassandra node will have a self signed cert, since the CA is different for each node they don't join the cluster as the other members don't know about the CA. So, there should be a way to disable a CA check. The current require_client_auth: false doesn't seem to be working. > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. > > though the require_client_auth: is set to false, cassandra fails to startup > if we disable truststore and truststore_password as it look for default > truststore under `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15038: -- Description: Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. `conf/.truststore` {code:java} server_encryption_options: internode_encryption: all keystore: /etc/cassandra/keystore.jks keystore_password: mykeypass truststore: /etc/cassandra/truststore.jks truststore_password: truststorepass # More advanced defaults below: # protocol: TLS # algorithm: SunX509 # store_type: JKS # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] # require_client_auth: false # require_endpoint_verification: false{code} {noformat} Caused by: java.io.IOException: Error creating the initializing the SSL Context at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 8 common frames omitted Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 10 common frames omitted{noformat} Cassandra Version: 3.11.3 was: Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. 
[jira] [Updated] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15038: -- Description: Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. though the require_client_auth: is set to false, cassandra fails to startup if we disable truststore and truststore_password as it look for default truststore under `conf/.truststore` {code:java} server_encryption_options: internode_encryption: all keystore: /etc/cassandra/keystore.jks keystore_password: mykeypass truststore: /etc/cassandra/truststore.jks truststore_password: truststorepass # More advanced defaults below: # protocol: TLS # algorithm: SunX509 # store_type: JKS # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] # require_client_auth: false # require_endpoint_verification: false{code} {noformat} Caused by: java.io.IOException: Error creating the initializing the SSL Context at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 8 common frames omitted Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 10 common frames omitted{noformat} Cassandra Version: 3.11.3 was: Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. 
[jira] [Updated] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-15038: -- Issue Type: Bug (was: Improvement) > Provide an option to Disable Truststore CA check for internode_encryption > - > > Key: CASSANDRA-15038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Major > > Hello, > The current internode encryption between cassandra nodes uses a keystore and > truststore. However there are some use-case where users are okay to allow any > one to trust as long as they have a keystore. This is requirement is only for > encryption but not trusting the identity. > It would be good to have an option to disable the Truststore CA check for the > internode_encryption. > > In the current cassandra.yaml, there is no way to comment/disable the > truststore and truststore password and allow anyone to connect with a > certificate. `conf/.truststore` > > {code:java} > server_encryption_options: > internode_encryption: all > keystore: /etc/cassandra/keystore.jks > keystore_password: mykeypass > truststore: /etc/cassandra/truststore.jks > truststore_password: truststorepass > # More advanced defaults below: > # protocol: TLS > # algorithm: SunX509 > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false{code} > {noformat} > Caused by: java.io.IOException: Error creating the initializing the SSL > Context > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) > ~[apache-cassandra-3.11.3.jar:3.11.3] > at > org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 8 common frames omitted > Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) > at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] > at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] > at > org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) > ~[apache-cassandra-3.11.3.jar:3.11.3] > ... 10 common frames omitted{noformat} > > Cassandra Version: 3.11.3 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption
Jai Bheemsen Rao Dhanwada created CASSANDRA-15038: - Summary: Provide an option to Disable Truststore CA check for internode_encryption Key: CASSANDRA-15038 URL: https://issues.apache.org/jira/browse/CASSANDRA-15038 Project: Cassandra Issue Type: Improvement Components: Feature/Encryption Reporter: Jai Bheemsen Rao Dhanwada Hello, The current internode encryption between cassandra nodes uses a keystore and truststore. However there are some use-case where users are okay to allow any one to trust as long as they have a keystore. This is requirement is only for encryption but not trusting the identity. It would be good to have an option to disable the Truststore CA check for the internode_encryption. In the current cassandra.yaml, there is no way to comment/disable the truststore and truststore password and allow anyone to connect with a certificate. `conf/.truststore` {code:java} server_encryption_options: internode_encryption: all keystore: /etc/cassandra/keystore.jks keystore_password: mykeypass truststore: /etc/cassandra/truststore.jks truststore_password: truststorepass # More advanced defaults below: # protocol: TLS # algorithm: SunX509 # store_type: JKS # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] # require_client_auth: false # require_endpoint_verification: false{code} {noformat} Caused by: java.io.IOException: Error creating the initializing the SSL Context at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) ~[apache-cassandra-3.11.3.jar:3.11.3] at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 8 common frames omitted Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied) at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151] at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151] at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151] at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168) ~[apache-cassandra-3.11.3.jar:3.11.3] ... 10 common frames omitted{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665755#comment-16665755 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-11748: --- Any proposed fix for this available yet? or any work around? > Schema version mismatch may leads to Casandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. > CentOS 6.6 > Occurred in different C* node of different scale of deployment (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times when a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the simple guideline of our rolling upgrade process > 1. Update schema on a node, and wait until all nodes to be in schema version > agreemnt - via nodetool describeclulster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the the restarted node has different > schema version. > 4. All nodes in cluster start to rapidly exchange schema information, and any > of node could run into OOM. > The following is the system.log that occur in one of our 2-node cluster test > bed > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > The node2 keeps submitting the migration task over 100+ times to the other > node. > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ... ( over 100+ times) > -- > On the otherhand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migrationTask afterwards: > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line > 978) InetAddress /192.168.88.34 is now UP > ... > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 > MigrationRequestVerbHandler.java (line 41) Received migration request from > /192.168.88.34. > …… ( over 100+ times) > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line > 127) submitting migration task for /192.168.88.34 > . (over 50+ times) > On the side note, we have over 200+ column families defined in Cassandra > database, which may related to this amount of rpc traffic. > P.S.2 The over requested schema migration task will eventually have > InternalResponseStage performing schema merge operation. Since this operation > requires a compaction for each merge and is much slower to consume. Thus, the > back-pressure of incoming schema migration content objects consumes all of > the heap space and ultimately ends up OOM! 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
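Step 1 of the rolling-upgrade guideline quoted above (wait until every node gossips the same schema version before restarting the next one) can also be verified from the client side rather than by eyeballing nodetool describecluster. A minimal sketch, assuming a DataStax Java driver recent enough to expose Metadata.checkSchemaAgreement() and a placeholder contact point:
{code:java}
import com.datastax.driver.core.Cluster;

public class WaitForSchemaAgreement
{
    public static void main(String[] args) throws InterruptedException
    {
        // Hypothetical contact point; this is the programmatic equivalent of polling
        // "nodetool describecluster" until only one schema version is reported.
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build())
        {
            cluster.init();
            while (!cluster.getMetadata().checkSchemaAgreement())
            {
                System.out.println("Schema versions still diverge; waiting before restarting the next node...");
                Thread.sleep(5000);
            }
            System.out.println("All nodes agree on the schema version.");
        }
    }
}
{code}
The same check is worth repeating after each node restart in step 2, since the OOM pattern described here begins with a single restarted node disagreeing on the schema version.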
[jira] [Reopened] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada reopened CASSANDRA-14840: --- [~jjirsa] is this fixed any of the new versions of Cassandra? or are there any workarounds to overcome the issue? given that # I am already using off heap buffers # Adding iptables for every node addition in a production environment is not an option for me. > Bootstrap of new node fails with OOM in a large cluster > --- > > Key: CASSANDRA-14840 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14840 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Critical > > We are seeing new node addition fails with OOM during bootstrap in a cluster > of more than 80 nodes and 3000 CF without any data in those CFs. > > Steps to reproduce: > # Launch a 3 node cluster > # Create 3000 CF in the cluster > # Start adding nodes to the cluster one by one > # After adding 75-80 nodes, the new node bootstrap fails with OOM. > {code:java} > ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 > JVMStabilityInspector.java:78 - Exiting due to error while processing commit > log during initialization. > java.lang.OutOfMemoryError: Java heap space > at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151] > at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151] > at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151] > at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151] > at java.lang.String.format(String.java:2940) ~[na:1.8.0_151] > at > org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105) > ~[apache-cassandra-2.1.16.jar:2.1.16] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code} > Cassandra Version: 2.1.16 > OS: CentOS7 > num_tokens: 256 on each node. > > This behavior is blocking us from adding extra capacity when needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663076#comment-16663076 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14840: ---
[~jjirsa] This is a production cluster and all the CFs are in use, so I can't delete any of them.
# I am already using off-heap memtables and am still getting OOM. Current heap settings are `-Xms8192M -Xmx8192M -Xmn1200M` with the CMS collector. I tried increasing the heap size to 16G, and after adding 120 nodes I still see OOM on the newly bootstrapping node. Any other suggestions here?
# That sounds like a very hands-on approach; I am not sure I can time it well.
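For context, here is roughly where the settings being discussed above live on a 2.1-era install. The file paths and values are only an illustration of the knobs mentioned (fixed CMS heap, off-heap memtables), not a recommendation, and package layouts differ between distributions.
{code:bash}
# Illustrative only: where the heap and memtable settings discussed above
# are typically configured on a Cassandra 2.1 package install.

# conf/cassandra-env.sh -- fixed heap sizing instead of the auto-calculated defaults
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="1200M"

# conf/cassandra.yaml -- move memtable cell data off heap
# memtable_allocation_type: offheap_objects

# Quick check of what a running node actually picked up:
ps -ef | grep -oE -- '-Xm[sxn][0-9]+[MG]' | sort -u
grep -E '^(memtable_allocation_type|memtable_(heap|offheap)_space_in_mb)' /etc/cassandra/conf/cassandra.yaml
{code}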
[jira] [Created] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
Jai Bheemsen Rao Dhanwada created CASSANDRA-14840: - Summary: Bootstrap of new node fails with OOM in a large cluster Key: CASSANDRA-14840 URL: https://issues.apache.org/jira/browse/CASSANDRA-14840 Project: Cassandra Issue Type: Bug Components: Streaming and Messaging Reporter: Jai Bheemsen Rao Dhanwada
We are seeing new node additions fail with OOM during bootstrap in a cluster of more than 80 nodes and 3000 CFs, without any data in those CFs.
Steps to reproduce:
# Launch a 3 node cluster
# Create 3000 CF in the cluster
# Start adding nodes to the cluster one by one
# After adding 75-80 nodes, the new node bootstrap fails with OOM.
{code:java}
ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 JVMStabilityInspector.java:78 - Exiting due to error while processing commit log during initialization.
java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151]
at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151]
at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151]
at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151]
at java.lang.String.format(String.java:2940) ~[na:1.8.0_151]
at org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105) ~[apache-cassandra-2.1.16.jar:2.1.16]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code}
Cassandra Version: 2.1.16
OS: CentOS7
num_tokens: 256 on each node.
This behavior is blocking us from adding extra capacity when needed.
[jira] [Commented] (CASSANDRA-13538) Cassandra tasks permanently block after the following assertion occurs during compaction: "java.lang.AssertionError: Interval min > max "
[ https://issues.apache.org/jira/browse/CASSANDRA-13538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501322#comment-16501322 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13538: --- Noticed the similar issue in one of the environments, has anyone have any workaround? > Cassandra tasks permanently block after the following assertion occurs during > compaction: "java.lang.AssertionError: Interval min > max " > - > > Key: CASSANDRA-13538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13538 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: This happens on a 7 node system with 2 data centers. > We're using Cassandra version 2.1.15. I upgraded to 2.1.17 and it still > occurs. >Reporter: Andy Klages >Priority: Major > Fix For: 2.1.x > > Attachments: cassandra.yaml, jstack.out, schema.cql3, system.log, > tpstats.out > > > We noticed this problem because the commitlogs proliferate to the point that > we eventually run out of disk space. nodetool tpstats shows several of the > tasks backed up: > {code} > Pool NameActive Pending Completed Blocked All > time blocked > MutationStage 0 0 134335315 0 > 0 > ReadStage 0 0 643986790 0 > 0 > RequestResponseStage 0 0 114298 0 > 0 > ReadRepairStage 0 0 36 0 > 0 > CounterMutationStage 0 0 0 0 > 0 > MiscStage 0 0 0 0 > 0 > AntiEntropySessions 1 1 79357 0 > 0 > HintedHandoff 0 0 90 0 > 0 > GossipStage 0 06595098 0 > 0 > CacheCleanupExecutor 0 0 0 0 > 0 > InternalResponseStage 0 01638369 0 > 0 > CommitLogArchiver 0 0 0 0 > 0 > CompactionExecutor2 1752922542 0 > 0 > ValidationExecutor0 01465374 0 > 0 > MigrationStage176600 0 > 0 > AntiEntropyStage 1 9238291098 0 > 0 > PendingRangeCalculator0 0 20 0 > 0 > Sampler 0 0 0 0 > 0 > MemtableFlushWriter 0 0 53017 0 > 0 > MemtablePostFlush 1 45841545141 0 > 0 > MemtableReclaimMemory 0 0 70639 0 > 0 > Native-Transport-Requests 0 0 352559 0 > 0 > {code} > This all starts after the following exception is raised in Cassandra: > {code} > ERROR [MemtableFlushWriter:2437] 2017-05-15 01:53:23,380 > CassandraDaemon.java:231 - Exception in thread > Thread[MemtableFlushWriter:2437,5,main] > java.lang.AssertionError: Interval min > max > at > org.apache.cassandra.utils.IntervalTree$IntervalNode.(IntervalTree.java:249) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at org.apache.cassandra.utils.IntervalTree.(IntervalTree.java:72) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker$SSTableIntervalTree.(DataTracker.java:603) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker$SSTableIntervalTree.(DataTracker.java:597) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:578) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker$View.replaceFlushed(DataTracker.java:740) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:172) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.db.ColumnFamilyStore.r
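A rough way to spot the condition described in this ticket (flush pipeline wedged after the IntervalTree assertion, commit log segments piling up) before the disk fills. The data and log paths below assume a typical package install and are not part of the ticket itself.
{code:bash}
# Rough health check for the symptom described above: blocked flush stages
# and an ever-growing commit log directory. Paths are assumptions.
nodetool tpstats | grep -E 'MemtablePostFlush|MemtableFlushWriter|CompactionExecutor'

# Commit log size; in this bug it grows without bound because segments are
# never recycled once the flush path is wedged.
du -sh /var/lib/cassandra/commitlog

# The triggering assertion, if present, shows up in the system log:
grep -n 'AssertionError: Interval min > max' /var/log/cassandra/system.log | tail -5
{code}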
[jira] [Commented] (CASSANDRA-13235) All thread blocked and writes pending.
[ https://issues.apache.org/jira/browse/CASSANDRA-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133721#comment-16133721 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13235: --- We are also seeing similar issue and we see the huge write latency around 10s {code:java} "SharedPool-Worker-11" #355 daemon prio=5 os_prio=0 tid=0x7f9458e33800 nid=0x9e50 waiting for monitor entry [0x7f943cbf2000] java.lang.Thread.State: BLOCKED (on object monitor) at sun.misc.Unsafe.monitorEnter(Native Method) at org.apache.cassandra.utils.concurrent.Locks.monitorEnterUnsafe(Locks.java:46) at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:202) at org.apache.cassandra.db.Memtable.put(Memtable.java:210) at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1263) at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) at java.lang.Thread.run(Thread.java:745) {code} > All thread blocked and writes pending. > -- > > Key: CASSANDRA-13235 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13235 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: jdk8 > cassandra 2.1.15 >Reporter: zhaoyan > > I found cassandra many pending MutationStage task > {code} > NFO [Service Thread] 2017-02-17 16:00:14,440 StatusLogger.java:51 - Pool > NameActive Pending Completed Blocked All Time > Blocked > INFO [Service Thread] 2017-02-17 16:00:14,440 StatusLogger.java:66 - > MutationStage 384 4553 4294213082 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > RequestResponseStage 0 0 2172612382 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > ReadRepairStage 0 05378852 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > CounterMutationStage 0 0 0 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > ReadStage 5 0 577242284 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > MiscStage 0 0 0 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > HintedHandoff 0 0 1480 0 > 0 > INFO [Service Thread] 2017-02-17 16:00:14,441 StatusLogger.java:66 - > GossipStage 0 09342250 0 > 0 > {code} > And I found there are many blocked thread with jstack > {code} > "SharedPool-Worker-28" #416 daemon prio=5 os_prio=0 tid=0x01fb8000 > nid=0x7459 waiting for monitor entry [0x7fdd83ca] >java.lang.Thread.State: BLOCKED (on object monitor) > at sun.misc.Unsafe.monitorEnter(Native Method) > at > org.apache.cassandra.utils.concurrent.Locks.monitorEnterUnsafe(Locks.java:46) > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:202) > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1244) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > at 
org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) > at > org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.ja
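A quick, hedged way to quantify the contention pattern shown in the stack traces above: dump the threads and count how many SharedPool workers are parked on the same partition-level monitor. The PID discovery command and temp file path are illustrative assumptions.
{code:bash}
# Count mutation threads blocked while appending to an AtomicBTreeColumns partition,
# mirroring the jstack excerpts quoted above.
pid=$(pgrep -f CassandraDaemon | head -1)
jstack "$pid" > /tmp/cassandra-threads.txt

# Threads currently in the hot append path:
grep -c 'AtomicBTreeColumns.addAllWithSizeDelta' /tmp/cassandra-threads.txt

# How many of them are actually BLOCKED on the monitor right now:
grep -B2 'Locks.monitorEnterUnsafe' /tmp/cassandra-threads.txt | grep -c 'BLOCKED'
{code}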
[jira] [Commented] (CASSANDRA-13526) nodetool cleanup on KS with no replicas should remove old data, not silently complete
[ https://issues.apache.org/jira/browse/CASSANDRA-13526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007319#comment-16007319 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13526: ---
I am seeing this issue on a C* cluster with the below setup:
Cassandra version: 2.1.16
Datacenters: 4 DCs
RF: NetworkTopologyStrategy with RF 3 in each DC
Keyspaces: 50 keyspaces, a few replicating to one DC and a few replicating to multiple DCs
> nodetool cleanup on KS with no replicas should remove old data, not silently complete
> -
>
> Key: CASSANDRA-13526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13526
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Reporter: Jeff Jirsa
>
> From the user list: https://lists.apache.org/thread.html/5d49cc6bbc6fd2e5f8b12f2308a3e24212a55afbb441af5cb8cd4167@%3Cuser.cassandra.apache.org%3E
> If you have a multi-dc cluster, but some keyspaces are not replicated to a given DC, you'll be unable to run cleanup on those keyspaces in that DC, because [the cleanup code will see no ranges and exit early|https://github.com/apache/cassandra/blob/4cfaf85/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L427-L441]
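A small sketch of how the behaviour described in this ticket shows up in practice: for a keyspace that is no longer replicated to the local DC, cleanup returns quickly and the on-disk data does not shrink. The keyspace name and data path below are placeholders, not values from the ticket.
{code:bash}
# Illustration only: cleanup on a keyspace the local DC owns no ranges for.
KS=user_prod
du -sh /var/lib/cassandra/data/"$KS"
nodetool cleanup "$KS"
du -sh /var/lib/cassandra/data/"$KS"   # unchanged when the node owns no ranges for $KS
{code}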
[jira] [Reopened] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada reopened CASSANDRA-12816: ---
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15658407#comment-15658407 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
Agree, but this forces someone to have all the keyspaces expanded to all the regions. If I have a use case where keyspaces belong to different regions, I can't make use of it.
[jira] [Reopened] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada reopened CASSANDRA-12816: --- Reproduced In: 2.1.x Since Version: 2.1.16
[~jjordan] I have encountered this issue again, and now it is complaining about a non-system keyspace.
{code:java}
[jaibheemsen@node01 ~]$ nodetool rebuild us-east
nodetool: Unable to find sufficient sources for streaming range (1773952483933901933,1774688434180951054] in keyspace user_prod
See 'nodetool help' or 'nodetool help '.
[jaibheemsen@node01 ~]$
{code}
C* version: 2.1.16
The user_prod keyspace is present in us-west-2 but not in us-east. I am doing a nodetool rebuild in us-west-2 to stream data from us-east.
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605806#comment-15605806 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
[~jjordan] Agree, the workaround is to add -Dcassandra.consistent.rangemovement=false, but can you please help me understand: is this a bug or expected behavior? If it is the expected behavior, what action is needed when expanding the cluster to a new DC? Do I always need to use NetworkTopologyStrategy for all non-LocalStrategy keyspaces (system_distributed, system_traces)?
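For reference, a JVM system property like the one mentioned above is usually passed at node startup. The snippet below is a sketch assuming a package install where cassandra-env.sh appends to JVM_OPTS; it does not assert that this flag is the right fix, which is exactly the open question in this thread.
{code:bash}
# Sketch only: two common ways to pass a cassandra.* system property at startup.
# Persistently, via conf/cassandra-env.sh:
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"' >> /etc/cassandra/conf/cassandra-env.sh

# Or for a single foreground start:
cassandra -Dcassandra.consistent.rangemovement=false
{code}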
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603565#comment-15603565 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
This isn't correct. For example, if I have 10 keyspaces in a cluster and I want 5 keyspaces in DC-1 only and the other 5 keyspaces in both DC-1 and DC-2, then the rebuild fails and I can't replicate the data to the newly added DC.
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603306#comment-15603306 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
system_traces and system_distributed are using SimpleStrategy. If I change them to NetworkTopologyStrategy, the rebuild operation works. Any idea what the implications are of changing them to NetworkTopologyStrategy?
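To make the workaround being discussed concrete, this is the general shape of the change. The DC names and replication factors below are placeholders, not recommendations, and the question above about implications still stands.
{code:bash}
# Sketch of the workaround discussed in this thread: switch the affected
# keyspaces to NetworkTopologyStrategy before running rebuild in the new DC.
cqlsh -e "ALTER KEYSPACE system_distributed WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
cqlsh -e "ALTER KEYSPACE system_traces WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"

# Then, on each node in the new datacenter:
nodetool rebuild -- DC1
{code}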
[jira] [Commented] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603265#comment-15603265 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-12816: ---
[~jeromatron] Correct, but I am concerned about altering system-level keyspaces (system, system_traces, system_distributed). Do you know what the impact (if any) is of changing the system keyspaces to NetworkTopologyStrategy?
[jira] [Updated] (CASSANDRA-12816) Rebuild failing while adding new datacenter
[ https://issues.apache.org/jira/browse/CASSANDRA-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jai Bheemsen Rao Dhanwada updated CASSANDRA-12816: -- Priority: Critical (was: Major)
[jira] [Created] (CASSANDRA-12816) Rebuild failing while adding new datacenter
Jai Bheemsen Rao Dhanwada created CASSANDRA-12816: - Summary: Rebuild failing while adding new datacenter Key: CASSANDRA-12816 URL: https://issues.apache.org/jira/browse/CASSANDRA-12816 Project: Cassandra Issue Type: Bug Reporter: Jai Bheemsen Rao Dhanwada
Hello All,
I have a single datacenter with 3 C* nodes and we are trying to expand the cluster to another region/DC. I am seeing the below error while doing a "nodetool rebuild -- name_of_existing_data_center".
{code:java}
[user@machine ~]$ nodetool rebuild DC1
nodetool: Unable to find sufficient sources for streaming range (-402178150752044282,-396707578307430827] in keyspace system_distributed
See 'nodetool help' or 'nodetool help '.
[user@machine ~]$
{code}
{code:java}
user@cqlsh> SELECT * from system_schema.keyspaces where keyspace_name='system_distributed';
keyspace_name | durable_writes | replication
---++-
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
(1 rows)
{code}
To overcome this I have updated the system_distributed keyspace to DC1:3 and DC2:3 with NetworkTopologyStrategy.
C* Version - 3.0.8
Is this a bug introduced in Cassandra version 3.0.8? I haven't seen this issue with the older versions.