[jira] [Commented] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402634#comment-17402634 ]

Simon Zhou commented on CASSANDRA-13720:

Thanks Ekaterina and Andres for the code review!

> Clean up repair code
> --------------------
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Repair
> Reporter: Simon Zhou
> Assignee: Simon Zhou
> Priority: Normal
> Fix For: 4.x
>
> Lots of unused code.
[jira] [Commented] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397735#comment-17397735 ]

Simon Zhou commented on CASSANDRA-13720:

No other new changes. And yes, I squashed the commits. I should have created a separate one for easier review.
[jira] [Commented] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397678#comment-17397678 ]

Simon Zhou commented on CASSANDRA-13720:

I've updated the PR to also remove _cmd_ in _createQueryThread_.

|4.0|[patch|https://github.com/szhou1234/cassandra/commit/604284c8cce620bf37e6290018a569d3ba53aee9]|
[jira] [Updated] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Zhou updated CASSANDRA-13720:

Test and Documentation Plan: (was: Most of the code in the original patch isn't relevant anymore. I've updated the patch based on the latest trunk. |4.0|[patch|https://github.com/apache/cassandra/pull/1126/commits]|)
[jira] [Commented] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391229#comment-17391229 ]

Simon Zhou commented on CASSANDRA-13720:

Most of the code in the original patch isn't relevant anymore. I've updated the patch based on the latest trunk.

|4.0|[patch|https://github.com/apache/cassandra/pull/1126/commits]|
[jira] [Updated] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Zhou updated CASSANDRA-13720:

Test and Documentation Plan: Most of the code in the original patch isn't relevant anymore. I've updated the patch based on the latest trunk. |4.0|[patch|https://github.com/apache/cassandra/pull/1126/commits]|
Status: Patch Available (was: In Progress)
[jira] [Commented] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373768#comment-17373768 ]

Simon Zhou commented on CASSANDRA-13720:

Wow, it's been a long time, but right on time. I didn't work on Cassandra for the past 3 years but was just about to come back to this area. I'll take a look in the next few weeks and see if it still applies to 4.0.
[jira] [Created] (CASSANDRA-14382) Use timed wait from Futures in Guava
Simon Zhou created CASSANDRA-14382:

Summary: Use timed wait from Futures in Guava
Key: CASSANDRA-14382
URL: https://issues.apache.org/jira/browse/CASSANDRA-14382
Project: Cassandra
Issue Type: Bug
Reporter: Simon Zhou

We upgraded Guava to 23.3 in trunk, and there is a timed-wait feature (Futures.withTimeout) that we should use. Otherwise we have a whole bunch of stability issues: generally, if something fails or is unresponsive, lots of threads will hang. For example, validation in repair.
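For reference, a minimal sketch of how this Guava API can be applied, assuming a caller that currently blocks on a repair validation future; the method name, the 30-minute bound, and the scheduler setup here are hypothetical, not Cassandra code:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;

public class TimedWaitExample
{
    // Single scheduler thread that fires the timeouts.
    private static final ScheduledExecutorService TIMEOUT_SCHEDULER =
        Executors.newSingleThreadScheduledExecutor();

    // Wraps a future so that waiters fail with a TimeoutException
    // after the bound elapses, instead of hanging forever.
    public static <V> ListenableFuture<V> withBoundedWait(ListenableFuture<V> delegate)
    {
        return Futures.withTimeout(delegate, 30, TimeUnit.MINUTES, TIMEOUT_SCHEDULER);
    }
}
{code}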
[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408280#comment-16408280 ]

Simon Zhou commented on CASSANDRA-14252:

[~dikanggu] FYI, I had a [fix|https://issues.apache.org/jira/browse/CASSANDRA-13261] for the "overloading" issue a long time ago. Not sure if it's the issue that you had.

> Use zero as default score in DynamicEndpointSnitch
> --------------------------------------------------
>
> Key: CASSANDRA-14252
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14252
> Project: Cassandra
> Issue Type: Bug
> Components: Coordination
> Reporter: Dikang Gu
> Assignee: Dikang Gu
> Priority: Major
> Fix For: 4.0, 3.0.17, 3.11.3
> Attachments: IMG_3180.jpg
>
> The problem I want to solve is that I found in our deployment that one slow but alive data node can slow down the whole cluster, even causing timeouts of our requests.
> We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the DynamicEndpointSnitch to switch to sortByProximityWithScore if local data node latency is too high.
> I added some debug logging and figured out that in a lot of cases the score from the remote data node was not populated, so the fallback to sortByProximityWithScore never happened. That's why a single slow data node can cause huge problems for the whole cluster.
> In this jira, I'd like to use zero as the default score, so that we will get a chance to try a remote data node if the local one is slow.
> I tested it in our test cluster; it improved the client latency in the single slow data node case significantly.
> I flag this as a Bug, because it caused problems for our use cases multiple times.
> logs:
> 2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: sortByProximityWithBadness: after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]
> 2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: sortByProximityWithBadness: after sorting by proximity, addresses order change to [ip1, ip2], with scores [0.0]
> 2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: sortByProximityWithBadness: after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]
> 2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: sortByProximityWithBadness: after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]
[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382756#comment-16382756 ]

Simon Zhou commented on CASSANDRA-14252:

Talked with [~dikanggu] offline. Previously I thought that a timeout wouldn't be counted as part of the latency score. Actually it is, so setting the replica score to 0 by default is less of a problem and only exposes a small vulnerability window: say you have multiple replicas in a remote data center and you don't have a score for one of them, so it gets assigned score 0. This might cause a traffic burst on that replica for a short period of time, and most of the time it won't even be noticed. This can be mitigated by assigning a larger score (such as the maximum score of all the replicas) to the replica with a null score, as sketched below. I'd defer this decision to [~dikanggu]. Otherwise the patch looks good to me.
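A minimal sketch of that mitigation, assuming a hypothetical scores map of the kind the snitch maintains; a replica without an entry inherits the worst known score among the candidates instead of 0:

{code:java}
import java.net.InetAddress;
import java.util.List;
import java.util.Map;

public class ScoreFallback
{
    // Hypothetical sketch: instead of defaulting an unknown replica to 0 (best),
    // default it to the maximum (worst) score seen among the candidate replicas,
    // so it cannot jump ahead of replicas with real measurements.
    public static double scoreOrWorst(InetAddress replica,
                                      List<InetAddress> candidates,
                                      Map<InetAddress, Double> scores)
    {
        Double known = scores.get(replica);
        if (known != null)
            return known;
        return candidates.stream()
                         .map(scores::get)
                         .filter(s -> s != null)
                         .mapToDouble(Double::doubleValue)
                         .max()
                         .orElse(0.0); // no scores at all: fall back to 0
    }
}
{code}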
[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380949#comment-16380949 ]

Simon Zhou commented on CASSANDRA-14252:

I think I haven't made it clear enough, or I misunderstand your fix. Let's say you want to use nodes in a remote data center because of some issue with the local data center. My understanding is that:
- Either we don't use a node from the remote data center that doesn't have a score yet, because the reason could be that it's totally unresponsive to previous read requests but still responds to, for example, gossip messages, and thus hasn't been marked as down. (A node may also lack a score because it hasn't received a read request from the remote coordinator node yet, or because all the scores got reset after dynamicResetInterval, both of which are less of a problem.)
- Or maybe we can use it, but it should be picked with lower probability than other nodes in the same remote data center.

Now you assign a low score of 0 to a node that doesn't have a score yet. This means it will be picked with higher probability. If that node truly has a problem (unresponsive to read requests), then your fix will cause higher latency. Having said that, I don't mind setting the node's score to the highest one among all node scores from the same data center.
[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378100#comment-16378100 ]

Simon Zhou commented on CASSANDRA-14252:

For nodes in the same remote data center, if we don't have a score for one node because there is no read response yet, and we set an artificially low score of 0 for it, does that mean this node will be picked with higher probability than other nodes that have scores?
[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377571#comment-16377571 ]

Simon Zhou commented on CASSANDRA-14252:

This is an interesting change, but I'm not sure it fixes all problems. The code that you changed was introduced in CASSANDRA-13074, which also claims to fix the slow-node issue, by totally ignoring nodes that we don't have a score for, whether the node is in the local or a remote data center. Now with your fix, we still give these (remote) nodes a try by assigning an artificially low score. However, isn't 0 the lowest score, which could result in these slow/unresponsive remote nodes being picked before other remote nodes that have normal scores (such as 1.0)?

Btw, badness_threshold=0.1 may be too conservative. We also disabled the IO factor when calculating the scores through -Dcassandra.ignore_dynamic_snitch_severity=true. See CASSANDRA-11738 for details.
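For context on the fallback being discussed, a simplified, hypothetical sketch of the decision badness_threshold drives (the real logic lives in DynamicEndpointSnitch.sortByProximityWithBadness): the subsnitch's ordering is kept unless the node it prefers scores sufficiently worse than the best-scored node.

{code:java}
public class BadnessCheck
{
    // Simplified sketch: fall back to score-based sorting only when the
    // subsnitch-preferred node is more than badness_threshold worse than
    // the best node. A node with no score (null) can never trigger the
    // fallback, which is the gap this ticket is about.
    public static boolean shouldFallBackToScoreSort(Double preferredScore,
                                                    double bestScore,
                                                    double badnessThreshold)
    {
        return preferredScore != null
            && preferredScore > bestScore * (1.0 + badnessThreshold);
    }
}
{code}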
[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column
[ https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366468#comment-16366468 ]

Simon Zhou commented on CASSANDRA-14200:

Btw, I can reproduce the same issue with trunk and the same sstable. That indicates the issue is not fixed yet in trunk; we just need to figure out how we ended up with a null timestamp column.
[jira] [Commented] (CASSANDRA-14199) exception when dumping sstable with frozen collection of UUID
[ https://issues.apache.org/jira/browse/CASSANDRA-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366466#comment-16366466 ]

Simon Zhou commented on CASSANDRA-14199:

Yes, confirmed. Can we do a backport?
[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column
[ https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366460#comment-16366460 ]

Simon Zhou commented on CASSANDRA-14200:

[~cnlwsu] your steps won't reproduce the issue, because column "ts" will be either unset, which won't even result in an entry in the sstabledump output, or a valid value. However, I cannot reproduce it either, even by setting "ts" to null explicitly, which actually results in a tombstone, so a different code path is executed in sstabledump. I'm checking with the owner of this table regarding how they upsert this column and thus end up with the problem.
[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column
[ https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363449#comment-16363449 ]

Simon Zhou commented on CASSANDRA-14200:

Thanks [~cnlwsu]! I'll provide patches for 3.11 and 4.0 shortly.
[jira] [Updated] (CASSANDRA-14199) exception when dumping sstable with frozen collection of UUID
[ https://issues.apache.org/jira/browse/CASSANDRA-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Zhou updated CASSANDRA-14199:

Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column
[ https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Zhou updated CASSANDRA-14200:

Status: Patch Available (was: Open)
[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column
[ https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344331#comment-16344331 ]

Simon Zhou commented on CASSANDRA-14200:

Fix for 3.0 is [here|https://github.com/szhou1234/cassandra/commit/701d7cfa7425e595363a07138baa2c5661d9f1cf]. I'll provide fixes for later versions. [~cnlwsu] could you take a look at this one as well? After the fix, the output of sstabledump looks like this (note the column "terminated_at"):

{code}
"rows" : [
  {
    "type" : "row",
    "position" : 302,
    "clustering" : [ "68dc822e-0481-41d5-8dbd-10bd00703644" ],
    "liveness_info" : { "tstamp" : "2018-01-24T14:34:45.716783Z" },
    "cells" : [
      { "name" : "moved_at", "value" : "\"2018-01-24 14:34:45.698Z\"" },
      { "name" : "node_uuid", "value" : "\"0b045d26-764c-4023-9570-00b8ebe10cca\"" },
      { "name" : "processed_event_uuids", "value" : "[\"28d81650-7119-456c-9d1e-216ed8986a55\"]" },
      { "name" : "status", "value" : "0" },
      { "name" : "terminated_at" },
      { "name" : "event_types", "deletion_info" : { "marked_deleted" : "2018-01-24T14:34:45.716782Z", "local_delete_time" : "2018-01-24T14:34:45Z" } }
    ]
  }
]
{code}
[jira] [Updated] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column
[ https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Zhou updated CASSANDRA-14200:

Summary: NullPointerException when dumping sstable with null value for timestamp column (was: NullPointerException when dumping sstable)
[jira] [Created] (CASSANDRA-14200) NullPointerException when dumping sstable
Simon Zhou created CASSANDRA-14200:

Summary: NullPointerException when dumping sstable
Key: CASSANDRA-14200
URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
Project: Cassandra
Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou
Fix For: 3.0.x

We have an sstable whose schema has a column of type timestamp that is not part of the primary key. When dumping the sstable using sstabledump there is an NPE like this:

{code:java}
Exception in thread "main" java.lang.NullPointerException
    at java.util.Calendar.setTime(Calendar.java:1770)
    at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
    at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
    at java.text.DateFormat.format(DateFormat.java:345)
    at org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
    at org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
    at org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
    at org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
    at org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
    at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242)
{code}

The reason is that we use a null Date when there is no value for this column:

{code:java}
public Date deserialize(ByteBuffer bytes)
{
    return bytes.remaining() == 0 ? null : new Date(ByteBufferUtil.toLong(bytes));
}
{code}

It seems that we should not deserialize columns with null values.
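A self-contained sketch of the failure and the kind of null guard the dump path needs before formatting; the class and the JSON rendering here are illustrative, not Cassandra's actual code:

{code:java}
import java.nio.ByteBuffer;
import java.util.Date;

public class NullTimestampCell
{
    // Mirrors the deserialize above: an empty buffer yields a null Date.
    static Date deserialize(ByteBuffer bytes)
    {
        return bytes.remaining() == 0 ? null : new Date(bytes.getLong(bytes.position()));
    }

    // Illustrative guard: check for null before handing the Date to a
    // formatter, which is where the NPE in the stack trace above comes from.
    static String toJsonValue(ByteBuffer bytes)
    {
        Date date = deserialize(bytes);
        return date == null ? "null" : "\"" + date.toInstant() + "\"";
    }
}
{code}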
[jira] [Commented] (CASSANDRA-14199) exception when dumping sstable with frozen collection of UUID
[ https://issues.apache.org/jira/browse/CASSANDRA-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344218#comment-16344218 ]

Simon Zhou commented on CASSANDRA-14199:

I pushed a fix for 3.0 [here|https://github.com/szhou1234/cassandra/commit/1a8fff02e93e0acb90e785fdca7f30d9aae54b1a] and will provide fixes for newer versions. [~cnlwsu] could you help review this? For your convenience, this is how the output looks after the fix:

{code}
[
  {
    "partition" : { "key" : [ "id" ], "position" : 0 },
    "rows" : [
      {
        "type" : "row",
        "position" : 181,
        "liveness_info" : { "tstamp" : "2018-01-29T22:58:49.111820Z" },
        "cells" : [
          { "name" : "c2", "frozen" : true, "values" : [ "3", "4" ] },
          { "name" : "c4", "frozen" : true, "values" : [ "over", "view" ] },
          { "name" : "c6", "frozen" : true, "values" : { "driver" : "java", "note" : "new" } },
          { "name" : "c1", "deletion_info" : { "marked_deleted" : "2018-01-29T22:58:49.111819Z", "local_delete_time" : "2018-01-29T22:58:49Z" } },
          { "name" : "c1", "path" : [ "f7b01890-0547-11e8-817b-adb40ecebcf5" ] },
          { "name" : "c1", "path" : [ "f7b01891-0547-11e8-817b-adb40ecebcf5" ] },
          { "name" : "c3", "deletion_info" : { "marked_deleted" : "2018-01-29T22:58:49.111819Z", "local_delete_time" : "2018-01-29T22:58:49Z" } },
          { "name" : "c3", "path" : [ "set" ] },
          { "name" : "c3", "path" : [ "user" ] },
          { "name" : "c5", "deletion_info" : { "marked_deleted" : "2018-01-29T22:58:49.111819Z", "local_delete_time" : "2018-01-29T22:58:49Z" } },
          { "name" : "c5", "path" : [ "good" ] },
          { "name" : "c5", "path" : [ "root" ] }
        ]
      }
    ]
  }
]
{code}

Two changes:
- I added a "frozen" field for frozen collections.
- The elements of a frozen collection are printed on one line (rather than one line per element, as for an un-frozen collection), to better indicate that they are immutable.

There could be another, independent issue: for an un-frozen collection there is always one output line for "deletion_info", even when the cell doesn't have any deletion. Anyway, that should be a separate fix if it's an issue.
[jira] [Created] (CASSANDRA-14199) exception when dumping sstable with frozen collection of UUID
Simon Zhou created CASSANDRA-14199:

Summary: exception when dumping sstable with frozen collection of UUID
Key: CASSANDRA-14199
URL: https://issues.apache.org/jira/browse/CASSANDRA-14199
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Simon Zhou
Assignee: Simon Zhou
Fix For: 3.0.x

When dumping (sstabledump) an sstable with a frozen collection of UUID, there is an exception like this:

{code:java}
Exception in thread "main" org.apache.cassandra.serializers.MarshalException: UUID should be 16 or 0 bytes (24)
    at org.apache.cassandra.serializers.UUIDSerializer.validate(UUIDSerializer.java:43)
    at org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
    at org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:440)
    at org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:374)
    at org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:278)
    at org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:213)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:102)
    at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242)
{code}

*Steps to reproduce:*

{code:java}
cqlsh> create TABLE stresscql.sstabledump_test(userid text PRIMARY KEY, c1 list<uuid>, c2 frozen<list<uuid>>, c3 set<text>, c4 frozen<set<text>>, c5 map<text, text>, c6 frozen<map<text, text>>);
cqlsh> insert INTO stresscql.sstabledump_test (userid, c1, c2, c3, c4, c5, c6) VALUES ( 'id', [6947e8c0-02fa-11e8-87e1-fb0d0e20b5c4], [6947e8c0-02fa-11e8-87e1-fb0d0e20b5c4], {'set', 'user'}, {'view', 'over'}, {'good': 'hello', 'root': 'text'}, {'driver': 'java', 'note': 'new'});
{code}

*Root cause:*

A frozen collection is treated as a simple column, and it's the client's responsibility to parse the data from the ByteBuffer. We have this logic in the different drivers, but sstabledump doesn't have it in place; it just treats the whole collection as a single UUID.
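To make the 24-byte figure in the error concrete: under the collection layout sstabledump is reading (assuming the usual post-3.0 format of a 4-byte element count followed by a 4-byte length and payload per element), a frozen list<uuid> with one element is 4 + 4 + 16 = 24 bytes, which UUIDSerializer then rejects as a single UUID. A hedged, self-contained sketch of decoding such a blob; the real fix should go through the collection serializers rather than hand-rolled parsing:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class FrozenUuidList
{
    // Illustrative decoder for a frozen list<uuid> cell value:
    // [int elementCount] then, per element, [int length][payload].
    static List<UUID> decode(ByteBuffer bb)
    {
        int count = bb.getInt();
        List<UUID> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++)
        {
            int length = bb.getInt(); // 16 for a UUID
            if (length != 16)
                throw new IllegalArgumentException("UUID should be 16 bytes, got " + length);
            result.add(new UUID(bb.getLong(), bb.getLong()));
        }
        return result;
    }
}
{code}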
[jira] [Resolved] (CASSANDRA-13877) Potential concurrency issue with CDC size calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou resolved CASSANDRA-13877. Resolution: Invalid Normally, if you have multiple writers (producers), the add operation may use a stale local copy even if the variable is volatile. But since we have a single producer here, it's safe to use volatile, citing from the book: {code} You can use volatile variables only when all the following criteria are met: • Writes to the variable do not depend on its current value, or you can ensure that only a single thread ever updates the value; ... {code} Thanks for commenting. I'm closing this ticket. > Potential concurrency issue with CDC size calculation > - > > Key: CASSANDRA-13877 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13877 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > > We're backporting the CDC feature and bug fixes to 3.0. There is a potential > visibility issue with the two variables {{CDCSizeTracker.sizeInProgress}} and > {{DirectorySizeCalculator.size}}. They're declared as volatile; however, there > are cases where the new values assigned to them depend on the current value. For example: > https://github.com/apache/cassandra/blob/e9da85723a8dd40872c4bca087a03b655bd2cacb/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L285 > https://github.com/apache/cassandra/blob/e9da85723a8dd40872c4bca087a03b655bd2cacb/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L297 > In rare cases we won't be able to calculate the CDC data size correctly. We > should change these two variables back to AtomicLong, as the simplest fix. > Java Concurrency In Practice section 3.1.3 explains well why we shouldn't use > volatile in these two cases. I'll provide a patch shortly. > cc [~JoshuaMcKenzie] [~jay.zhuang] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
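As an illustration of the criterion quoted above, here is a hedged sketch of the single-writer pattern (hypothetical names, not the actual CDCSizeTracker code): the read-modify-write on the volatile field is safe only because exactly one thread ever executes it, while readers on other threads still see the freshest value.
{code:java}
public class SingleWriterCounter
{
    private volatile long size = 0;

    // Invoked only from the single producer thread, so the non-atomic
    // read-modify-write below can never interleave with another writer.
    void add(long delta)
    {
        size += delta;
    }

    // Safe from any thread: volatile guarantees visibility of the latest write.
    long current()
    {
        return size;
    }
}
{code}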
[jira] [Created] (CASSANDRA-13877) Potential concurrency issue with CDC size calculation
Simon Zhou created CASSANDRA-13877: -- Summary: Potential concurrency issue with CDC size calculation Key: CASSANDRA-13877 URL: https://issues.apache.org/jira/browse/CASSANDRA-13877 Project: Cassandra Issue Type: Bug Reporter: Simon Zhou Assignee: Simon Zhou We're backporting the CDC feature and bug fixes to 3.0. There is a potential visibility issue with the two variables CDCSizeTracker.sizeInProgress and DirectorySizeCalculator.size. They're declared as volatile; however, there are cases where the new values assigned to them depend on the current value. For example: https://github.com/apache/cassandra/blob/e9da85723a8dd40872c4bca087a03b655bd2cacb/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L285 https://github.com/apache/cassandra/blob/e9da85723a8dd40872c4bca087a03b655bd2cacb/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L297 In rare cases we won't be able to calculate the CDC data size correctly. We should change these two variables back to AtomicLong, as the simplest fix. Java Concurrency In Practice section 3.1.3 explains well why we shouldn't use volatile in these two cases. I'll provide a patch shortly. cc [~JoshuaMcKenzie] [~jay.zhuang] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
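For contrast, a minimal sketch (hypothetical names) of the failure mode the description worries about, and the AtomicLong fix it proposes: with two or more writers, a volatile increment can lose updates because both threads may read the same old value; AtomicLong.addAndGet makes the read-modify-write a single atomic step.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class MultiWriterCounter
{
    // Racy alternative: "volatile long size; size += delta;" is a read, an
    // add, and a write, so two writers can both read the same old value and
    // each write back old + delta, losing one update.
    private final AtomicLong size = new AtomicLong();

    void add(long delta)
    {
        size.addAndGet(delta);  // atomic, safe for any number of writers
    }

    long current()
    {
        return size.get();
    }
}
{code}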
[jira] [Updated] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message
[ https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13323: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Yep, that's the right fix. Thanks for the comment. > IncomingTcpConnection closed due to one bad message > --- > > Key: CASSANDRA-13323 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13323 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.x > > Attachments: CASSANDRA-13323-v1.patch > > > We got this exception: > {code} > WARN [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 > IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from > socket; closing > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this > is likely due to the schema not being fully propagated. Please wait for > schema agreement on table creation. > at > org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.10.jar:3.0.10] > {code} > We also saw this log on another host indicating it needs to re-connect: > {code} > INFO [HANDSHAKE-/] 2017-02-21 13:37:50,216 > OutboundTcpConnection.java:515 - Handshaking version with / > {code} > The reason is that the node was receiving hinted data for a dropped table. > This may happen with other messages as well. On the Cassandra side, > IncomingTcpConnection shouldn't close on just one bad message, even though it > will be restarted soon afterwards by the SocketThread in MessagingService. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
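A minimal sketch of the idea behind this ticket (which was closed as a duplicate), with hypothetical names rather than the real IncomingTcpConnection code: a failure to deserialize one message should drop only that message instead of tearing down the socket. Reading the whole length-prefixed payload first keeps the stream aligned even when decoding fails partway through.
{code:java}
import java.io.DataInputStream;
import java.io.IOException;

public class ResilientReceiveLoop
{
    // Hypothetical stand-in for UnknownColumnFamilyException: a message
    // referencing a table that was dropped or not yet propagated.
    static class UnknownTableException extends RuntimeException
    {
        UnknownTableException(String message) { super(message); }
    }

    static void receiveMessages(DataInputStream in) throws IOException
    {
        while (true)
        {
            // Read the full length-prefixed payload up front, so a failed
            // deserialization can never leave the stream misaligned.
            int payloadSize = in.readInt();
            byte[] payload = new byte[payloadSize];
            in.readFully(payload);
            try
            {
                deserialize(payload);
            }
            catch (UnknownTableException e)
            {
                // Drop just this message and keep the connection open.
                System.err.println("Skipping bad message: " + e.getMessage());
            }
        }
    }

    static void deserialize(byte[] payload)
    {
        // placeholder for the actual message decoding
    }
}
{code}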
[jira] [Commented] (CASSANDRA-13387) Metrics for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125942#comment-16125942 ] Simon Zhou commented on CASSANDRA-13387: Stefan, thank you so much for code review! > Metrics for repair > -- > > Key: CASSANDRA-13387 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13387 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 4.0 > > > We're missing metrics for repair, especially for errors. From what I observed > now, the exception will be caught by UncaughtExceptionHandler set in > CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one > example: > {code} > ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - > Exception in thread Thread[AntiEntropyStage:1,5,main] > java.lang.RuntimeException: Parent repair session with id = > 8c85d260-1319-11e7-82a2-25090a89015f has failed. > at > org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_121] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
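A hedged sketch of the kind of metric this ticket asks for, using the Dropwizard metrics library that Cassandra's metrics are built on; the registry and metric names here are illustrative, not the ones the committed patch introduces.
{code:java}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class RepairMetricsSketch
{
    private static final MetricRegistry registry = new MetricRegistry();

    // A dedicated counter for repair failures, instead of folding them into
    // the catch-all storage exception count described in the ticket.
    static final Counter repairExceptions =
            registry.counter(MetricRegistry.name("Repair", "Exceptions"));

    static void onRepairError(Throwable t)
    {
        repairExceptions.inc();
    }
}
{code}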
[jira] [Commented] (CASSANDRA-13737) Node start can fail if the base table of a materialized view is not found
[ https://issues.apache.org/jira/browse/CASSANDRA-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109856#comment-16109856 ] Simon Zhou commented on CASSANDRA-13737: We had the same issue on 3.0.14 a couple of days ago. It looks like the MV data was somehow corrupted, and a restart of any node would get stuck. Even dropping the MV from cqlsh doesn't work (on a different node, before the restart) because the base table doesn't exist. > Node start can fail if the base table of a materialized view is not found > - > > Key: CASSANDRA-13737 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13737 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata, Materialized Views >Reporter: Andrés de la Peña >Assignee: Andrés de la Peña > Fix For: 3.0.x, 3.11.x, 4.x > > > Node start can fail if the base table of a materialized view is not found, > which is something that can happen under certain circumstances. There is a > dtest reproducing the problem: > {code} > cluster = self.cluster > cluster.populate(3) > cluster.start() > node1, node2, node3 = self.cluster.nodelist() > session = self.patient_cql_connection(node1, > consistency_level=ConsistencyLevel.QUORUM) > create_ks(session, 'ks', 3) > session.execute('CREATE TABLE users (username varchar PRIMARY KEY, state > varchar)') > node3.stop(wait_other_notice=True) > # create a materialized view only in nodes 1 and 2 > session.execute(('CREATE MATERIALIZED VIEW users_by_state AS ' > 'SELECT * FROM users WHERE state IS NOT NULL AND username IS > NOT NULL ' > 'PRIMARY KEY (state, username)')) > node1.stop(wait_other_notice=True) > node2.stop(wait_other_notice=True) > # drop the base table only in node 3 > node3.start(wait_for_binary_proto=True) > session = self.patient_cql_connection(node3, > consistency_level=ConsistencyLevel.QUORUM) > session.execute('DROP TABLE ks.users') > cluster.stop() > cluster.start() # Fails > {code} > This is the error during node start: > {code} > java.lang.IllegalArgumentException: Unknown CF > 958ebc30-76e4-11e7-869a-9d8367a71c76 > at > org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:215) > ~[main/:na] > at > org.apache.cassandra.db.view.ViewManager.addView(ViewManager.java:143) > ~[main/:na] > at > org.apache.cassandra.db.view.ViewManager.reload(ViewManager.java:113) > ~[main/:na] > at org.apache.cassandra.schema.Schema.alterKeyspace(Schema.java:618) > ~[main/:na] > at org.apache.cassandra.schema.Schema.lambda$merge$18(Schema.java:591) > ~[main/:na] > at > java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.lambda$entryConsumer$0(Collections.java:1575) > ~[na:1.8.0_131] > at java.util.HashMap$EntrySet.forEach(HashMap.java:1043) ~[na:1.8.0_131] > at > java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.forEach(Collections.java:1580) > ~[na:1.8.0_131] > at org.apache.cassandra.schema.Schema.merge(Schema.java:591) ~[main/:na] > at > org.apache.cassandra.schema.Schema.mergeAndAnnounceVersion(Schema.java:564) > ~[main/:na] > at > org.apache.cassandra.schema.MigrationTask$1.response(MigrationTask.java:89) > ~[main/:na] > at > org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) > ~[main/:na] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72) > ~[main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_131] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_131] > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_131] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_131] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) > [main/:na] > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097004#comment-16097004 ] Simon Zhou commented on CASSANDRA-13720: This isn't meant to fix anything, but there are places where I want to make the code less confusing: |4.0 |[patch | https://github.com/szhou1234/cassandra/commit/b9a410b74f42af7519010dff1fd03372ce38a412]| [~spo...@gmail.com] Could you please review this patch? It's rebased on my patch for CASSANDRA-13387. Thank you. > Clean up repair code > > > Key: CASSANDRA-13720 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13720 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 4.0 > > > Lots of unused code. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13720) Clean up repair code
[ https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13720: --- Status: Patch Available (was: Open) > Clean up repair code > > > Key: CASSANDRA-13720 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13720 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 4.0 > > > Lots of unused code. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13720) Clean up repair code
Simon Zhou created CASSANDRA-13720: -- Summary: Clean up repair code Key: CASSANDRA-13720 URL: https://issues.apache.org/jira/browse/CASSANDRA-13720 Project: Cassandra Issue Type: Improvement Reporter: Simon Zhou Assignee: Simon Zhou Fix For: 4.0 Lots of unused code. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13387) Metrics for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096978#comment-16096978 ] Simon Zhou commented on CASSANDRA-13387: Just realized that's a public interface and I should keep it. Updated patch and rebased: |4.0 |[patch | https://github.com/szhou1234/cassandra/commit/306ed07aa4a7a572d085c41fd0c1067719505262]| Btw, repair code needs to be cleaned, e.g., there are some unused variables, etc. I'll open another ticket for this. > Metrics for repair > -- > > Key: CASSANDRA-13387 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13387 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > > We're missing metrics for repair, especially for errors. From what I observed > now, the exception will be caught by UncaughtExceptionHandler set in > CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one > example: > {code} > ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - > Exception in thread Thread[AntiEntropyStage:1,5,main] > java.lang.RuntimeException: Parent repair session with id = > 8c85d260-1319-11e7-82a2-25090a89015f has failed. > at > org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_121] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13387) Metrics for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091917#comment-16091917 ] Simon Zhou commented on CASSANDRA-13387: Rebased patch: |4.0 |[patch | https://github.com/szhou1234/cassandra/commit/54ad6690fd306fe2b5a93d73808064ab29f1]| > Metrics for repair > -- > > Key: CASSANDRA-13387 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13387 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > > We're missing metrics for repair, especially for errors. From what I observed > now, the exception will be caught by UncaughtExceptionHandler set in > CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one > example: > {code} > ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - > Exception in thread Thread[AntiEntropyStage:1,5,main] > java.lang.RuntimeException: Parent repair session with id = > 8c85d260-1319-11e7-82a2-25090a89015f has failed. > at > org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_121] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-13679) Add option to customize badness_threshold in dynamic endpoint snitch
[ https://issues.apache.org/jira/browse/CASSANDRA-13679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou resolved CASSANDRA-13679. Resolution: Not A Problem Just realized there is already a cassandra.yaml option, so this ticket is not needed. > Add option to customize badness_threshold in dynamic endpoint snitch > > > Key: CASSANDRA-13679 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13679 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou > Attachments: Screen Shot 2017-07-07 at 5.01.48 PM.png > > > I'm working on tuning the dynamic endpoint snitch, and it looks like the default > value (0.1) for Config.dynamic_snitch_badness_threshold is too sensitive and > causes traffic imbalance among nodes, especially with my patch for > CASSANDRA-13577. So we should: > 1. Revisit the default value. > 2. Add an option to allow customizing badness_threshold during bootstrap. > This ticket is to track #2. I attached a screenshot to show that, after > increasing badness_threshold from 0.1 to 1.0 by using the patch from > CASSANDRA-12179, the traffic imbalance is gone. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
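For reference, the existing knob the comment refers to lives in cassandra.yaml. Raising it makes the dynamic snitch less eager to route reads away from the otherwise-preferred replica; a value of 1.0 roughly means a replica must score twice as badly before traffic is redirected.
{code}
# cassandra.yaml (default is 0.1): how much worse, as a fraction, the
# preferred replica must perform before reads are routed elsewhere.
dynamic_snitch_badness_threshold: 1.0
{code}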
[jira] [Commented] (CASSANDRA-13679) Add option to customize badness_threshold in dynamic endpoint snitch
[ https://issues.apache.org/jira/browse/CASSANDRA-13679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087527#comment-16087527 ] Simon Zhou commented on CASSANDRA-13679: Here are the patches. Not sure if we need one for 3.11. |3.0.x |[patch | https://github.com/szhou1234/cassandra/commit/50cc71418d3fc75b1d8225eb1bded95ac1f1bdd7]| |4.0 |[patch | https://github.com/szhou1234/cassandra/commit/a7144f8d50872dc4e5591db73ff770388d410403]| > Add option to customize badness_threshold in dynamic endpoint snitch > > > Key: CASSANDRA-13679 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13679 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou > Attachments: Screen Shot 2017-07-07 at 5.01.48 PM.png > > > I'm working on tuning dynamic endpoint snitch and looks like the default > value (0.1) for Config.dynamic_snitch_badness_threshold is too sensitive and > causes traffic imbalance among nodes, especially with my patch for > CASSANDRA-13577. So we should: > 1. Revisit the default value. > 2. Add an option to allow customize badness_threshold during bootstrap. > This ticket is to track #2. I attached a screenshot to show that, after > increasing badness_threshold from 0.1 to 1.0 by using patch from > CASSANDRA-12179, the traffic imbalance is gone. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13577) Fix dynamic endpoint snitch for sub-millisecond use case
[ https://issues.apache.org/jira/browse/CASSANDRA-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078856#comment-16078856 ] Simon Zhou commented on CASSANDRA-13577: Now with this patch, badness_threshold looks too sensitive and triggers traffic imbalance. I created CASSANDRA-13679 to follow up on that. > Fix dynamic endpoint snitch for sub-millisecond use case > > > Key: CASSANDRA-13577 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13577 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.x > > > This is a follow up of https://issues.apache.org/jira/browse/CASSANDRA-6908. > After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few > production clusters, I observed that the scores for all the endpoints are > mostly 0.0. Through debugging, I found this is caused by that these clusters > have p50 latency well below 1ms and the network latency is also <0.1ms (round > trip). Be noted that we use p50 sampled read latency and millisecond as time > unit. That means, if the latency is mostly below 1ms, the score will be 0. > This is definitely not something we want. To make DES work for these > sub-millisecond use cases, we should change the timeunit to at least > microsecond, or even nanosecond. I'll provide a patch soon. > Evidence of the p50 latency: > {code} > nodetool tablehistograms > Percentile SSTables Write Latency Read LatencyPartition Size > Cell Count > (micros) (micros) (bytes) > > 50% 2.00 35.43454.83 20501 > 3 > 75% 2.00 42.51654.95 29521 > 3 > 95% 3.00182.79943.13 61214 > 3 > 98% 4.00263.21 1131.75 73457 > 3 > 99% 4.00315.85 1358.10 88148 > 3 > Min 0.00 9.89 11.8761 > 3 > Max 5.00654.95 129557.75943127 > 3 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13679) Add option to customize badness_threshold in dynamic endpoint snitch
Simon Zhou created CASSANDRA-13679: -- Summary: Add option to customize badness_threshold in dynamic endpoint snitch Key: CASSANDRA-13679 URL: https://issues.apache.org/jira/browse/CASSANDRA-13679 Project: Cassandra Issue Type: Improvement Reporter: Simon Zhou Assignee: Simon Zhou Attachments: Screen Shot 2017-07-07 at 5.01.48 PM.png I'm working on tuning the dynamic endpoint snitch, and it looks like the default value (0.1) for Config.dynamic_snitch_badness_threshold is too sensitive and causes traffic imbalance among nodes, especially with my patch for CASSANDRA-13577. So we should: 1. Revisit the default value. 2. Add an option to allow customizing badness_threshold during bootstrap. This ticket is to track #2. I attached a screenshot to show that, after increasing badness_threshold from 0.1 to 1.0 by using the patch from CASSANDRA-12179, the traffic imbalance is gone. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-10862) LCS repair: compact tables before making available in L0
[ https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049505#comment-16049505 ] Simon Zhou edited comment on CASSANDRA-10862 at 6/14/17 6:42 PM: - [~scv...@gmail.com] Are you still working on this? was (Author: szhou): [~chenshen] Are you still working on this? > LCS repair: compact tables before making available in L0 > > > Key: CASSANDRA-10862 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10862 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Streaming and Messaging >Reporter: Jeff Ferland >Assignee: Chen Shen > Labels: lcs > > When doing repair on a system with lots of mismatched ranges, the number of > tables in L0 goes up dramatically, as correspondingly goes the number of > tables referenced for a query. Latency increases dramatically in tandem. > Eventually all the copied tables are compacted down in L0, then copied into > L1 (which may be a very large copy), finally reducing the number of SSTables > per query into the manageable range. > It seems to me that the cleanest answer is to compact after streaming, then > mark tables available rather than marking available when the file itself is > complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0
[ https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049505#comment-16049505 ] Simon Zhou commented on CASSANDRA-10862: [~chenshen] Are you still working on this? > LCS repair: compact tables before making available in L0 > > > Key: CASSANDRA-10862 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10862 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Streaming and Messaging >Reporter: Jeff Ferland >Assignee: Chen Shen > Labels: lcs > > When doing repair on a system with lots of mismatched ranges, the number of > tables in L0 goes up dramatically, as correspondingly goes the number of > tables referenced for a query. Latency increases dramatically in tandem. > Eventually all the copied tables are compacted down in L0, then copied into > L1 (which may be a very large copy), finally reducing the number of SSTables > per query into the manageable range. > It seems to me that the cleanest answer is to compact after streaming, then > mark tables available rather than marking available when the file itself is > complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13577) Fix dynamic endpoint snitch for sub-millisecond use case
[ https://issues.apache.org/jira/browse/CASSANDRA-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042055#comment-16042055 ] Simon Zhou commented on CASSANDRA-13577: Here are the patches. Not sure if we need one for 3.11. |3.0.x |[patch | https://github.com/szhou1234/cassandra/commit/50a0a081f976d94b2d6f7883e28d4c427baa120c]| |4.0 |[patch | https://github.com/szhou1234/cassandra/commit/73a3ff467a852eec7993efb0133945416bad4e46]| > Fix dynamic endpoint snitch for sub-millisecond use case > > > Key: CASSANDRA-13577 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13577 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.x > > > This is a follow up of https://issues.apache.org/jira/browse/CASSANDRA-6908. > After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few > production clusters, I observed that the scores for all the endpoints are > mostly 0.0. Through debugging, I found this is caused by that these clusters > have p50 latency well below 1ms and the network latency is also <0.1ms (round > trip). Be noted that we use p50 sampled read latency and millisecond as time > unit. That means, if the latency is mostly below 1ms, the score will be 0. > This is definitely not something we want. To make DES work for these > sub-millisecond use cases, we should change the timeunit to at least > microsecond, or even nanosecond. I'll provide a patch soon. > Evidence of the p50 latency: > {code} > nodetool tablehistograms > Percentile SSTables Write Latency Read LatencyPartition Size > Cell Count > (micros) (micros) (bytes) > > 50% 2.00 35.43454.83 20501 > 3 > 75% 2.00 42.51654.95 29521 > 3 > 95% 3.00182.79943.13 61214 > 3 > 98% 4.00263.21 1131.75 73457 > 3 > 99% 4.00315.85 1358.10 88148 > 3 > Min 0.00 9.89 11.8761 > 3 > Max 5.00654.95 129557.75943127 > 3 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13577) Fix dynamic endpoint snitch for sub-millisecond use case
[ https://issues.apache.org/jira/browse/CASSANDRA-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13577: --- Status: Patch Available (was: Open) > Fix dynamic endpoint snitch for sub-millisecond use case > > > Key: CASSANDRA-13577 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13577 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.x > > > This is a follow up of https://issues.apache.org/jira/browse/CASSANDRA-6908. > After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few > production clusters, I observed that the scores for all the endpoints are > mostly 0.0. Through debugging, I found this is caused by that these clusters > have p50 latency well below 1ms and the network latency is also <0.1ms (round > trip). Be noted that we use p50 sampled read latency and millisecond as time > unit. That means, if the latency is mostly below 1ms, the score will be 0. > This is definitely not something we want. To make DES work for these > sub-millisecond use cases, we should change the timeunit to at least > microsecond, or even nanosecond. I'll provide a patch soon. > Evidence of the p50 latency: > {code} > nodetool tablehistograms > Percentile SSTables Write Latency Read LatencyPartition Size > Cell Count > (micros) (micros) (bytes) > > 50% 2.00 35.43454.83 20501 > 3 > 75% 2.00 42.51654.95 29521 > 3 > 95% 3.00182.79943.13 61214 > 3 > 98% 4.00263.21 1131.75 73457 > 3 > 99% 4.00315.85 1358.10 88148 > 3 > Min 0.00 9.89 11.8761 > 3 > Max 5.00654.95 129557.75943127 > 3 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13577) Fix dynamic endpoint snitch for sub-millisecond use case
Simon Zhou created CASSANDRA-13577: -- Summary: Fix dynamic endpoint snitch for sub-millisecond use case Key: CASSANDRA-13577 URL: https://issues.apache.org/jira/browse/CASSANDRA-13577 Project: Cassandra Issue Type: Bug Reporter: Simon Zhou Assignee: Simon Zhou Fix For: 3.0.x This is a follow-up of https://issues.apache.org/jira/browse/CASSANDRA-6908. After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few production clusters, I observed that the scores for all the endpoints are mostly 0.0. Through debugging, I found this is caused by these clusters having p50 latency well below 1ms, with network latency also <0.1ms (round trip). Note that we use the sampled p50 read latency with millisecond as the time unit. That means, if the latency is mostly below 1ms, the score will be 0. This is definitely not something we want. To make DES work for these sub-millisecond use cases, we should change the time unit to at least microsecond, or even nanosecond. I'll provide a patch soon. Evidence of the p50 latency: {code} nodetool tablehistograms Percentile SSTables Write Latency Read Latency Partition Size Cell Count (micros) (micros) (bytes) 50% 2.00 35.43 454.83 20501 3 75% 2.00 42.51 654.95 29521 3 95% 3.00 182.79 943.13 61214 3 98% 4.00 263.21 1131.75 73457 3 99% 4.00 315.85 1358.10 88148 3 Min 0.00 9.89 11.87 61 3 Max 5.00 654.95 129557.75 943127 3 {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
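A small illustration of the truncation described above, using the p50 read latency from the histogram: with millisecond granularity, a 454.83 microsecond latency truncates to 0, so every such endpoint ends up with a 0.0 score; microsecond granularity preserves the signal.
{code:java}
import java.util.concurrent.TimeUnit;

public class TimeUnitTruncation
{
    public static void main(String[] args)
    {
        long p50Nanos = 454_830;  // 454.83 micros, the p50 read latency above

        // Millisecond granularity: integer truncation yields 0, so the
        // snitch effectively sees zero latency for this endpoint.
        System.out.println(TimeUnit.NANOSECONDS.toMillis(p50Nanos));  // prints 0

        // Microsecond granularity keeps the signal.
        System.out.println(TimeUnit.NANOSECONDS.toMicros(p50Nanos));  // prints 454
    }
}
{code}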
[jira] [Commented] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039888#comment-16039888 ] Simon Zhou commented on CASSANDRA-6908: --- Luckily we have a JVM option to disable severity when calculating scores (CASSANDRA-11737/CASSANDRA-11738). After applying this option in a few production clusters, however, I observed that the scores for all the endpoints are mostly 0.0. Through debugging, I found this is caused by these clusters having p50 latency well below 1ms, with network latency also <0.1ms (round trip). Note that we use the sampled p50 read latency with millisecond as the time unit. That means, if the latency is mostly below 1ms, the score will be 0. This is definitely not something we want. To make DES work for these sub-millisecond use cases, we should change the time unit to at least microsecond, or even nanosecond. I'll create another ticket for this. {code} nodetool tablehistograms Percentile SSTables Write Latency Read Latency Partition Size Cell Count (micros) (micros) (bytes) 50% 2.00 35.43 545.79 20501 3 75% 3.00 42.51 654.95 35425 3 95% 3.00 152.32 943.13 61214 3 98% 4.00 263.21 1131.75 73457 3 99% 4.00 315.85 1358.10 88148 3 Min 0.00 9.89 9.89 61 3 Max 5.00 785.94 89970.66 1131752 3 {code} > Dynamic endpoint snitch destabilizes cluster under heavy load > - > > Key: CASSANDRA-6908 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6908 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Bartłomiej Romański >Assignee: Brandon Williams > Attachments: > 0001-Decouple-IO-scores-and-latency-scores-from-DynamicEn.patch, > as-dynamic-snitch-disabled.png > > > We observe that with dynamic snitch disabled our cluster is much more stable > than with dynamic snitch enabled. > We've got a 15-node cluster with pretty strong machines (2xE5-2620, 64 GB > RAM, 2x480 GB SSD). We mostly do reads (about 300k/s). > We use Astyanax on the client side with the TOKEN_AWARE option enabled. It > automatically directs read queries to one of the nodes responsible for the given > token. > In that case, with dynamic snitch disabled, Cassandra always handles reads > locally. With dynamic snitch enabled, Cassandra very often decides to proxy > the read to some other node. This causes much higher CPU usage and produces > much more garbage, which results in more frequent GC pauses (the young generation > fills up quicker). By "much higher" and "much more" I mean 1.5-2x. > I'm aware that a higher dynamic_snitch_badness_threshold value should solve > that issue. The default value is 0.1. I've looked at the scores exposed in JMX, > and the problem is that our values seemed to be completely random. They are > usually between 0.5 and 2.0, but change randomly every time I hit refresh. > Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something > like that, but the result will be similar to simply disabling the dynamic > snitch at all (that's what we've done). > I've tried to understand the logic behind these scores and I'm not > sure if I get the idea... 
> It's a sum (without any multipliers) of two components: > - ratio of recent given node latency to recent average node latency > - something called 'severity', which, if I analyzed the code correctly, is a > result of BackgroundActivityMonitor.getIOWait() - it's a ratio of "iowait" > CPU time to the whole CPU time as reported in /proc/stats (the ratio is > multiplied by 100) > In our case the second value is something around 0-2% but varies quite > heavily every second. > What's the idea behind simply adding these two values without any multipliers > (e.g. the second one is a percentage while the first one is not)? Are we sure > this is the best possible way of calculating the final score? > Is there a way to force Cassandra to use (much) longer samples? In our case > we probably need that to get stable values. The 'severity' is calculated for > each second. The mean latency is calculated based on some magic, hardcoded > values (ALPHA = 0.75, WINDOW_SIZE = 100). > Am I right that there's no way to tune that without hacking the code? > I'm aware that there's dynamic_snitch_update_interval_in_ms property in the > co
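A sketch of the score composition as the quoted description explains it (not the exact DynamicEndpointSnitch code): the latency component is a ratio hovering around 1.0, while severity is an iowait percentage, and the two are summed with no scaling, which is why an iowait of even 2% can swamp the latency signal.
{code:java}
public class ScoreComposition
{
    // Latency ratio (recent node latency over recent average latency) plus
    // the unscaled iowait-percentage severity, as described above.
    static double score(double nodeLatency, double avgLatency, double severityPercent)
    {
        return (nodeLatency / avgLatency) + severityPercent;
    }

    public static void main(String[] args)
    {
        // A node 20% slower than average with 2% iowait:
        System.out.println(score(1.2, 1.0, 2.0));  // 3.2, dominated by severity
    }
}
{code}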
[jira] [Commented] (CASSANDRA-10876) Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037786#comment-16037786 ] Simon Zhou commented on CASSANDRA-10876: I intended to backport this (see CASSANDRA-13467) but may need a committer to review it. > Alter behavior of batch WARN and fail on single partition batches > - > > Key: CASSANDRA-10876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10876 > Project: Cassandra > Issue Type: Improvement >Reporter: Patrick McFadin >Assignee: Sylvain Lebresne >Priority: Minor > Labels: lhf > Fix For: 3.6 > > Attachments: 10876.txt > > > In an attempt to give operator insight into potentially harmful batch usage, > Jiras were created to log WARN or fail on certain batch sizes. This ignores > the single partition batch, which doesn't create the same issues as a > multi-partition batch. > The proposal is to ignore size on single partition batch statements. > Reference: > [CASSANDRA-6487|https://issues.apache.org/jira/browse/CASSANDRA-6487] > [CASSANDRA-8011|https://issues.apache.org/jira/browse/CASSANDRA-8011] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034112#comment-16034112 ] Simon Zhou edited comment on CASSANDRA-6908 at 6/2/17 5:43 AM: --- We got a similar issue, and thus I worked out a simple patch (attached) to decouple the scores for iowait and sampled read latency. From my observation, there are two issues: 1. The iowait score of one node changes frequently, and the gaps among the scores for different nodes are usually far beyond the default 1.1 threshold. 2. The (median) latency scores don't vary too much, but the differences may still be more than 1.1x. Also, some nodes from the local datacenter have 0 latency scores. I understand that the nodes in the remote datacenter may not have latency data since local_quorum or local_one is being used. The issue for the remote datacenter has actually been fixed in CASSANDRA-13074 (we're running 3.0.13). These are the numbers I got (formatted) from a two-datacenter cluster (10 nodes in each datacenter), with my patch. The IP addresses have been obfuscated. {code} szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores 06/01/2017 23:30:36 + org.archive.jmx.Client LatencyScores: { /node1=0.7832167832167832 /node2=0.0 /node3=1.0 /node4=0.0 /node5=0.0 /node6=0.43356643356643354 /node7=0.4825174825174825 /node8=0.0 /node9=0.8881118881118881 /node10=0.0 /node11=0.9440559440559441 /node12=0.0 /node13=0.0 /node14=0.0 /node15=0.0 /node16=0.0} szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores 06/01/2017 23:30:45 + org.archive.jmx.Client LatencyScores: { /node1=0.0 /node2=1.0 /node3=0.0 /node4=0.0 /node5=0.43356643356643354 /node6=0.4825174825174825 /node7=0.0 /node8=0.8881118881118881 /node9=0.0 /node10=0.9440559440559441 /node11=0.0 /node12=0.0 /node13=0.0 /node15=0.0 /node16=0.0 /node17=0.7832167832167832 } szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch IOWaitScores 06/01/2017 23:30:54 + org.archive.jmx.Client IOWaitScores: { /node1=5.084033489227295 /node2=4.024896621704102 /node3=4.54736852645874 /node4=4.947588920593262 /node5=3.4599156379699707 /node6=4.0653815269470215 /node7=6.989473819732666 /node8=3.371259927749634 /node9=5.800169467926025 /node10=3.2855939865112305 /node11=5.631399154663086 /node12=5.484004974365234 /node13=0.9635525941848755 /node14=1.5043878555297852 /node15=6.481481552124023 /node16=3.751563310623169} {code} Yes, we can work around the issue by increasing badness_threshold. But the problems are: 1. The default threshold doesn't work well. 2. iowait (percentage) is not a good measurement of end-to-end latency, not only because it changes frequently from second to second, but also because it's just a low-level metric that doesn't reflect the whole picture, which should also include GC/safepoint pauses, thread scheduling delays, etc. 3. Instead of using the median read latency, can we maybe use p95 latency as a better factor when calculating scores? I haven't experimented with this yet. [~brandon.williams] what do you think? [~kohlisankalp] Looks like we have some fixes (or improvements?) in 4.0, but you mentioned in a meeting that DES could be improved. I'd also like to get your ideas on this. I can work on this if we can agree on something. 
was (Author: szhou): We got a similar issue, and thus I worked out a simple patch (attached) to decouple the scores for iowait and sampled read latency. From my observation, there are two issues: 1. The iowait score of one node changes frequently, and the gaps among the scores for different nodes are usually far beyond the default 1.1 threshold. 2. The (median) latency scores don't vary too much; however, some nodes have 0 latency scores, even with the fix for CASSANDRA-13074 (we're running 3.0.13). These are the numbers I got (formatted) with my attached patch: {code} szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores 06/01/2017 23:30:36 + org.archive.jmx.Client LatencyScores: { /node1=0.7832167832167832 /node2=0.0 /node3=1.0 /node4=0.0 /node5=0.0 /node6=0.43356643356643354 /node7=0.4825174825174825 /node8=0.0 /node9=0.8881118881118881 /node10=0.0 /node11=0.9440559440559441 /node12=0.0 /node13=0.0 /node14=0.0 /node15=0.0 /node16=0.0} szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores 06/01/2017 23:30:45 + org.archive.jmx.Client LatencyScores: {/10.165.10.5=0.7832167832167832 /node1=0.0 /node2=1.0 /node3=0.0 /node4=0.0 /node5=0.43356643356643354 /node6=0.4825174825174825 /node7=0.0 /node8=0.8881118881118881 /node9=0.0 /node10=0.9440559
[jira] [Commented] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034112#comment-16034112 ] Simon Zhou commented on CASSANDRA-6908: --- We got a similar issue, and thus I worked out a simple patch (attached) to decouple the scores for iowait and sampled read latency. From my observation, there are two issues: 1. The iowait score of one node changes frequently, and the gaps among the scores for different nodes are usually far beyond the default 1.1 threshold. 2. The (median) latency scores don't vary too much; however, some nodes have 0 latency scores, even with the fix for CASSANDRA-13074 (we're running 3.0.13). These are the numbers I got (formatted) with my attached patch: {code} szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores 06/01/2017 23:30:36 + org.archive.jmx.Client LatencyScores: { /node1=0.7832167832167832 /node2=0.0 /node3=1.0 /node4=0.0 /node5=0.0 /node6=0.43356643356643354 /node7=0.4825174825174825 /node8=0.0 /node9=0.8881118881118881 /node10=0.0 /node11=0.9440559440559441 /node12=0.0 /node13=0.0 /node14=0.0 /node15=0.0 /node16=0.0} szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores 06/01/2017 23:30:45 + org.archive.jmx.Client LatencyScores: {/10.165.10.5=0.7832167832167832 /node1=0.0 /node2=1.0 /node3=0.0 /node4=0.0 /node5=0.43356643356643354 /node6=0.4825174825174825 /node7=0.0 /node8=0.8881118881118881 /node9=0.0 /node10=0.9440559440559441 /node11=0.0 /node12=0.0 /node13=0.0 /node15=0.0 /node16=0.0} szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch IOWaitScores 06/01/2017 23:30:54 + org.archive.jmx.Client IOWaitScores: { /node1=5.084033489227295 /node2=4.024896621704102 /node3=4.54736852645874 /node4=4.947588920593262 /node5=3.4599156379699707 /node6=4.0653815269470215 /node7=6.989473819732666 /node8=3.371259927749634 /node9=5.800169467926025 /node10=3.2855939865112305 /node11=5.631399154663086 /node12=5.484004974365234 /node13=0.9635525941848755 /node14=1.5043878555297852 /node15=6.481481552124023 /node16=3.751563310623169} {code} Yes, we can work around the issue by increasing badness_threshold. But the problems are: 1. The default threshold doesn't work well. 2. iowait (percentage) is not a good measurement of end-to-end latency, not only because it changes frequently from second to second, but also because it's just a low-level metric that doesn't reflect the whole picture, which should also include GC/safepoint pauses, thread scheduling delays, etc. 3. Instead of using the median read latency, can we maybe use p95 latency as a better factor when calculating scores? I haven't experimented with this yet. [~brandon.williams] what do you think? [~kohlisankalp] Looks like we have some fixes (or improvements?) in 4.0, but you mentioned in a meeting that DES could be improved. I'd also like to get your ideas on this. I can work on this if we can agree on something. 
> Dynamic endpoint snitch destabilizes cluster under heavy load > - > > Key: CASSANDRA-6908 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6908 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Bartłomiej Romański >Assignee: Brandon Williams > Attachments: > 0001-Decouple-IO-scores-and-latency-scores-from-DynamicEn.patch, > as-dynamic-snitch-disabled.png > > > We observe that with dynamic snitch disabled our cluster is much more stable > than with dynamic snitch enabled. > We've got a 15 nodes cluster with pretty strong machines (2xE5-2620, 64 GB > RAM, 2x480 GB SSD). We mostly do reads (about 300k/s). > We use Astyanax on client side with TOKEN_AWARE option enabled. It > automatically direct read queries to one of the nodes responsible the given > token. > In that case with dynamic snitch disabled Cassandra always handles read > locally. With dynamic snitch enabled Cassandra very often decides to proxy > the read to some other node. This causes much higher CPU usage and produces > much more garbage what results in more often GC pauses (young generation > fills up quicker). By "much higher" and "much more" I mean 1.5-2x. > I'm aware that higher dynamic_snitch_badness_threshold value should solve > that issue. The default value is 0.1. I've looked at scores exposed in JMX > and the problem is that our values seemed to be completely random. They are > between usually 0.5 and 2.0, but changes randomly every time I hit refresh. > Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something > like that, but the result will be similar to simply disabling th
[jira] [Updated] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-6908: -- Attachment: 0001-Decouple-IO-scores-and-latency-scores-from-DynamicEn.patch > Dynamic endpoint snitch destabilizes cluster under heavy load > - > > Key: CASSANDRA-6908 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6908 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Bartłomiej Romański >Assignee: Brandon Williams > Attachments: > 0001-Decouple-IO-scores-and-latency-scores-from-DynamicEn.patch, > as-dynamic-snitch-disabled.png > > > We observe that with dynamic snitch disabled our cluster is much more stable > than with dynamic snitch enabled. > We've got a 15 nodes cluster with pretty strong machines (2xE5-2620, 64 GB > RAM, 2x480 GB SSD). We mostly do reads (about 300k/s). > We use Astyanax on client side with TOKEN_AWARE option enabled. It > automatically direct read queries to one of the nodes responsible the given > token. > In that case with dynamic snitch disabled Cassandra always handles read > locally. With dynamic snitch enabled Cassandra very often decides to proxy > the read to some other node. This causes much higher CPU usage and produces > much more garbage what results in more often GC pauses (young generation > fills up quicker). By "much higher" and "much more" I mean 1.5-2x. > I'm aware that higher dynamic_snitch_badness_threshold value should solve > that issue. The default value is 0.1. I've looked at scores exposed in JMX > and the problem is that our values seemed to be completely random. They are > between usually 0.5 and 2.0, but changes randomly every time I hit refresh. > Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something > like that, but the result will be similar to simply disabling the dynamic > switch at all (that's what we done). > I've tried to understand what's the logic behind these scores and I'm not > sure if I get the idea... > It's a sum (without any multipliers) of two components: > - ratio of recent given node latency to recent average node latency > - something called 'severity', what, if I analyzed the code correctly, is a > result of BackgroundActivityMonitor.getIOWait() - it's a ratio of "iowait" > CPU time to the whole CPU time as reported in /proc/stats (the ratio is > multiplied by 100) > In our case the second value is something around 0-2% but varies quite > heavily every second. > What's the idea behind simply adding this two values without any multipliers > (e.g the second one is in percentage while the first one is not)? Are we sure > this is the best possible way of calculating the final score? > Is there a way too force Cassandra to use (much) longer samples? In our case > we probably need that to get stable values. The 'severity' is calculated for > each second. The mean latency is calculated based on some magic, hardcoded > values (ALPHA = 0.75, WINDOW_SIZE = 100). > Am I right that there's no way to tune that without hacking the code? > I'm aware that there's dynamic_snitch_update_interval_in_ms property in the > config file, but that only determines how often the scores are recalculated > not how long samples are taken. Is that correct? 
> To sum up, It would be really nice to have more control over dynamic snitch > behavior or at least have the official option to disable it described in the > default config file (it took me some time to discover that we can just > disable it instead of hacking with dynamic_snitch_badness_threshold=1000). > Currently for some scenarios (like ours - optimized cluster, token aware > client, heavy load) it causes more harm than good. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13555) Thread leak during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032401#comment-16032401 ] Simon Zhou commented on CASSANDRA-13555: Thanks [~tjake] for the comment. I'll be working on the patch, but I'm not sure whether that is the best fix. Reasons: 1. The "executor" is created in RepairRunnable and runs all RepairJob's for a given keyspace. It's not a single RepairSession instance's responsibility to stop the "executor", nor does it have a reference to it. 2. The bigger problem is: why do we handle "node down" in RepairSession at all? IMHO it should be handled at a higher level. That means that once an endpoint is down, we should stop all RepairRunnable's. Sure, there could be improvements, e.g., only stopping the affected RepairSession's (token ranges), but we are not doing that today and it deserves a separate change. What do you think? I know there are bigger changes coming in 4.0, but I don't want a band-aid fix that just makes things messy. > Thread leak during repair > - > > Key: CASSANDRA-13555 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13555 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > > The symptom is similar to what happened in [CASSANDRA-13204 | > https://issues.apache.org/jira/browse/CASSANDRA-13204] in that a thread > waits forever doing nothing. This one happened during "nodetool repair -pr > -seq -j 1" in production, but I can easily reproduce the problem with just > "nodetool repair" in a dev environment (CCM). Below I try to explain what > happened with the 3.0.13 code base. > 1. One node went down while doing repair. This is the error I saw in production: > {code} > ERROR [GossipTasks:1] 2017-05-19 15:00:10,545 RepairSession.java:334 - > [repair #bc9a3cd1-3ca3-11e7-a44a-e30923ac9336] session completed with the > following error > java.io.IOException: Endpoint /10.185.43.15 died > at > org.apache.cassandra.repair.RepairSession.convict(RepairSession.java:333) > ~[apache-cassandra-3.0.11.jar:3.0.11] > at > org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306) > [apache-cassandra-3.0.11.jar:3.0.11] > at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:766) > [apache-cassandra-3.0.11.jar:3.0.11] > at org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:66) > [apache-cassandra-3.0.11.jar:3.0.11] > at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:181) > [apache-cassandra-3.0.11.jar:3.0.11] > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) > [apache-cassandra-3.0.11.jar:3.0.11] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_121] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [na:1.8.0_121] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [na:1.8.0_121] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > [apache-cassandra-3.0.11.jar:3.0.11] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121] > {code} > 2. At this moment the repair coordinator hasn't received the response > (MerkleTrees) from the node that was marked down. This means RepairJob#run > will never return, because it waits for validations to finish: > {code} > // Wait for validation to complete > Futures.getUnchecked(validations); > {code} > Note that all RepairJob's (as Runnable) run on a shared executor created > in RepairRunnable#runMayThrow, while all snapshotting, validation and syncing > happen on a per-RepairSession "taskExecutor". RepairJob#run will only > return when it receives MerkleTrees (or null) from all endpoints for a given > column family and token range. > As evidence of the thread leak, below is an excerpt from the thread dump. I can also get > the same stack trace when simulating the same issue in the dev environment. > {code} > "Repair#129:56" #406373 daemon prio=5 os_prio=0 tid=0x7fc495028400 > nid=0x1a77d waiting on condition [0x7fc02153] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0002d7c00198> (a > com.google.common.util.concurrent.Abstract
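An aside for readers following the dump above: the parked repair thread can be reproduced outside Cassandra with a few lines of Guava. This is a minimal sketch (not Cassandra code), assuming only Guava on the classpath; the spawned thread parks in AbstractFuture exactly as in the "Repair#129:56" frame, and only completing or failing the future would release it:
{code}
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.SettableFuture;

public class RepairHangRepro
{
    public static void main(String[] args) throws InterruptedException
    {
        // Stands in for the validation response that never arrives once the
        // remote endpoint is convicted.
        SettableFuture<Object> validations = SettableFuture.create();

        Thread repairJob = new Thread(() -> Futures.getUnchecked(validations), "Repair#1:1");
        repairJob.setDaemon(true);
        repairJob.start();

        repairJob.join(2000);
        // Prints "true": the thread is parked in AbstractFuture, like the dump above.
        System.out.println("repair thread still alive: " + repairJob.isAlive());
        // validations.setException(new java.io.IOException("Endpoint died")) is
        // roughly what failing the session on convict() would need to guarantee.
    }
}
{code}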
[jira] [Created] (CASSANDRA-13555) Thread leak during repair
Simon Zhou created CASSANDRA-13555: -- Summary: Thread leak during repair Key: CASSANDRA-13555 URL: https://issues.apache.org/jira/browse/CASSANDRA-13555 Project: Cassandra Issue Type: Bug Reporter: Simon Zhou Assignee: Simon Zhou The symptom is similar to what happened in [CASSANDRA-13204 | https://issues.apache.org/jira/browse/CASSANDRA-13204] in that a thread waits forever doing nothing. This one happened during "nodetool repair -pr -seq -j 1" in production, but I can easily reproduce the problem with just "nodetool repair" in a dev environment (CCM). Below I try to explain what happened with the 3.0.13 code base. 1. One node went down while doing repair. This is the error I saw in production: {code} ERROR [GossipTasks:1] 2017-05-19 15:00:10,545 RepairSession.java:334 - [repair #bc9a3cd1-3ca3-11e7-a44a-e30923ac9336] session completed with the following error java.io.IOException: Endpoint /10.185.43.15 died at org.apache.cassandra.repair.RepairSession.convict(RepairSession.java:333) ~[apache-cassandra-3.0.11.jar:3.0.11] at org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306) [apache-cassandra-3.0.11.jar:3.0.11] at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:766) [apache-cassandra-3.0.11.jar:3.0.11] at org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:66) [apache-cassandra-3.0.11.jar:3.0.11] at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:181) [apache-cassandra-3.0.11.jar:3.0.11] at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) [apache-cassandra-3.0.11.jar:3.0.11] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_121] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_121] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.0.11.jar:3.0.11] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121] {code} 2. At this moment the repair coordinator hasn't received the response (MerkleTrees) from the node that was marked down. This means RepairJob#run will never return, because it waits for validations to finish: {code} // Wait for validation to complete Futures.getUnchecked(validations); {code} Note that all RepairJob's (as Runnable) run on a shared executor created in RepairRunnable#runMayThrow, while all snapshotting, validation and syncing happen on a per-RepairSession "taskExecutor". RepairJob#run will only return when it receives MerkleTrees (or null) from all endpoints for a given column family and token range. As evidence of the thread leak, below is an excerpt from the thread dump. I can also get the same stack trace when simulating the same issue in the dev environment. 
{code} "Repair#129:56" #406373 daemon prio=5 os_prio=0 tid=0x7fc495028400 nid=0x1a77d waiting on condition [0x7fc02153] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0002d7c00198> (a com.google.common.util.concurrent.AbstractFuture$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285) at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137) at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509) at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(Named
[jira] [Comment Edited] (CASSANDRA-13049) Too many open files during bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008810#comment-16008810 ] Simon Zhou edited comment on CASSANDRA-13049 at 5/12/17 10:19 PM: -- I wrote some [micro benchmark code | https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java]. To my surprise, memory mapping is very efficient even on small files. Here are the results from an idle server. For each file size (1k, 10k, 100k, 1m, 10m), there are 2000 files. Notes: 1. I should have disabled the page cache, but these results are from the first run of the test on that server. 2. The buffer size is 64KB. I tried 4KB and it shows a similar result, but 4KB is not an optimal buffer size. Having said that, we can stick with mmap for efficient IO while pursuing configuration tuning to reduce the number of sstables being streamed.
Benchmark (bufferSize) (filePath) (useDirectBuffer) Mode Cnt Score Error Units
MmapPerf.readChannel 65536 /home/szhou/1kfiles   false avgt 4  0.044 ±  0.051 s/op
MmapPerf.readChannel 65536 /home/szhou/1kfiles   true  avgt 4  0.064 ±  0.015 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  false avgt 4  0.050 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  true  avgt 4  0.072 ±  0.019 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles false avgt 4  0.143 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles true  avgt 4  0.166 ±  0.021 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   false avgt 4  1.051 ±  0.801 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   true  avgt 4  1.287 ±  0.220 s/op
MmapPerf.readChannel 65536 /home/szhou/10mfiles  false avgt 4  9.696 ±  2.207 s/op
MmapPerf.readChannel 65536 /home/szhou/10mfiles  true  avgt 4 13.754 ±  1.379 s/op
MmapPerf.readMapping 65536 /home/szhou/1kfiles   false avgt 4  0.017 ±  0.007 s/op
MmapPerf.readMapping 65536 /home/szhou/1kfiles   true  avgt 4  0.017 ±  0.005 s/op
MmapPerf.readMapping 65536 /home/szhou/10kfiles  false avgt 4  0.016 ±  0.004 s/op
MmapPerf.readMapping 65536 /home/szhou/10kfiles  true  avgt 4  0.017 ±  0.006 s/op
MmapPerf.readMapping 65536 /home/szhou/100kfiles false avgt 4  0.023 ±  0.004 s/op
MmapPerf.readMapping 65536 /home/szhou/100kfiles true  avgt 4  0.026 ±  0.006 s/op
MmapPerf.readMapping 65536 /home/szhou/1mfiles   false avgt 4  0.129 ±  0.017 s/op
MmapPerf.readMapping 65536 /home/szhou/1mfiles   true  avgt 4  0.132 ±  0.068 s/op
MmapPerf.readMapping 65536 /home/szhou/10mfiles  false avgt 4  1.313 ±  0.262 s/op
MmapPerf.readMapping 65536 /home/szhou/10mfiles  true  avgt 4  1.274 ±  0.482 s/op
was (Author: szhou): I wrote some [micro benchmark code | https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java]. To my surprise, memory mapping is very efficient even on small files. Here are the results from an idle server. For each file size (1k, 10k, 100k, 1m, 10m), there are 2000 files. I should have disabled the page cache, but these results are from the first run of the test on that server. Having said that, we can stick with mmap for efficient IO while pursuing configuration tuning to reduce the number of sstables being streamed.
Benchmark (bufferSize) (filePath) (useDirectBuffer) Mode Cnt Score Error Units
MmapPerf.readChannel 65536 /home/szhou/1kfiles   false avgt 4  0.044 ±  0.051 s/op
MmapPerf.readChannel 65536 /home/szhou/1kfiles   true  avgt 4  0.064 ±  0.015 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  false avgt 4  0.050 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  true  avgt 4  0.072 ±  0.019 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles false avgt 4  0.143 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles true  avgt 4  0.166 ±  0.021 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   false avgt 4  1.051 ±  0.801 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   true  avgt 4  1.287 ±  0.220 s/op
MmapPerf.readChannel 65536 /home/szhou/10mf
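The linked MmapPerf benchmark itself is not reproduced in this thread. As a rough sketch of the two access patterns being compared — buffered FileChannel reads versus one MappedByteBuffer per file — under the assumption of a single test file (the real benchmark walks 2000 files per size and also has a direct/heap buffer knob), something like the following; the class and path names are illustrative:
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import org.openjdk.jmh.annotations.Benchmark;

public class MmapSketch
{
    static final String FILE = "/tmp/sample.db"; // hypothetical test file

    @Benchmark
    public long readChannel() throws IOException
    {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(Paths.get(FILE), StandardOpenOption.READ))
        {
            // 64KB heap buffer, matching the (bufferSize) column above;
            // ByteBuffer.allocateDirect would model useDirectBuffer=true.
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            while (ch.read(buf) != -1)
            {
                buf.flip();
                while (buf.hasRemaining())
                    sum += buf.get();
                buf.clear();
            }
        }
        return sum;
    }

    @Benchmark
    public long readMapping() throws IOException
    {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(Paths.get(FILE), StandardOpenOption.READ))
        {
            // One mapping for the whole file; the page cache does the work.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            while (map.hasRemaining())
                sum += map.get();
        }
        return sum;
    }
}
{code}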
[jira] [Comment Edited] (CASSANDRA-13049) Too many open files during bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008810#comment-16008810 ] Simon Zhou edited comment on CASSANDRA-13049 at 5/12/17 10:17 PM: -- I wrote some [micro benchmark code | https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java]. To my surprise, memory mapping is very efficient even on small files. Here are the results from an idle server. For each file size (1k, 10k, 100k, 1m, 10m), there are 2000 files. I should have disabled the page cache, but these results are from the first run of the test on that server. Having said that, we can stick with mmap for efficient IO while pursuing configuration tuning to reduce the number of sstables being streamed.
Benchmark (bufferSize) (filePath) (useDirectBuffer) Mode Cnt Score Error Units
MmapPerf.readChannel 65536 /home/szhou/1kfiles   false avgt 4  0.044 ±  0.051 s/op
MmapPerf.readChannel 65536 /home/szhou/1kfiles   true  avgt 4  0.064 ±  0.015 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  false avgt 4  0.050 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  true  avgt 4  0.072 ±  0.019 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles false avgt 4  0.143 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles true  avgt 4  0.166 ±  0.021 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   false avgt 4  1.051 ±  0.801 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   true  avgt 4  1.287 ±  0.220 s/op
MmapPerf.readChannel 65536 /home/szhou/10mfiles  false avgt 4  9.696 ±  2.207 s/op
MmapPerf.readChannel 65536 /home/szhou/10mfiles  true  avgt 4 13.754 ±  1.379 s/op
MmapPerf.readMapping 65536 /home/szhou/1kfiles   false avgt 4  0.017 ±  0.007 s/op
MmapPerf.readMapping 65536 /home/szhou/1kfiles   true  avgt 4  0.017 ±  0.005 s/op
MmapPerf.readMapping 65536 /home/szhou/10kfiles  false avgt 4  0.016 ±  0.004 s/op
MmapPerf.readMapping 65536 /home/szhou/10kfiles  true  avgt 4  0.017 ±  0.006 s/op
MmapPerf.readMapping 65536 /home/szhou/100kfiles false avgt 4  0.023 ±  0.004 s/op
MmapPerf.readMapping 65536 /home/szhou/100kfiles true  avgt 4  0.026 ±  0.006 s/op
MmapPerf.readMapping 65536 /home/szhou/1mfiles   false avgt 4  0.129 ±  0.017 s/op
MmapPerf.readMapping 65536 /home/szhou/1mfiles   true  avgt 4  0.132 ±  0.068 s/op
MmapPerf.readMapping 65536 /home/szhou/10mfiles  false avgt 4  1.313 ±  0.262 s/op
MmapPerf.readMapping 65536 /home/szhou/10mfiles  true  avgt 4  1.274 ±  0.482 s/op
was (Author: szhou): I wrote some micro [benchmark code | https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java]. To my surprise, memory mapping is very efficient even on small files. Here are the results from an idle server. For each file size (1k, 10k, 100k, 1m, 10m), there are 2000 files. I should have disabled the page cache, but these results are from the first run of the test on that server. Having said that, we can stick with mmap for efficient IO while pursuing configuration tuning to reduce the number of sstables being streamed.
Benchmark (bufferSize) (filePath) (useDirectBuffer) Mode Cnt Score Error Units
MmapPerf.readChannel 65536 /home/szhou/1kfiles   false avgt 4  0.044 ±  0.051 s/op
MmapPerf.readChannel 65536 /home/szhou/1kfiles   true  avgt 4  0.064 ±  0.015 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  false avgt 4  0.050 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  true  avgt 4  0.072 ±  0.019 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles false avgt 4  0.143 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles true  avgt 4  0.166 ±  0.021 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   false avgt 4  1.051 ±  0.801 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   true  avgt 4  1.287 ±  0.220 s/op
MmapPerf.readChannel 65536 /home/szhou/10mfiles  false avgt 4  9.696 ±  2.207 s/op
MmapPerf.readChannel 65536 /home/szhou/10mfiles
[jira] [Commented] (CASSANDRA-13049) Too many open files during bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008810#comment-16008810 ] Simon Zhou commented on CASSANDRA-13049: I wrote some micro [benchmark code | https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java]. To my surprise, memory mapping is very efficient even on small files. Here are the results from an idle server. For each file size (1k, 10k, 100k, 1m, 10m), there are 2000 files. I should have disabled the page cache, but these results are from the first run of the test on that server. Having said that, we can stick with mmap for efficient IO while pursuing configuration tuning to reduce the number of sstables being streamed.
Benchmark (bufferSize) (filePath) (useDirectBuffer) Mode Cnt Score Error Units
MmapPerf.readChannel 65536 /home/szhou/1kfiles   false avgt 4  0.044 ±  0.051 s/op
MmapPerf.readChannel 65536 /home/szhou/1kfiles   true  avgt 4  0.064 ±  0.015 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  false avgt 4  0.050 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/10kfiles  true  avgt 4  0.072 ±  0.019 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles false avgt 4  0.143 ±  0.060 s/op
MmapPerf.readChannel 65536 /home/szhou/100kfiles true  avgt 4  0.166 ±  0.021 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   false avgt 4  1.051 ±  0.801 s/op
MmapPerf.readChannel 65536 /home/szhou/1mfiles   true  avgt 4  1.287 ±  0.220 s/op
MmapPerf.readChannel 65536 /home/szhou/10mfiles  false avgt 4  9.696 ±  2.207 s/op
MmapPerf.readChannel 65536 /home/szhou/10mfiles  true  avgt 4 13.754 ±  1.379 s/op
MmapPerf.readMapping 65536 /home/szhou/1kfiles   false avgt 4  0.017 ±  0.007 s/op
MmapPerf.readMapping 65536 /home/szhou/1kfiles   true  avgt 4  0.017 ±  0.005 s/op
MmapPerf.readMapping 65536 /home/szhou/10kfiles  false avgt 4  0.016 ±  0.004 s/op
MmapPerf.readMapping 65536 /home/szhou/10kfiles  true  avgt 4  0.017 ±  0.006 s/op
MmapPerf.readMapping 65536 /home/szhou/100kfiles false avgt 4  0.023 ±  0.004 s/op
MmapPerf.readMapping 65536 /home/szhou/100kfiles true  avgt 4  0.026 ±  0.006 s/op
MmapPerf.readMapping 65536 /home/szhou/1mfiles   false avgt 4  0.129 ±  0.017 s/op
MmapPerf.readMapping 65536 /home/szhou/1mfiles   true  avgt 4  0.132 ±  0.068 s/op
MmapPerf.readMapping 65536 /home/szhou/10mfiles  false avgt 4  1.313 ±  0.262 s/op
MmapPerf.readMapping 65536 /home/szhou/10mfiles  true  avgt 4  1.274 ±  0.482 s/op
> Too many open files during bootstrapping > > > Key: CASSANDRA-13049 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13049 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou > > We just upgraded from 2.2.5 to 3.0.10 and hit an issue during bootstrapping. So > this is likely something made worse by the IO performance improvements in > Cassandra 3. > On our side, the issue is that we have lots of small sstables, and thus when > bootstrapping a new node, it receives lots of files during streaming and > Cassandra keeps all of them open for an unpredictable amount of time. > Eventually we hit the "Too many open files" error, and around that time I could see > ~1M open files through lsof, almost all of them *-Data.db and > *-Index.db. Definitely we should use a better compaction strategy to reduce > the number of sstables, but I see a few possible improvements in Cassandra: > 1. We use memory mapping when reading data from sstables. Every time we create a > new memory map, there is one more file descriptor open. Memory mapping improves > IO performance when dealing with large files; do we want to set a file size > threshold for doing this? > 2. Whenever we finish receiving a file from a peer, we create an > SSTableReader/BigTableReader, which includes opening the data file and index > file, and we keep them open until some time later (unpredictable). See > StreamReceiveTask#L110, BigTableWriter#openFinal and > SSTableReader#InstanceTidier. Is it better to lazily open the data/index > files, or to close them more often to reclaim the file descriptors? > I searched all known issues in JIRA and it looks like this is a new issue in > Cassa
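As a quick way to watch the descriptor growth described above while a node bootstraps (alongside lsof), the JDK's Unix MX bean exposes the process's open-descriptor count. This is a monitoring sketch, not part of any patch here; it is Hotspot/Unix-specific and measures the JVM it runs in, so it would need to run inside the node (or the same beans can be read over JMX against the Cassandra process):
{code}
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdWatch
{
    public static void main(String[] args) throws InterruptedException
    {
        // Hotspot/Unix only: the generic OperatingSystemMXBean has no fd counters.
        UnixOperatingSystemMXBean os =
                (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        while (true)
        {
            System.out.printf("open fds: %d (max %d)%n",
                              os.getOpenFileDescriptorCount(),
                              os.getMaxFileDescriptorCount());
            Thread.sleep(5000);
        }
    }
}
{code}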
[jira] [Created] (CASSANDRA-13491) Emit metrics for JVM safepoint pause
Simon Zhou created CASSANDRA-13491: -- Summary: Emit metrics for JVM safepoint pause Key: CASSANDRA-13491 URL: https://issues.apache.org/jira/browse/CASSANDRA-13491 Project: Cassandra Issue Type: New Feature Reporter: Simon Zhou GC pause is not the only source of latency from the JVM. In one of our recent production issues, the GC metrics looked good (some >200ms, the longest 500ms) but the GC logs show periodic pauses like this: {code} 2017-04-26T01:51:29.420+: 352535.998: Total time for which application threads were stopped: 19.8835870 seconds, Stopping threads took: 19.7842073 seconds {code} This huge delay is likely a JVM malfunction, but it caused some request timeouts. So I'm suggesting we add support for safepoint pause metrics for better observability. Two problems though: 1. This depends on the JVM. Some JVMs may not expose these internal MBeans. This is actually the same situation as with the existing GCInspector. 2. Hotspot exposes HotspotRuntime as an internal MBean, so we can get safepoint pauses from it. However, there is no notification support for it. I got the error "MBean sun.management:type=HotspotRuntime does not implement javax.management.NotificationBroadcaster" when trying to register a listener. This means we will need to pull the safepoint pauses from HotspotRuntime periodically. Reference: http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html Does anyone think we should support this? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
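For the polling approach mentioned in point 2, Hotspot's internal HotspotRuntimeMBean can be read periodically. The sketch below assumes a JDK 8-era Hotspot; this is an internal, unsupported API that other JVMs (and later JDKs) may not expose, and in Cassandra the printed values would feed a metrics registry instead of stdout:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import sun.management.HotspotRuntimeMBean;
import sun.management.ManagementFactoryHelper;

public class SafepointPoller
{
    public static void main(String[] args)
    {
        // Internal Hotspot-only MBean: no notification support, so we poll.
        HotspotRuntimeMBean runtime = ManagementFactoryHelper.getHotspotRuntimeMBean();
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            long count = runtime.getSafepointCount();      // safepoints so far
            long pauseMs = runtime.getTotalSafepointTime(); // total stopped time
            long syncMs = runtime.getSafepointSyncTime();   // time spent stopping threads
            System.out.printf("safepoints=%d pause=%dms sync=%dms%n", count, pauseMs, syncMs);
        }, 1, 1, TimeUnit.SECONDS);
    }
}
{code}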
[jira] [Commented] (CASSANDRA-13387) Metrics for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985756#comment-15985756 ] Simon Zhou commented on CASSANDRA-13387: Thanks for the comments. As [~bdeggleston] suggested, I'll probably create a ticket after investigation. Here are the updated patches: |3.0.x |[patch | https://github.com/szhou1234/cassandra/commit/56c9950b29d233e71bb6a5e2e1d1f3f714c3d723]| |3.11 |[patch | https://github.com/szhou1234/cassandra/commit/a136176e9798ceaa7efb6b062acf60f51786f4d1]| |4.0 |[patch | https://github.com/szhou1234/cassandra/commit/54803838709308f364d4abd50ba995cd9caa61f4]| Note that the 4.0 patch is slightly different from the 3.0 one due to a merge conflict. > Metrics for repair > -- > > Key: CASSANDRA-13387 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13387 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > > We're missing metrics for repair, especially for errors. From what I observed > now, the exception will be caught by UncaughtExceptionHandler set in > CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one > example: > {code} > ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - > Exception in thread Thread[AntiEntropyStage:1,5,main] > java.lang.RuntimeException: Parent repair session with id = > 8c85d260-1319-11e7-82a2-25090a89015f has failed. > at > org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_121] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983837#comment-15983837 ] Simon Zhou commented on CASSANDRA-13397: [~pauloricardomg], in case you haven't done so, are you going to merge the fix to trunk? > Return value of CountDownLatch.await() not being checked in Repair > -- > > Key: CASSANDRA-13397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > Attachments: CASSANDRA-13397-v1.patch > > > While looking into repair code, I realize that we should check return value > of CountDownLatch.await(). Most of the places that we don't check the return > value, nothing bad would happen due to other protection. However, > ActiveRepairService#prepareForRepair should have the check. Code to reproduce: > {code} > public static void testLatch() throws InterruptedException { > CountDownLatch latch = new CountDownLatch(2); > latch.countDown(); > new Thread(() -> { > try { > Thread.sleep(1200); > } catch (InterruptedException e) { > System.err.println("interrupted"); > } > latch.countDown(); > System.out.println("counted down"); > }).start(); > latch.await(1, TimeUnit.SECONDS); > if (latch.getCount() > 0) { > System.err.println("failed"); > } else { > System.out.println("success"); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13387) Metrics for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13387: --- Status: Patch Available (was: Open) > Metrics for repair > -- > > Key: CASSANDRA-13387 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13387 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > > We're missing metrics for repair, especially for errors. From what I observed > now, the exception will be caught by UncaughtExceptionHandler set in > CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one > example: > {code} > ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - > Exception in thread Thread[AntiEntropyStage:1,5,main] > java.lang.RuntimeException: Parent repair session with id = > 8c85d260-1319-11e7-82a2-25090a89015f has failed. > at > org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_121] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13387) Metrics for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983831#comment-15983831 ] Simon Zhou commented on CASSANDRA-13387: Patch for 3.0.x is [ here | https://github.com/szhou1234/cassandra/commit/7d7f55d71623ac9cc4912833b5f4b2562d6263fc]. Exception metrics are emitted on keyspace level (RepairRunnable). We could emit them on a finer granularity but that means more exceptions, especially for primary range repair. For monitoring purpose, I think keyspace level metrics are enough but let me know if you have different opinion. Once initial review passes, I'll work on a patch for trunk. > Metrics for repair > -- > > Key: CASSANDRA-13387 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13387 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > > We're missing metrics for repair, especially for errors. From what I observed > now, the exception will be caught by UncaughtExceptionHandler set in > CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one > example: > {code} > ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - > Exception in thread Thread[AntiEntropyStage:1,5,main] > java.lang.RuntimeException: Parent repair session with id = > 8c85d260-1319-11e7-82a2-25090a89015f has failed. > at > org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_121] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
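For readers who don't follow the patch links: a keyspace-level repair-exception counter in the Dropwizard metrics library (which Cassandra already bundles) could look roughly like the sketch below. The metric names here are illustrative, not the ones used in the patches:
{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

// Illustrative shape of per-keyspace repair error metrics.
public class RepairMetricsSketch
{
    private final Counter repairExceptions;

    public RepairMetricsSketch(MetricRegistry registry, String keyspace)
    {
        // One counter per keyspace, e.g. "Repair.my_keyspace.Exceptions".
        repairExceptions = registry.counter(
                MetricRegistry.name("Repair", keyspace, "Exceptions"));
    }

    // Called from the repair error path (e.g. RepairRunnable's failure handler).
    public void markException()
    {
        repairExceptions.inc();
    }
}
{code}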
[jira] [Comment Edited] (CASSANDRA-13387) Metrics for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983831#comment-15983831 ] Simon Zhou edited comment on CASSANDRA-13387 at 4/25/17 11:33 PM: -- Patch for 3.0.x is [here | https://github.com/szhou1234/cassandra/commit/7d7f55d71623ac9cc4912833b5f4b2562d6263fc]. Exception metrics are emitted on keyspace level (RepairRunnable). We could emit them on a finer granularity but that means more exceptions, especially for primary range repair. For monitoring purpose, I think keyspace level metrics are enough but let me know if you have different opinion. Once initial review passes, I'll work on a patch for trunk. was (Author: szhou): Patch for 3.0.x is [ here | https://github.com/szhou1234/cassandra/commit/7d7f55d71623ac9cc4912833b5f4b2562d6263fc]. Exception metrics are emitted on keyspace level (RepairRunnable). We could emit them on a finer granularity but that means more exceptions, especially for primary range repair. For monitoring purpose, I think keyspace level metrics are enough but let me know if you have different opinion. Once initial review passes, I'll work on a patch for trunk. > Metrics for repair > -- > > Key: CASSANDRA-13387 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13387 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > > We're missing metrics for repair, especially for errors. From what I observed > now, the exception will be caught by UncaughtExceptionHandler set in > CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one > example: > {code} > ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - > Exception in thread Thread[AntiEntropyStage:1,5,main] > java.lang.RuntimeException: Parent repair session with id = > 8c85d260-1319-11e7-82a2-25090a89015f has failed. > at > org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_121] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983423#comment-15983423 ] Simon Zhou commented on CASSANDRA-13467: Thanks [~slebresne]. If there is a workaround that gives both behaviors below at the same time, I wouldn't need this backport: 1. Cassandra shouldn't log a warning or throw an exception for batched statements on the same partition. 2. Cassandra should still log a warning or throw an exception for batched statements on different partitions. If you just bump the thresholds in cassandra.yaml, you will be unable to detect the problem in #2 above. Do I misunderstand? > [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single > partition batches > - > > Key: CASSANDRA-13467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > > Would anyone think this backport may cause problem? We're running Cassandra > 3.0 and hit this problem. There are some other people would like this > backport (see the last few comments from CASSANDRA-10876). > I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
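To make the two behaviors concrete, here is an illustrative sketch of the check being backported, not the actual CASSANDRA-10876 patch: the size warn/fail applies only when the batch spans more than one distinct partition key. The threshold constants stand in for batch_size_warn_threshold_in_kb / batch_size_fail_threshold_in_kb from cassandra.yaml:
{code}
import java.nio.ByteBuffer;
import java.util.Set;

public final class BatchSizeCheckSketch
{
    // Illustrative thresholds; the real values come from cassandra.yaml.
    static final long WARN_BYTES = 5 * 1024;
    static final long FAIL_BYTES = 50 * 1024;

    static void verify(Set<ByteBuffer> partitionKeys, long batchSizeBytes)
    {
        // Behavior 1: a batch touching a single partition is never penalized.
        if (partitionKeys.size() <= 1)
            return;

        // Behavior 2: multi-partition batches still warn/fail on size.
        if (batchSizeBytes > FAIL_BYTES)
            throw new IllegalStateException(
                    "Batch of size " + batchSizeBytes + " bytes exceeds the fail threshold");
        if (batchSizeBytes > WARN_BYTES)
            System.err.printf("WARN: batch spanning %d partitions is %d bytes%n",
                              partitionKeys.size(), batchSizeBytes);
    }
}
{code}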
[jira] [Commented] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979516#comment-15979516 ] Simon Zhou commented on CASSANDRA-13467: [~slebresne] do you want to take a look at this backport? The code is from your original commit in 3.6. > [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single > partition batches > - > > Key: CASSANDRA-13467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > > Would anyone think this backport may cause problem? We're running Cassandra > 3.0 and hit this problem. There are some other people would like this > backport (see the last few comments from CASSANDRA-10876). > I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356 ] Simon Zhou edited comment on CASSANDRA-13467 at 4/21/17 10:06 PM: -- This is the same patch from [3.6 | https://github.com/pcmanus/cassandra/commit/284eb4f49fade13f8dfcec9ff0f33aa19963c788] with slight change: | 3.0 | [patch|https://github.com/szhou1234/cassandra/commit/2c61388e3032a18e32adbfdc30ab92908aef] was (Author: szhou): This is the same patch from [3.6 | https://github.com/pcmanus/cassandra/commit/284eb4f49fade13f8dfcec9ff0f33aa19963c788] with slight change: | 3.0 | [patch|https://github.com/szhou1234/cassandra/commit/b5783abc294564cb1d248f5ceee62a2924113060] > [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single > partition batches > - > > Key: CASSANDRA-13467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > > Would anyone think this backport may cause problem? We're running Cassandra > 3.0 and hit this problem. There are some other people would like this > backport (see the last few comments from CASSANDRA-10876). > I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13467: --- Status: Patch Available (was: Open) > [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single > partition batches > - > > Key: CASSANDRA-13467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > > Would anyone think this backport may cause problem? We're running Cassandra > 3.0 and hit this problem. There are some other people would like this > backport (see the last few comments from CASSANDRA-10876). > I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356 ] Simon Zhou edited comment on CASSANDRA-13467 at 4/21/17 10:03 PM: -- This is the same patch from [3.6 | https://github.com/pcmanus/cassandra/commit/284eb4f49fade13f8dfcec9ff0f33aa19963c788] with slight change: | 3.0 | [patch|https://github.com/szhou1234/cassandra/commit/b5783abc294564cb1d248f5ceee62a2924113060] was (Author: szhou): This is the same patch from CASSANDRA-10876: | 3.0 | [patch|https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269] > [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single > partition batches > - > > Key: CASSANDRA-13467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > > Would anyone think this backport may cause problem? We're running Cassandra > 3.0 and hit this problem. There are some other people would like this > backport (see the last few comments from CASSANDRA-10876). > I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356 ] Simon Zhou edited comment on CASSANDRA-13467 at 4/21/17 8:57 PM: - This is the same patch from CASSANDRA-10876: | 3.0 | [patch|https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269] was (Author: szhou): This is the same patch from CASSANDRA-10876: | 3.0 | patch [https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269 ] | > [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single > partition batches > - > > Key: CASSANDRA-13467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > > Would anyone think this backport may cause problem? We're running Cassandra > 3.0 and hit this problem. There are some other people would like this > backport (see the last few comments from CASSANDRA-10876). > I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356 ] Simon Zhou edited comment on CASSANDRA-13467 at 4/21/17 8:55 PM: - This is the same patch from CASSANDRA-10876: | 3.0 | patch [https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269 ] | was (Author: szhou): This is the same patch from CASSANDRA-10876: | 3.0 | [ patch | https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269 ] | > [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single > partition batches > - > > Key: CASSANDRA-13467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > > Would anyone think this backport may cause problem? We're running Cassandra > 3.0 and hit this problem. There are some other people would like this > backport (see the last few comments from CASSANDRA-10876). > I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
[ https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356 ] Simon Zhou commented on CASSANDRA-13467: This is the same patch from CASSANDRA-10876: | 3.0 | [ patch | https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269 ] | > [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single > partition batches > - > > Key: CASSANDRA-13467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > > Would anyone think this backport may cause problem? We're running Cassandra > 3.0 and hit this problem. There are some other people would like this > backport (see the last few comments from CASSANDRA-10876). > I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches
Simon Zhou created CASSANDRA-13467: -- Summary: [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches Key: CASSANDRA-13467 URL: https://issues.apache.org/jira/browse/CASSANDRA-13467 Project: Cassandra Issue Type: Bug Components: Core Reporter: Simon Zhou Assignee: Simon Zhou Priority: Minor Fix For: 3.0.x Does anyone think this backport may cause problems? We're running Cassandra 3.0 and hit this problem. Some other people would like this backport as well (see the last few comments on CASSANDRA-10876). I'll provide the patch soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977929#comment-15977929 ] Simon Zhou commented on CASSANDRA-13397: Thank you [~pauloricardomg]! > Return value of CountDownLatch.await() not being checked in Repair > -- > > Key: CASSANDRA-13397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > Attachments: CASSANDRA-13397-v1.patch > > > While looking into repair code, I realize that we should check return value > of CountDownLatch.await(). Most of the places that we don't check the return > value, nothing bad would happen due to other protection. However, > ActiveRepairService#prepareForRepair should have the check. Code to reproduce: > {code} > public static void testLatch() throws InterruptedException { > CountDownLatch latch = new CountDownLatch(2); > latch.countDown(); > new Thread(() -> { > try { > Thread.sleep(1200); > } catch (InterruptedException e) { > System.err.println("interrupted"); > } > latch.countDown(); > System.out.println("counted down"); > }).start(); > latch.await(1, TimeUnit.SECONDS); > if (latch.getCount() > 0) { > System.err.println("failed"); > } else { > System.out.println("success"); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13397: --- Priority: Minor (was: Major) Fix Version/s: 3.0.13 > Return value of CountDownLatch.await() not being checked in Repair > -- > > Key: CASSANDRA-13397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.13 > > Attachments: CASSANDRA-13397-v1.patch > > > While looking into repair code, I realize that we should check return value > of CountDownLatch.await(). Most of the places that we don't check the return > value, nothing bad would happen due to other protection. However, > ActiveRepairService#prepareForRepair should have the check. Code to reproduce: > {code} > public static void testLatch() throws InterruptedException { > CountDownLatch latch = new CountDownLatch(2); > latch.countDown(); > new Thread(() -> { > try { > Thread.sleep(1200); > } catch (InterruptedException e) { > System.err.println("interrupted"); > } > latch.countDown(); > System.out.println("counted down"); > }).start(); > latch.await(1, TimeUnit.SECONDS); > if (latch.getCount() > 0) { > System.err.println("failed"); > } else { > System.out.println("success"); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13397: --- Attachment: CASSANDRA-13397-v1.patch The attached patch includes the fix and a minor improvement (bail out early if there is any unavailable neighbor). [~krummas] could you help review this patch? > Return value of CountDownLatch.await() not being checked in Repair > -- > > Key: CASSANDRA-13397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Attachments: CASSANDRA-13397-v1.patch > > > While looking into repair code, I realize that we should check return value > of CountDownLatch.await(). Most of the places that we don't check the return > value, nothing bad would happen due to other protection. However, > ActiveRepairService#prepareForRepair should have the check. Code to reproduce: > {code} > public static void testLatch() throws InterruptedException { > CountDownLatch latch = new CountDownLatch(2); > latch.countDown(); > new Thread(() -> { > try { > Thread.sleep(1200); > } catch (InterruptedException e) { > System.err.println("interrupted"); > } > latch.countDown(); > System.out.println("counted down"); > }).start(); > latch.await(1, TimeUnit.SECONDS); > if (latch.getCount() > 0) { > System.err.println("failed"); > } else { > System.out.println("success"); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
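For context, the corrected pattern the patch moves toward is simply to act on await()'s boolean result rather than falling through on timeout — a generic sketch, not the patch itself:
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class AwaitPattern
{
    // Sketch: surface the timeout instead of silently continuing.
    static void awaitPrepare(CountDownLatch latch) throws InterruptedException
    {
        if (!latch.await(1, TimeUnit.HOURS))
            throw new RuntimeException("Did not get replies from all endpoints in time");
    }
}
{code}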
[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13397: --- Status: Patch Available (was: Open) > Return value of CountDownLatch.await() not being checked in Repair > -- > > Key: CASSANDRA-13397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Attachments: CASSANDRA-13397-v1.patch > > > While looking into repair code, I realize that we should check return value > of CountDownLatch.await(). Most of the places that we don't check the return > value, nothing bad would happen due to other protection. However, > ActiveRepairService#prepareForRepair should have the check. Code to reproduce: > {code} > public static void testLatch() throws InterruptedException { > CountDownLatch latch = new CountDownLatch(2); > latch.countDown(); > new Thread(() -> { > try { > Thread.sleep(1200); > } catch (InterruptedException e) { > System.err.println("interrupted"); > } > latch.countDown(); > System.out.println("counted down"); > }).start(); > latch.await(1, TimeUnit.SECONDS); > if (latch.getCount() > 0) { > System.err.println("failed"); > } else { > System.out.println("success"); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked
[ https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13397: --- Description: While looking into the repair code, I realized that we should check the return value of CountDownLatch.await(). In most of the places where we don't check the return value, nothing bad happens thanks to other protections. However, ActiveRepairService#prepareForRepair should have the check. Code to reproduce: {code} public static void testLatch() throws InterruptedException { CountDownLatch latch = new CountDownLatch(2); latch.countDown(); new Thread(() -> { try { Thread.sleep(1200); } catch (InterruptedException e) { System.err.println("interrupted"); } latch.countDown(); System.out.println("counted down"); }).start(); latch.await(1, TimeUnit.SECONDS); if (latch.getCount() > 0) { System.err.println("failed"); } else { System.out.println("success"); } } {code} was: While looking into the repair code, I realized that we should check the return value of CountDownLatch.await(). However, there are some places where we don't check it, and some of them may cause bad downstream behavior, e.g. in ActiveRepairService#prepareForRepair and StorageProxy#describeSchemaVersions. I haven't checked which version introduced this bug, but StorageProxy#describeSchemaVersions has had it since 2010. Code to reproduce: {code} public static void testLatch() throws InterruptedException { CountDownLatch latch = new CountDownLatch(2); latch.countDown(); new Thread(() -> { try { Thread.sleep(1200); } catch (InterruptedException e) { System.err.println("interrupted"); } latch.countDown(); System.out.println("counted down"); }).start(); latch.await(1, TimeUnit.SECONDS); if (latch.getCount() > 0) { System.err.println("failed"); } else { System.out.println("success"); } } {code} > Return value of CountDownLatch.await() not being checked > > > Key: CASSANDRA-13397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > > While looking into the repair code, I realized that we should check the return value > of CountDownLatch.await(). In most of the places where we don't check the return > value, nothing bad happens thanks to other protections. However, > ActiveRepairService#prepareForRepair should have the check. Code to reproduce: > {code} > public static void testLatch() throws InterruptedException { > CountDownLatch latch = new CountDownLatch(2); > latch.countDown(); > new Thread(() -> { > try { > Thread.sleep(1200); > } catch (InterruptedException e) { > System.err.println("interrupted"); > } > latch.countDown(); > System.out.println("counted down"); > }).start(); > latch.await(1, TimeUnit.SECONDS); > if (latch.getCount() > 0) { > System.err.println("failed"); > } else { > System.out.println("success"); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13397: --- Summary: Return value of CountDownLatch.await() not being checked in Repair (was: Return value of CountDownLatch.await() not being checked) > Return value of CountDownLatch.await() not being checked in Repair > -- > > Key: CASSANDRA-13397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > > While looking into repair code, I realize that we should check return value > of CountDownLatch.await(). Most of the places that we don't check the return > value, nothing bad would happen due to other protection. However, > ActiveRepairService#prepareForRepair should have the check. Code to reproduce: > {code} > public static void testLatch() throws InterruptedException { > CountDownLatch latch = new CountDownLatch(2); > latch.countDown(); > new Thread(() -> { > try { > Thread.sleep(1200); > } catch (InterruptedException e) { > System.err.println("interrupted"); > } > latch.countDown(); > System.out.println("counted down"); > }).start(); > latch.await(1, TimeUnit.SECONDS); > if (latch.getCount() > 0) { > System.err.println("failed"); > } else { > System.out.println("success"); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked
Simon Zhou created CASSANDRA-13397: -- Summary: Return value of CountDownLatch.await() not being checked Key: CASSANDRA-13397 URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 Project: Cassandra Issue Type: Bug Reporter: Simon Zhou Assignee: Simon Zhou While looking into the repair code, I realized that we should check the return value of CountDownLatch.await(). However, there are some places where we don't check it, and some of them may cause bad downstream behavior, e.g. in ActiveRepairService#prepareForRepair and StorageProxy#describeSchemaVersions. I haven't checked which version introduced this bug, but StorageProxy#describeSchemaVersions has had it since 2010. Code to reproduce: {code} public static void testLatch() throws InterruptedException { CountDownLatch latch = new CountDownLatch(2); latch.countDown(); new Thread(() -> { try { Thread.sleep(1200); } catch (InterruptedException e) { System.err.println("interrupted"); } latch.countDown(); System.out.println("counted down"); }).start(); latch.await(1, TimeUnit.SECONDS); if (latch.getCount() > 0) { System.err.println("failed"); } else { System.out.println("success"); } } {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13387) Metrics for repair
Simon Zhou created CASSANDRA-13387: -- Summary: Metrics for repair Key: CASSANDRA-13387 URL: https://issues.apache.org/jira/browse/CASSANDRA-13387 Project: Cassandra Issue Type: Improvement Reporter: Simon Zhou Assignee: Simon Zhou Priority: Minor We're missing metrics for repair, especially for errors. From what I observed now, the exception will be caught by UncaughtExceptionHandler set in CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one example: {code} ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - Exception in thread Thread[AntiEntropyStage:1,5,main] java.lang.RuntimeException: Parent repair session with id = 8c85d260-1319-11e7-82a2-25090a89015f has failed. at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.10.jar:3.0.10] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_121] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13358) AlterViewStatement.checkAccess can throw exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942480#comment-15942480 ] Simon Zhou commented on CASSANDRA-13358: I don't see the "if" check that you intended to add. Also, the coding style in Cassandra is to put the "catch (...)" on a new line. > AlterViewStatement.checkAccess can throw exceptions > --- > > Key: CASSANDRA-13358 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13358 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Hao Zhong >Assignee: Hao Zhong > Attachments: cassandra.patch > > > The AlterViewStatement.checkAccess method has code lines as follows: > {code:title=AlterViewStatement.java|borderStyle=solid} > if (baseTable != null) > state.hasColumnFamilyAccess(keyspace(), baseTable.name, > Permission.ALTER); > {code} > These lines can throw InvalidRequestException. Indeed, > DropTableStatement.checkAccess has a similar problem, and was fixed in > CASSANDRA-6687. The fixed code is as follows: > {code:title=DropTableStatement.java|borderStyle=solid} > try > { > state.hasColumnFamilyAccess(keyspace(), columnFamily(), > Permission.DROP); > } > catch (InvalidRequestException e) > { > if (!ifExists) > throw e; > } > {code} > Please fix the problem as CASSANDRA-6687 did. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
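In the same fragment style as the snippets quoted above, the CASSANDRA-6687-style shape being requested would look roughly like this; note that "ifExists" is a hypothetical guard mirroring DropTableStatement, not a field AlterViewStatement is known to have:
{code}
try
{
    if (baseTable != null)
        state.hasColumnFamilyAccess(keyspace(), baseTable.name, Permission.ALTER);
}
catch (InvalidRequestException e)
{
    // Re-throw unless the statement tolerates a missing base table
    // (hypothetical "ifExists" flag, mirroring DropTableStatement).
    if (!ifExists)
        throw e;
}
{code}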
[jira] [Updated] (CASSANDRA-13343) Wrong class name for LoggerFactory.getLogger
[ https://issues.apache.org/jira/browse/CASSANDRA-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13343: --- Status: Patch Available (was: Open) [~pauloricardomg] do you mind taking a quick look? It's just a one-line change. > Wrong class name for LoggerFactory.getLogger > > > Key: CASSANDRA-13343 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13343 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Trivial > Fix For: 3.0.13 > > Attachments: CASSANDRA-13343-v1.patch > > > We have the below code in AnticompactionTask.java. The parameter is wrong. > {code} > private static Logger logger = LoggerFactory.getLogger(RepairSession.class); > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13343) Wrong class name for LoggerFactory.getLogger
[ https://issues.apache.org/jira/browse/CASSANDRA-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13343: --- Attachment: CASSANDRA-13343-v1.patch > Wrong class name for LoggerFactory.getLogger > > > Key: CASSANDRA-13343 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13343 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Trivial > Fix For: 3.0.13 > > Attachments: CASSANDRA-13343-v1.patch > > > We have the below code in AnticompactionTask.java. The parameter is wrong. > {code} > private static Logger logger = LoggerFactory.getLogger(RepairSession.class); > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13343) Wrong class name for LoggerFactory.getLogger
Simon Zhou created CASSANDRA-13343: -- Summary: Wrong class name for LoggerFactory.getLogger Key: CASSANDRA-13343 URL: https://issues.apache.org/jira/browse/CASSANDRA-13343 Project: Cassandra Issue Type: Bug Reporter: Simon Zhou Assignee: Simon Zhou Priority: Trivial Fix For: 3.0.13 AnticompactionTask.java contains the line below. The class parameter is wrong: it references RepairSession instead of the enclosing class. {code} private static Logger logger = LoggerFactory.getLogger(RepairSession.class); {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
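For clarity, the one-line fix would presumably be:
{code}
// Use the enclosing class so log lines are attributed to AnticompactionTask.
private static Logger logger = LoggerFactory.getLogger(AnticompactionTask.class);
{code}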
[jira] [Commented] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message
[ https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923619#comment-15923619 ] Simon Zhou commented on CASSANDRA-13323: Thanks [~slebresne] for the comment. For hinted handoff of a dropped table, the UnknownColumnFamilyException is already handled in HintMessage#Serializer#deserialize. Even though a HintMessage will still be returned, its internal data (Hint) is null and it will thus be ignored in HintVerbHandler. So UnknownColumnFamilyException just causes some overhead (deserialization, etc.) on the receiver side of hinted handoff. For now I'd say hinted handoff is unrelated to IncomingTcpConnection being closed, but I'll double-check. The stack trace I posted in this ticket is actually from a paxos commit. Unfortunately CommitSerializer doesn't take the message size into consideration, so I cannot just catch UnknownColumnFamilyException and skip the remaining bytes of the message from the input (DataInputPlus). To fix that, we will have to update the protocol a bit (maybe introduce MessagingService.VERSION_3xx). Do you think it's worth the effort? I've lost the original logs so I cannot confirm the scope of this issue. One of the cons of a binary protocol is that it's hard to maintain backward compatibility. > IncomingTcpConnection closed due to one bad message > --- > > Key: CASSANDRA-13323 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13323 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.13 > > Attachments: CASSANDRA-13323-v1.patch > > > We got this exception: > {code} > WARN [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 > IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from > socket; closing > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this > is likely due to the schema not being fully propagated. Please wait for > schema agreement on table creation.
> at > org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.10.jar:3.0.10] > {code} > Also we saw this log in another host indicating it needs to re-connect: > {code} > INFO [HANDSHAKE-/] 2017-02-21 13:37:50,216 > OutboundTcpConnection.java:515 - Handshaking version with / > {code} > The reason is that the node was receiving hinted data for a dropped table. > This may happen with other messages as well. On Cassandra side, > IncomingTcpConnection shouldn't close on just one bad message, even though it > will be restarted soon later by SocketThread in MessagingService. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
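To make the protocol point in the comment above concrete: skipping past an undecodable message is only possible if the wire format carries the payload size up front, which (per the comment) the paxos commit serialization does not. A minimal sketch of the skip itself, using only java.io.DataInput; this helper is illustrative, not existing Cassandra code:
{code}
import java.io.DataInput;
import java.io.IOException;

// Skip the rest of a message whose table is unknown, so the connection
// can stay open instead of being closed and re-established.
static void skipUnknownMessage(DataInput in, int payloadSize) throws IOException
{
    int remaining = payloadSize;
    while (remaining > 0)
    {
        int skipped = in.skipBytes(remaining);
        if (skipped <= 0)
            throw new IOException("unexpected EOF while skipping message body");
        remaining -= skipped;
    }
}
{code}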
[jira] [Updated] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message
[ https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13323: --- Description: We got this exception: {code} WARN [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from socket; closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation. at org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.10.jar:3.0.10] {code} Also we saw this log in another host indicating it needs to re-connect: {code} INFO [HANDSHAKE-/] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 - Handshaking version with / {code} The reason is that the node was receiving hinted data for a dropped table. This may happen with other messages as well. On Cassandra side, IncomingTcpConnection shouldn't close on just one bad message, even though it will be restarted soon later by SocketThread in MessagingService. was: We got this exception: {code} WARN [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from socket; closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation. 
at org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.10.jar:3.0.10] {code} Also we saw this log on another host, indicating it needed to re-connect: {code} INFO [HANDSHAKE-/] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 - Handshaking version with / {code} The reason is that the node was receiving hinted data for a dropped table. This may happen with other messages as well. On the Cassandra side, IncomingTcpConnection shouldn't close on just one bad message, even though it will be restarted soon after by the SocketThread in MessagingService. was: We got this exception: {code} WARN [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from socket; closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation. at org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.10.jar:3.0.10] at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.10.jar:3.0.10] {code} Also we saw this log in another host indicating it needs to re-connect: {code} INFO [HANDSHAKE-/] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 - Handshaking version with / {code} The reason is that another node was sending hinted data to this node. However the hinted data was for a table that had been dropped. This may happen with other messages as well. On Cassandra side, IncomingTcpConnection shouldn't close on just one bad message, even though it will be restarted soon later by SocketThread in MessagingService. > IncomingTcpConnection closed due to one bad message > --- > > Key: CASSANDRA-13323 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13323 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >
[jira] [Updated] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message
[ https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13323: --- Status: Patch Available (was: Open) > IncomingTcpConnection closed due to one bad message > --- > > Key: CASSANDRA-13323 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13323 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.13 > > Attachments: CASSANDRA-13323-v1.patch > > > We got this exception: > {code} > WARN [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 > IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from > socket; closing > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this > is likely due to the schema not being fully propagated. Please wait for > schema agreement on table creation. > at > org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.10.jar:3.0.10] > {code} > Also we saw this log in another host indicating it needs to re-connect: > {code} > INFO [HANDSHAKE-/] 2017-02-21 13:37:50,216 > OutboundTcpConnection.java:515 - Handshaking version with / > {code} > The reason is that another node was sending hinted data to this node. However > the hinted data was for a table that had been dropped. This may happen with > other messages as well. On Cassandra side, IncomingTcpConnection shouldn't > close on just one bad message, even though it will be restarted soon later by > SocketThread in MessagingService. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message
[ https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13323: --- Attachment: CASSANDRA-13323-v1.patch > IncomingTcpConnection closed due to one bad message > --- > > Key: CASSANDRA-13323 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13323 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.13 > > Attachments: CASSANDRA-13323-v1.patch > > > We got this exception: > {code} > WARN [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 > IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from > socket; closing > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this > is likely due to the schema not being fully propagated. Please wait for > schema agreement on table creation. > at > org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.10.jar:3.0.10] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.10.jar:3.0.10] > {code} > Also we saw this log in another host indicating it needs to re-connect: > {code} > INFO [HANDSHAKE-/] 2017-02-21 13:37:50,216 > OutboundTcpConnection.java:515 - Handshaking version with / > {code} > The reason is that another node was sending hinted data to this node. However > the hinted data was for a table that had been dropped. This may happen with > other messages as well. On Cassandra side, IncomingTcpConnection shouldn't > close on just one bad message, even though it will be restarted soon later by > SocketThread in MessagingService. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message
Simon Zhou created CASSANDRA-13323: -- Summary: IncomingTcpConnection closed due to one bad message Key: CASSANDRA-13323 URL: https://issues.apache.org/jira/browse/CASSANDRA-13323 Project: Cassandra Issue Type: Bug Reporter: Simon Zhou Assignee: Simon Zhou Fix For: 3.0.13 We got this exception:
{code}
WARN [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation.
    at org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.10.jar:3.0.10]
{code}
Also we saw this log on another host, indicating it needed to re-connect:
{code}
INFO [HANDSHAKE-/] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 - Handshaking version with /
{code}
The reason is that another node was sending hinted data to this node; however, the hinted data was for a table that had been dropped. This may happen with other messages as well. On the Cassandra side, IncomingTcpConnection shouldn't close on just one bad message, even though it will be restarted soon after by the SocketThread in MessagingService.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13261) Improve speculative retry to avoid being overloaded
[ https://issues.apache.org/jira/browse/CASSANDRA-13261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13261: --- Status: Patch Available (was: Open) > Improve speculative retry to avoid being overloaded > --- > > Key: CASSANDRA-13261 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13261 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou > Attachments: CASSANDRA-13261-v1.patch > > > In CASSANDRA-13009, I was suggested to separate the 2nd part of my patch as > an improvement. > This is to avoid Cassandra being overloaded when using CUSTOM speculative > retry parameter. Steps to reason/repro this with 3.0.10: > 1. Use custom speculative retry threshold like this: > cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms'; > 2. SpeculatingReadExecutor will be used, according to this piece of code in > AbstractReadExecutor: > {code} > if (retry.equals(SpeculativeRetryParam.ALWAYS)) > return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, > consistencyLevel, targetReplicas); > else // PERCENTILE or CUSTOM. > return new SpeculatingReadExecutor(keyspace, cfs, command, > consistencyLevel, targetReplicas); > {code} > 3. When RF=3 and LOCAL_QUORUM is used, the below code (from > SpeculatingReadExecutor#maybeTryAdditionalReplicas) won't be able to protect > Cassandra from being overloaded, even though the inline comment suggests such > intention: > {code} > // no latency information, or we're overloaded > if (cfs.sampleLatencyNanos > > TimeUnit.MILLISECONDS.toNanos(command.getTimeout())) > return; > {code} > The reason is that cfs.sampleLatencyNanos is assigned as > retryPolicy.threshold() which is 10ms in step #1 above, at line 405 of > ColumnFamilyStore. However pretty often the timeout is the default one 5000ms. > As the name suggests, sampleLatencyNanos should be used to keep sampled > latency, not something configured "statically". My proposal: > a. Introduce option -Dcassandra.overload.threshold to allow customizing > overload threshold. The default threshold would be > DatabaseDescriptor.getRangeRpcTimeout(). > b. Assign sampled P99 latency to cfs.sampleLatencyNanos. For overload > detection, we just compare cfs.sampleLatencyNanos with the customizable > threshold above. > c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) for waiting time > before retry (see line 282 of AbstractReadExecutor). This is the value from > table setting (PERCENTILE or CUSTOM). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13261) Improve speculative retry to avoid being overloaded
[ https://issues.apache.org/jira/browse/CASSANDRA-13261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13261: --- Attachment: CASSANDRA-13261-v1.patch I'm not sure what the next release for 3.0.* will be, and 3.0.11 was just merged to trunk. The attached patch is for trunk, but I'd like to have this improvement included in the next 3.0.* release. [~tjake], maybe you can help review this patch since you have some context from CASSANDRA-13009? > Improve speculative retry to avoid being overloaded > --- > > Key: CASSANDRA-13261 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13261 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou > Attachments: CASSANDRA-13261-v1.patch > > > In CASSANDRA-13009, I was suggested to separate the 2nd part of my patch as > an improvement. > This is to avoid Cassandra being overloaded when using CUSTOM speculative > retry parameter. Steps to reason/repro this with 3.0.10: > 1. Use custom speculative retry threshold like this: > cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms'; > 2. SpeculatingReadExecutor will be used, according to this piece of code in > AbstractReadExecutor: > {code} > if (retry.equals(SpeculativeRetryParam.ALWAYS)) > return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, > consistencyLevel, targetReplicas); > else // PERCENTILE or CUSTOM. > return new SpeculatingReadExecutor(keyspace, cfs, command, > consistencyLevel, targetReplicas); > {code} > 3. When RF=3 and LOCAL_QUORUM is used, the below code (from > SpeculatingReadExecutor#maybeTryAdditionalReplicas) won't be able to protect > Cassandra from being overloaded, even though the inline comment suggests such > intention: > {code} > // no latency information, or we're overloaded > if (cfs.sampleLatencyNanos > > TimeUnit.MILLISECONDS.toNanos(command.getTimeout())) > return; > {code} > The reason is that cfs.sampleLatencyNanos is assigned as > retryPolicy.threshold() which is 10ms in step #1 above, at line 405 of > ColumnFamilyStore. However pretty often the timeout is the default one 5000ms. > As the name suggests, sampleLatencyNanos should be used to keep sampled > latency, not something configured "statically". My proposal: > a. Introduce option -Dcassandra.overload.threshold to allow customizing > overload threshold. The default threshold would be > DatabaseDescriptor.getRangeRpcTimeout(). > b. Assign sampled P99 latency to cfs.sampleLatencyNanos. For overload > detection, we just compare cfs.sampleLatencyNanos with the customizable > threshold above. > c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) for waiting time > before retry (see line 282 of AbstractReadExecutor). This is the value from > table setting (PERCENTILE or CUSTOM). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13261) Improve speculative retry to avoid being overloaded
Simon Zhou created CASSANDRA-13261: -- Summary: Improve speculative retry to avoid being overloaded Key: CASSANDRA-13261 URL: https://issues.apache.org/jira/browse/CASSANDRA-13261 Project: Cassandra Issue Type: Improvement Reporter: Simon Zhou Assignee: Simon Zhou In CASSANDRA-13009 it was suggested that I separate the second part of my patch out as an improvement. This is to avoid Cassandra being overloaded when using the CUSTOM speculative retry parameter. Steps to reason about/reproduce this with 3.0.10:
1. Use a custom speculative retry threshold like this: cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms';
2. SpeculatingReadExecutor will be used, according to this piece of code in AbstractReadExecutor:
{code}
if (retry.equals(SpeculativeRetryParam.ALWAYS))
    return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, consistencyLevel, targetReplicas);
else // PERCENTILE or CUSTOM.
    return new SpeculatingReadExecutor(keyspace, cfs, command, consistencyLevel, targetReplicas);
{code}
3. When RF=3 and LOCAL_QUORUM is used, the below code (from SpeculatingReadExecutor#maybeTryAdditionalReplicas) won't be able to protect Cassandra from being overloaded, even though the inline comment suggests that intention:
{code}
// no latency information, or we're overloaded
if (cfs.sampleLatencyNanos > TimeUnit.MILLISECONDS.toNanos(command.getTimeout()))
    return;
{code}
The reason is that cfs.sampleLatencyNanos is assigned retryPolicy.threshold() (10ms in step #1 above) at line 405 of ColumnFamilyStore. However, the timeout is often the default 5000ms. As the name suggests, sampleLatencyNanos should be used to keep the sampled latency, not something configured "statically". My proposals:
a. Introduce an option -Dcassandra.overload.threshold to allow customizing the overload threshold. The default threshold would be DatabaseDescriptor.getRangeRpcTimeout().
b. Assign the sampled p99 latency to cfs.sampleLatencyNanos. For overload detection, we just compare cfs.sampleLatencyNanos with the customizable threshold above.
c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) for the waiting time before retry (see line 282 of AbstractReadExecutor). This is the value from the table setting (PERCENTILE or CUSTOM).
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
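To make proposals (a)-(c) concrete, a minimal sketch of the intended separation between overload detection and retry delay; everything here is hypothetical except the proposed system property name, and 10000ms merely stands in for DatabaseDescriptor.getRangeRpcTimeout():
{code}
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not actual Cassandra code.
final class SpeculationGate
{
    // (a) -Dcassandra.overload.threshold in millis, defaulting to the range RPC timeout.
    private static final long OVERLOAD_THRESHOLD_NANOS =
        TimeUnit.MILLISECONDS.toNanos(Long.getLong("cassandra.overload.threshold", 10000L));

    // (b) compare the actually sampled p99 latency against the threshold;
    // (c) the wait before speculating comes from the table's retry setting.
    static long speculationDelayNanos(long sampledP99Nanos, long retryDelayNanos)
    {
        if (sampledP99Nanos > OVERLOAD_THRESHOLD_NANOS)
            return -1; // overloaded: skip the speculative read
        return retryDelayNanos;
    }
}
{code}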
[jira] [Comment Edited] (CASSANDRA-13009) Speculative retry bugs
[ https://issues.apache.org/jira/browse/CASSANDRA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850545#comment-15850545 ] Simon Zhou edited comment on CASSANDRA-13009 at 2/2/17 9:31 PM: [~tjake], sure. I abstracted out the time unit fix as v2 patch. For the improvement, I'll create a separate ticket to track that. Thanks for the review! was (Author: szhou): [~tjake], sure. I abstracted out the time unit fix as v2 patch. For the improvement, should I create a separate ticket to track that? Not sure about the best practice here. Also thanks for the review! > Speculative retry bugs > -- > > Key: CASSANDRA-13009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13009 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.11 > > Attachments: CASSANDRA-13009-v1.patch, CASSANDRA-13009-v2.patch > > > There are a few issues with speculative retry: > 1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10): > The left hand side is in nanos, as the name suggests, while the right hand > side is in millis. > {code} > sampleLatencyNanos = DatabaseDescriptor.getReadRpcTimeout() / 2; > {code} > Here coordinatorReadLatency is already in nanos and we shouldn't multiple the > value by 1000. This was a regression in 8896a70 when we switch metrics > library and the two libraries use different time units. > {code} > sampleLatencyNanos = (long) > (metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold()) > * 1000d); > {code} > 2. Confusing overload protection and retry delay. As the name > "sampleLatencyNanos" suggests, it should be used to keep the actually sampled > read latency. However, we assign it the retry threshold in the case of > CUSTOM. Then we compare the retry threshold with read timeout (defaults to > 5000ms). This means, if we use speculative_retry=10ms for the table, we won't > be able to avoid being overloaded. We should compare the actual read latency > with the read timeout for overload protection. See line 450 of > ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java. > My proposals are: > a. We use sampled p99 delay and compare it with a customizable threshold > (-Dcassandra.overload.threshold) for overload detection. > b. Introduce another variable retryDelayNanos for waiting time before retry. > This is the value from table setting (PERCENTILE or CUSTOM). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
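To make the quoted time-unit bugs concrete, a sketch of the corrected conversions; the helper and parameter names are hypothetical, standing in for DatabaseDescriptor.getReadRpcTimeout() (milliseconds) and the coordinatorReadLatency percentile snapshot (already nanoseconds):
{code}
import java.util.concurrent.TimeUnit;

final class SampleLatencyFix
{
    // was: sampleLatencyNanos = readRpcTimeoutMillis / 2 (millis stored in a nanos field)
    static long defaultSampleLatencyNanos(long readRpcTimeoutMillis)
    {
        return TimeUnit.MILLISECONDS.toNanos(readRpcTimeoutMillis) / 2;
    }

    // was: (long) (percentileNanos * 1000d) -- the snapshot is already in
    // nanoseconds, so the extra factor of 1000 was a regression.
    static long sampledLatencyNanos(double percentileNanos)
    {
        return (long) percentileNanos;
    }
}
{code}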
[jira] [Updated] (CASSANDRA-13009) Speculative retry bugs
[ https://issues.apache.org/jira/browse/CASSANDRA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13009: --- Attachment: CASSANDRA-13009-v2.patch [~tjake], sure. I abstracted out the time unit fix as v2 patch. For the improvement, should I create a separate ticket to track that? Not sure about the best practice here. Also thanks for the review! > Speculative retry bugs > -- > > Key: CASSANDRA-13009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13009 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.11 > > Attachments: CASSANDRA-13009-v1.patch, CASSANDRA-13009-v2.patch > > > There are a few issues with speculative retry: > 1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10): > The left hand side is in nanos, as the name suggests, while the right hand > side is in millis. > {code} > sampleLatencyNanos = DatabaseDescriptor.getReadRpcTimeout() / 2; > {code} > Here coordinatorReadLatency is already in nanos and we shouldn't multiple the > value by 1000. This was a regression in 8896a70 when we switch metrics > library and the two libraries use different time units. > {code} > sampleLatencyNanos = (long) > (metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold()) > * 1000d); > {code} > 2. Confusing overload protection and retry delay. As the name > "sampleLatencyNanos" suggests, it should be used to keep the actually sampled > read latency. However, we assign it the retry threshold in the case of > CUSTOM. Then we compare the retry threshold with read timeout (defaults to > 5000ms). This means, if we use speculative_retry=10ms for the table, we won't > be able to avoid being overloaded. We should compare the actual read latency > with the read timeout for overload protection. See line 450 of > ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java. > My proposals are: > a. We use sampled p99 delay and compare it with a customizable threshold > (-Dcassandra.overload.threshold) for overload detection. > b. Introduce another variable retryDelayNanos for waiting time before retry. > This is the value from table setting (PERCENTILE or CUSTOM). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13009) Speculative retry bugs
[ https://issues.apache.org/jira/browse/CASSANDRA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832727#comment-15832727 ] Simon Zhou commented on CASSANDRA-13009: [~tjake], do you mind reviewing this patch? Thanks. > Speculative retry bugs > -- > > Key: CASSANDRA-13009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13009 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.11 > > Attachments: CASSANDRA-13009-v1.patch > > > There are a few issues with speculative retry: > 1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10): > The left hand side is in nanos, as the name suggests, while the right hand > side is in millis. > {code} > sampleLatencyNanos = DatabaseDescriptor.getReadRpcTimeout() / 2; > {code} > Here coordinatorReadLatency is already in nanos and we shouldn't multiple the > value by 1000. This was a regression in 8896a70 when we switch metrics > library and the two libraries use different time units. > {code} > sampleLatencyNanos = (long) > (metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold()) > * 1000d); > {code} > 2. Confusing overload protection and retry delay. As the name > "sampleLatencyNanos" suggests, it should be used to keep the actually sampled > read latency. However, we assign it the retry threshold in the case of > CUSTOM. Then we compare the retry threshold with read timeout (defaults to > 5000ms). This means, if we use speculative_retry=10ms for the table, we won't > be able to avoid being overloaded. We should compare the actual read latency > with the read timeout for overload protection. See line 450 of > ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java. > My proposals are: > a. We use sampled p99 delay and compare it with a customizable threshold > (-Dcassandra.overload.threshold) for overload detection. > b. Introduce another variable retryDelayNanos for waiting time before retry. > This is the value from table setting (PERCENTILE or CUSTOM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-13009) Speculative retry bugs
[ https://issues.apache.org/jira/browse/CASSANDRA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13009: --- Reviewer: (was: T Jake Luciani) > Speculative retry bugs > -- > > Key: CASSANDRA-13009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13009 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou > Fix For: 3.0.11 > > Attachments: CASSANDRA-13009-v1.patch > > > There are a few issues with speculative retry: > 1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10): > The left hand side is in nanos, as the name suggests, while the right hand > side is in millis. > {code} > sampleLatencyNanos = DatabaseDescriptor.getReadRpcTimeout() / 2; > {code} > Here coordinatorReadLatency is already in nanos and we shouldn't multiple the > value by 1000. This was a regression in 8896a70 when we switch metrics > library and the two libraries use different time units. > {code} > sampleLatencyNanos = (long) > (metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold()) > * 1000d); > {code} > 2. Confusing overload protection and retry delay. As the name > "sampleLatencyNanos" suggests, it should be used to keep the actually sampled > read latency. However, we assign it the retry threshold in the case of > CUSTOM. Then we compare the retry threshold with read timeout (defaults to > 5000ms). This means, if we use speculative_retry=10ms for the table, we won't > be able to avoid being overloaded. We should compare the actual read latency > with the read timeout for overload protection. See line 450 of > ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java. > My proposals are: > a. We use sampled p99 delay and compare it with a customizable threshold > (-Dcassandra.overload.threshold) for overload detection. > b. Introduce another variable retryDelayNanos for waiting time before retry. > This is the value from table setting (PERCENTILE or CUSTOM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-13106) Unnecessary assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou resolved CASSANDRA-13106. Resolution: Cannot Reproduce Hi Benedict, Sorry for the spam and thanks for the reply. I just thought this was a quick fix. Closing this ticket, as I lost the original stack traces and thus cannot confirm that the CPU utilization issue was caused by the assertion over Predicates.alwaysTrue(). > Unnecessary assertion > - > > Key: CASSANDRA-13106 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13106 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Attachments: CASSANDRA-13106.patch > > > We had over 70 thousand sstables and it's slow to bootstrap new node, even > though the CPU utilization for main thread of Cassandra was nearly 100%. So > we took a few stack traces and found that the main thread were busy running > this line in Tracker.java: > {code} > assert Iterables.all(removed, remove); > {code} > Not exactly sure whether this line causes CPU utilization/bootstrapping > issue, but this line is redundant because the Predict we pass in is > Predicates.alwaysTrue(), which means the assertion always > returns true. So I propose to remove that line. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13106) Unnecessary assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803106#comment-15803106 ] Simon Zhou commented on CASSANDRA-13106: Ah, I got your point! I just wanted to get the issue fixed by opening this ticket. > Unnecessary assertion > - > > Key: CASSANDRA-13106 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13106 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Attachments: CASSANDRA-13106.patch > > > We had over 70 thousand sstables and it's slow to bootstrap new node, even > though the CPU utilization for main thread of Cassandra was nearly 100%. So > we took a few stack traces and found that the main thread were busy running > this line in Tracker.java: > {code} > assert Iterables.all(removed, remove); > {code} > Not exactly sure whether this line causes CPU utilization/bootstrapping > issue, but this line is redundant because the Predict we pass in is > Predicates.alwaysTrue(), which means the assertion always > returns true. So I propose to remove that line. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13106) Unnecessary assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803059#comment-15803059 ] Simon Zhou commented on CASSANDRA-13106: Thanks [~dbrosius]. Yes, I do have the -ea flag, but doesn't that only cause ~5% performance degradation? > Unnecessary assertion > - > > Key: CASSANDRA-13106 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13106 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Attachments: CASSANDRA-13106.patch > > > We had over 70 thousand sstables and it's slow to bootstrap new node, even > though the CPU utilization for main thread of Cassandra was nearly 100%. So > we took a few stack traces and found that the main thread were busy running > this line in Tracker.java: > {code} > assert Iterables.all(removed, remove); > {code} > Not exactly sure whether this line causes CPU utilization/bootstrapping > issue, but this line is redundant because the Predict we pass in is > Predicates.alwaysTrue(), which means the assertion always > returns true. So I propose to remove that line. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13106) Unnecessary assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802990#comment-15802990 ] Simon Zhou commented on CASSANDRA-13106: [~benedict], could you take a look at this one-line patch? > Unnecessary assertion > - > > Key: CASSANDRA-13106 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13106 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Attachments: CASSANDRA-13106.patch > > > We had over 70 thousand sstables and it's slow to bootstrap new node, even > though the CPU utilization for main thread of Cassandra was nearly 100%. So > we took a few stack traces and found that the main thread were busy running > this line in Tracker.java: > {code} > assert Iterables.all(removed, remove); > {code} > Not exactly sure whether this line causes CPU utilization/bootstrapping > issue, but this line is redundant because the Predict we pass in is > Predicates.alwaysTrue(), which means the assertion always > returns true. So I propose to remove that line. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-13106) Unnecessary assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13106: --- Attachment: CASSANDRA-13106.patch > Unnecessary assertion > - > > Key: CASSANDRA-13106 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13106 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Attachments: CASSANDRA-13106.patch > > > We had over 70 thousand sstables and it's slow to bootstrap new node, even > though the CPU utilization for main thread of Cassandra was nearly 100%. So > we took a few stack traces and found that the main thread were busy running > this line in Tracker.java: > {code} > assert Iterables.all(removed, remove); > {code} > Not exactly sure whether this line causes CPU utilization/bootstrapping > issue, but this line is redundant because the Predict we pass in is > Predicates.alwaysTrue(), which means the assertion always > returns true. So I propose to remove that line. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-13106) Unnecessary assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13106: --- Attachment: (was: 0001-Remove-unnecessary-assertion.patch) > Unnecessary assertion > - > > Key: CASSANDRA-13106 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13106 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Attachments: CASSANDRA-13106.patch > > > We had over 70 thousand sstables and it's slow to bootstrap new node, even > though the CPU utilization for main thread of Cassandra was nearly 100%. So > we took a few stack traces and found that the main thread were busy running > this line in Tracker.java: > {code} > assert Iterables.all(removed, remove); > {code} > Not exactly sure whether this line causes CPU utilization/bootstrapping > issue, but this line is redundant because the Predict we pass in is > Predicates.alwaysTrue(), which means the assertion always > returns true. So I propose to remove that line. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-13106) Unnecessary assert
Simon Zhou created CASSANDRA-13106: -- Summary: Unnecessary assert Key: CASSANDRA-13106 URL: https://issues.apache.org/jira/browse/CASSANDRA-13106 Project: Cassandra Issue Type: Improvement Reporter: Simon Zhou Assignee: Simon Zhou Priority: Minor Attachments: 0001-Remove-unnecessary-assertion.patch We had over 70 thousand sstables and bootstrapping a new node was slow, while the CPU utilization of Cassandra's main thread was nearly 100%. So we took a few stack traces and found that the main thread was busy running this line in Tracker.java: {code} assert Iterables.all(removed, remove); {code} I'm not exactly sure whether this line causes the CPU utilization/bootstrapping issue, but it is redundant because the Predicate we pass in is Predicates.alwaysTrue(), which means the assertion always holds. So I propose removing that line. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
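For illustration, a self-contained sketch of why the check is a tautology (using Guava, which Cassandra already depends on); note that with assertions enabled it still iterates the entire collection before trivially passing:
{code}
import com.google.common.base.Predicates;
import com.google.common.collect.Iterables;
import java.util.Arrays;
import java.util.List;

public class AssertDemo
{
    public static void main(String[] args)
    {
        List<Integer> removed = Arrays.asList(1, 2, 3);
        // Iterables.all(...) walks every element and then returns true for
        // alwaysTrue(), so this assert can never fail -- it only burns CPU
        // when run with -ea.
        assert Iterables.all(removed, Predicates.alwaysTrue());
    }
}
{code}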
[jira] [Updated] (CASSANDRA-13106) Unnecessary assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Zhou updated CASSANDRA-13106: --- Attachment: 0001-Remove-unnecessary-assertion.patch > Unnecessary assertion > - > > Key: CASSANDRA-13106 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13106 > Project: Cassandra > Issue Type: Improvement >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Attachments: 0001-Remove-unnecessary-assertion.patch > > > We had over 70 thousand sstables and it's slow to bootstrap new node, even > though the CPU utilization for main thread of Cassandra was nearly 100%. So > we took a few stack traces and found that the main thread were busy running > this line in Tracker.java: > {code} > assert Iterables.all(removed, remove); > {code} > Not exactly sure whether this line causes CPU utilization/bootstrapping > issue, but this line is redundant because the Predict we pass in is > Predicates.alwaysTrue(), which means the assertion always > returns true. So I propose to remove that line. -- This message was sent by Atlassian JIRA (v6.3.4#6332)