[jira] [Commented] (CASSANDRA-13720) Clean up repair code

2021-08-21 Thread Simon Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402634#comment-17402634
 ] 

Simon Zhou commented on CASSANDRA-13720:


Thanks Ekaterina and Andres for the code review!

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Normal
> Fix For: 4.x
>
>
> Lots of unused code.






[jira] [Commented] (CASSANDRA-13720) Clean up repair code

2021-08-11 Thread Simon Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397735#comment-17397735
 ] 

Simon Zhou commented on CASSANDRA-13720:


No other new changes. And yes, I squashed the commits; I should have created a 
separate commit for easier review.

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Normal
> Fix For: 4.x
>
>
> Lots of unused code.






[jira] [Commented] (CASSANDRA-13720) Clean up repair code

2021-08-11 Thread Simon Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397678#comment-17397678
 ] 

Simon Zhou commented on CASSANDRA-13720:


I've updated the PR to also remove _cmd_ in _createQueryThread_.

|4.0 |[patch | 
https://github.com/szhou1234/cassandra/commit/604284c8cce620bf37e6290018a569d3ba53aee9]|

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Normal
> Fix For: 4.x
>
>
> Lots of unused code.






[jira] [Updated] (CASSANDRA-13720) Clean up repair code

2021-08-01 Thread Simon Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13720:
---
Test and Documentation Plan:   (was: Most of the code in the original patch 
isn't relevant anymore. I've updated the patch based on the latest trunk.

|4.0 |[patch | https://github.com/apache/cassandra/pull/1126/commits]|)

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Normal
> Fix For: 4.x
>
>
> Lots of unused code.






[jira] [Commented] (CASSANDRA-13720) Clean up repair code

2021-08-01 Thread Simon Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391229#comment-17391229
 ] 

Simon Zhou commented on CASSANDRA-13720:


Most of the code in the original patch isn't relevant anymore. I've updated the 
patch based on the latest trunk.

|4.0 |[patch | https://github.com/apache/cassandra/pull/1126/commits]|

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Normal
> Fix For: 4.x
>
>
> Lots of unused code.






[jira] [Updated] (CASSANDRA-13720) Clean up repair code

2021-08-01 Thread Simon Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13720:
---
Test and Documentation Plan: 
Most of the code in the original patch isn't relevant anymore. I've updated the 
patch based on the latest trunk.

|4.0 |[patch | https://github.com/apache/cassandra/pull/1126/commits]|
 Status: Patch Available  (was: In Progress)

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Normal
> Fix For: 4.x
>
>
> Lots of unused code.






[jira] [Commented] (CASSANDRA-13720) Clean up repair code

2021-07-02 Thread Simon Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373768#comment-17373768
 ] 

Simon Zhou commented on CASSANDRA-13720:


Wow, it's been a long time, but right on time. I haven't worked on Cassandra for 
the past 3 years but was just about to come back to this area. I'll take a look 
in the next few weeks and see whether it still applies to 4.0.

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Normal
> Fix For: 4.x
>
>
> Lots of unused code.






[jira] [Created] (CASSANDRA-14382) Use timed wait from Futures in Guava

2018-04-12 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-14382:
--

 Summary: Use timed wait from Futures in Guava
 Key: CASSANDRA-14382
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14382
 Project: Cassandra
  Issue Type: Bug
Reporter: Simon Zhou


We upgraded Guava to 23.3 in trunk and there is a timed-wait feature 
(Futures.withTimeout) that we should use. Otherwise we have a whole bunch of 
stability issues: generally, if something fails or is unresponsive, lots of 
threads will hang. For example, validation in repair.
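As a rough sketch (not the actual repair code; the task and the 10-second timeout are made up for illustration), bounding a validation wait with Futures.withTimeout could look like this:
{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class TimedWaitSketch
{
    public static void main(String[] args) throws Exception
    {
        ListeningExecutorService worker =
            MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // Hypothetical long-running task standing in for a repair validation.
        ListenableFuture<String> validation = worker.submit(() -> {
            Thread.sleep(60_000);
            return "validation done";
        });

        // Fail the wait after 10 seconds instead of letting the caller hang forever.
        ListenableFuture<String> bounded =
            Futures.withTimeout(validation, 10, TimeUnit.SECONDS, timer);

        try
        {
            System.out.println(bounded.get());
        }
        catch (ExecutionException e)
        {
            if (e.getCause() instanceof TimeoutException)
                System.err.println("validation timed out after 10s");
        }
        finally
        {
            worker.shutdownNow();
            timer.shutdownNow();
        }
    }
}
{code}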








[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch

2018-03-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408280#comment-16408280
 ] 

Simon Zhou commented on CASSANDRA-14252:


[~dikanggu] FYI, I had a 
[fix|https://issues.apache.org/jira/browse/CASSANDRA-13261] for an "overloading" 
issue a long time ago. Not sure if it's the same issue that you hit.


> Use zero as default score in DynamicEndpointSnitch
> --
>
> Key: CASSANDRA-14252
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14252
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0, 3.0.17, 3.11.3
>
> Attachments: IMG_3180.jpg
>
>
> The problem I want to solve is that I found in our deployment, one slow but 
> alive data node can slow down the whole cluster, even caused timeout of our 
> requests. 
> We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the 
> DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node 
> latency is too high.
> I added some debug log, and figured out that in a lot of cases, the score 
> from remote data node was not populated, so the fallback to 
> sortByProximityWithScore never happened. That's why a single slow data node, 
> can cause huge problems to the whole cluster.
> In this jira, I'd like to use zero as default score, so that we will get a 
> chance to try remote data node, if local one is slow. 
> I tested it in our test cluster, it improved the client latency in single 
> slow data node case significantly.  
> I flag this as a Bug, because it caused problems to our use cases multiple 
> times.
>   logs ===
> _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [0.0]_
>  _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  
>  
>  






[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch

2018-03-01 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382756#comment-16382756
 ] 

Simon Zhou commented on CASSANDRA-14252:


Talked with [~dikanggu] offline. Previously I thought that a timeout wouldn't be 
counted as part of the latency score. It actually is, so setting the replica 
score to 0 by default is less of a problem and only exposes a small vulnerability 
window: if you have multiple replicas in a remote data center and no score yet 
for one of them, that replica will be assigned score 0. This might cause a 
traffic burst on that replica for a short period of time, and most of the time it 
won't even be noticed.

This can be mitigated by assigning a larger score (such as the maximum score 
among all the replicas) to the replica with a null score. I'd defer this decision 
to [~dikanggu]. Otherwise the patch looks good to me.
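To illustrate the mitigation (a purely hypothetical sketch, not DynamicEndpointSnitch's actual code; the method and map are invented for the example):
{code:java}
import java.net.InetAddress;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the mitigation discussed above: a replica without a
// recorded score is given the worst (maximum) score seen among its peers,
// rather than an artificially good default of 0.
final class ScoreFallbackSketch
{
    static double scoreFor(InetAddress replica,
                           Map<InetAddress, Double> scores,
                           List<InetAddress> peers)
    {
        Double known = scores.get(replica);
        if (known != null)
            return known;

        // Fall back to the highest score among peers that do have one,
        // so the unknown replica is tried last instead of first.
        double worst = 0.0;
        for (InetAddress peer : peers)
        {
            Double s = scores.get(peer);
            if (s != null && s > worst)
                worst = s;
        }
        return worst;
    }
}
{code}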

> Use zero as default score in DynamicEndpointSnitch
> --
>
> Key: CASSANDRA-14252
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14252
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0, 3.0.17, 3.11.3
>
>
> The problem I want to solve is that I found in our deployment, one slow but 
> alive data node can slow down the whole cluster, even caused timeout of our 
> requests. 
> We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the 
> DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node 
> latency is too high.
> I added some debug log, and figured out that in a lot of cases, the score 
> from remote data node was not populated, so the fallback to 
> sortByProximityWithScore never happened. That's why a single slow data node, 
> can cause huge problems to the whole cluster.
> In this jira, I'd like to use zero as default score, so that we will get a 
> chance to try remote data node, if local one is slow. 
> I tested it in our test cluster, it improved the client latency in single 
> slow data node case significantly.  
> I flag this as a Bug, because it caused problems to our use cases multiple 
> times.
>   logs ===
> _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [0.0]_
>  _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  
>  
>  






[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch

2018-02-28 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380949#comment-16380949
 ] 

Simon Zhou commented on CASSANDRA-14252:


I think I haven't made it clear enough, or I misunderstand your fix. Let's say 
you want to use nodes in a remote data center because of some issue with the 
local data center. My understanding is that:
- Either we don't use a node from the remote data center that doesn't have a 
score yet, because the reason could be that it's totally unresponsive to previous 
read requests but still responds to, for example, gossip messages, and thus 
hasn't been marked as down. (A node may also have no score because it hasn't 
received a read request from the remote coordinator node yet, or because all the 
scores got reset after dynamicResetInterval; both of those are less of a 
problem.)
- Or maybe we can use it, but it should be picked with lower probability than 
other nodes in the same remote data center.

Now you assign a low score of 0 to a node that doesn't have a score yet. This 
means it will be picked with higher probability. If that node truly has a problem 
(unresponsive to read requests), then your fix will cause higher latency. Having 
said that, I don't mind setting the node's score to the highest one among all 
node scores from the same data center.

> Use zero as default score in DynamicEndpointSnitch
> --
>
> Key: CASSANDRA-14252
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14252
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0, 3.0.17, 3.11.3
>
>
> The problem I want to solve is that I found in our deployment, one slow but 
> alive data node can slow down the whole cluster, even caused timeout of our 
> requests. 
> We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the 
> DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node 
> latency is too high.
> I added some debug log, and figured out that in a lot of cases, the score 
> from remote data node was not populated, so the fallback to 
> sortByProximityWithScore never happened. That's why a single slow data node, 
> can cause huge problems to the whole cluster.
> In this jira, I'd like to use zero as default score, so that we will get a 
> chance to try remote data node, if local one is slow. 
> I tested it in our test cluster, it improved the client latency in single 
> slow data node case significantly.  
> I flag this as a Bug, because it caused problems to our use cases multiple 
> times.
>   logs ===
> _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [0.0]_
>  _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  
>  
>  






[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch

2018-02-26 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378100#comment-16378100
 ] 

Simon Zhou commented on CASSANDRA-14252:


For nodes in the same remote data center, if we don't have a score for one node 
because there is no read response from it yet, and we set an artificially low 
score of 0 for it, doesn't that mean this node will be picked with higher 
probability than other nodes that do have scores?

> Use zero as default score in DynamicEndpointSnitch
> --
>
> Key: CASSANDRA-14252
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14252
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0, 3.0.17, 3.11.3
>
>
> The problem I want to solve is that I found in our deployment, one slow but 
> alive data node can slow down the whole cluster, even caused timeout of our 
> requests. 
> We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the 
> DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node 
> latency is too high.
> I added some debug log, and figured out that in a lot of cases, the score 
> from remote data node was not populated, so the fallback to 
> sortByProximityWithScore never happened. That's why a single slow data node, 
> can cause huge problems to the whole cluster.
> In this jira, I'd like to use zero as default score, so that we will get a 
> chance to try remote data node, if local one is slow. 
> I tested it in our test cluster, it improved the client latency in single 
> slow data node case significantly.  
> I flag this as a Bug, because it caused problems to our use cases multiple 
> times.
>   logs ===
> _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [0.0]_
>  _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  
>  
>  






[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch

2018-02-26 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377571#comment-16377571
 ] 

Simon Zhou commented on CASSANDRA-14252:


This is an interesting change, but I'm not sure it fixes all the problems.

The code that you changed was introduced in CASSANDRA-13074, which also claims 
to fix the slow-node issue by totally ignoring nodes for which we don't have a 
score, whether the node is in the local or a remote data center. Now with your 
fix, we still give these (remote) nodes a try by assigning an artificially low 
score. However, isn't 0 the lowest possible score, which could result in these 
slow/unresponsive remote nodes being picked before other remote nodes that have 
normal scores (such as 1.0)?

Btw, badness_threshold=0.1 may be too conservative. We also disabled the I/O 
factor when calculating the scores through 
-Dcassandra.ignore_dynamic_snitch_severity=true. See CASSANDRA-11738 for 
details.
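For context, a simplified sketch of how a badness threshold can decide between the static (proximity) order and a score-based re-sort; the names and structure are illustrative only, not the real DynamicEndpointSnitch implementation:
{code:java}
import java.net.InetAddress;
import java.util.List;
import java.util.Map;

// Illustrative only: keep the proximity-sorted order unless some replica's
// latency score exceeds (1 + badnessThreshold) times the closest replica's
// score, in which case re-sort purely by score.
final class BadnessSketch
{
    static boolean shouldSortByScore(List<InetAddress> proximitySorted,
                                     Map<InetAddress, Double> scores,
                                     double badnessThreshold)
    {
        Double first = scores.get(proximitySorted.get(0));
        if (first == null)
            return false;          // no data for the closest node: keep the static order

        for (InetAddress replica : proximitySorted)
        {
            Double score = scores.get(replica);
            // A replica with no score is silently skipped here, which is the
            // gap discussed in this ticket.
            if (score != null && score > first * (1 + badnessThreshold))
                return true;
        }
        return false;
    }
}
{code}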

> Use zero as default score in DynamicEndpointSnitch
> --
>
> Key: CASSANDRA-14252
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14252
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0, 3.0.17, 3.11.3
>
>
> The problem I want to solve is that I found in our deployment, one slow but 
> alive data node can slow down the whole cluster, even caused timeout of our 
> requests. 
> We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the 
> DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node 
> latency is too high.
> I added some debug log, and figured out that in a lot of cases, the score 
> from remote data node was not populated, so the fallback to 
> sortByProximityWithScore never happened. That's why a single slow data node, 
> can cause huge problems to the whole cluster.
> In this jira, I'd like to use zero as default score, so that we will get a 
> chance to try remote data node, if local one is slow. 
> I tested it in our test cluster, it improved the client latency in single 
> slow data node case significantly.  
> I flag this as a Bug, because it caused problems to our use cases multiple 
> times.
>   logs ===
> _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [0.0]_
>  _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]_
>  
>  
>  






[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column

2018-02-15 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366468#comment-16366468
 ] 

Simon Zhou commented on CASSANDRA-14200:


Btw, I can reproduce the same issue with trunk and the same sstable. That 
indicates the issue is not fixed in trunk yet; we just need to figure out 
how we ended up with a null timestamp column.

> NullPointerException when dumping sstable with null value for timestamp column
> --
>
> Key: CASSANDRA-14200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> We have an sstable whose schema has a column of type timestamp and it's not 
> part of primary key. When dumping the sstable using sstabledump there is NPE 
> like this:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
> at java.util.Calendar.setTime(Calendar.java:1770)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
> at java.text.DateFormat.format(DateFormat.java:345)
> at 
> org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
> at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
> The reason is that we use a null Date when there is no value for this column:
> {code}
> public Date deserialize(ByteBuffer bytes)
> {
> return bytes.remaining() == 0 ? null : new 
> Date(ByteBufferUtil.toLong(bytes));
> }
> {code}
> It seems that we should not deserialize columns with null values.






[jira] [Commented] (CASSANDRA-14199) exception when dumping sstable with frozen collection of UUID

2018-02-15 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366466#comment-16366466
 ] 

Simon Zhou commented on CASSANDRA-14199:


Yes, confirmed. Can we do a backport?

> exception when dumping sstable with frozen collection of UUID
> -
>
> Key: CASSANDRA-14199
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14199
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> When dumping (sstabledump) sstable with frozen collection of UUID, there is 
> exception like this:
> {code:java}
> Exception in thread "main" org.apache.cassandra.serializers.MarshalException: 
> UUID should be 16 or 0 bytes (24)
> at 
> org.apache.cassandra.serializers.UUIDSerializer.validate(UUIDSerializer.java:43)
> at 
> org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:440)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:374)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:278)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:213)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at 
> org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:102)
> at 
> org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
>  
> *Steps to reproduce:*
> {code:java}
> cqlsh> create TABLE stresscql.sstabledump_test(userid text PRIMARY KEY, c1 
> list<uuid>, c2 frozen<list<uuid>>, c3 set<text>, c4 frozen<set<text>>, c5 
> map<text, text>, c6 frozen<map<text, text>>);
> cqlsh> insert INTO stresscql.sstabledump_test (userid, c1, c2, c3, c4, c5, 
> c6) VALUES ( 'id', [6947e8c0-02fa-11e8-87e1-fb0d0e20b5c4], 
> [6947e8c0-02fa-11e8-87e1-fb0d0e20b5c4], {'set', 'user'}, {'view', 'over'}, 
> {'good': 'hello', 'root': 'text'}, {'driver': 'java', 'note': 'new'});{code}
>  
> *Root cause:*
> Frozen collection is treated as simple column and it's the client's 
> responsibility to parse the data from ByteBuffer. We have this logic in 
> different drivers but sstabledump doesn't have the logic in place. It just 
> treat the whole collection as a single UUID.






[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column

2018-02-15 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366460#comment-16366460
 ] 

Simon Zhou commented on CASSANDRA-14200:


[~cnlwsu] your steps won't reproduce the issue, because column "ts" will be 
either unset, which won't even result in an entry in the sstabledump output, or a 
valid value. However, I cannot reproduce it either, even by setting "ts" to 
null explicitly, which actually results in a tombstone, so a different code 
path is executed in sstabledump.

I'm checking with the owner of this table on how they upsert this column and 
end up with the problem.

> NullPointerException when dumping sstable with null value for timestamp column
> --
>
> Key: CASSANDRA-14200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> We have an sstable whose schema has a column of type timestamp and it's not 
> part of primary key. When dumping the sstable using sstabledump there is NPE 
> like this:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
> at java.util.Calendar.setTime(Calendar.java:1770)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
> at java.text.DateFormat.format(DateFormat.java:345)
> at 
> org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
> at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
> The reason is that we use a null Date when there is no value for this column:
> {code}
> public Date deserialize(ByteBuffer bytes)
> {
> return bytes.remaining() == 0 ? null : new 
> Date(ByteBufferUtil.toLong(bytes));
> }
> {code}
> It seems that we should not deserialize columns with null values.






[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column

2018-02-13 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363449#comment-16363449
 ] 

Simon Zhou commented on CASSANDRA-14200:


Thanks [~cnlwsu]! I'll provide patches for 3.11 and 4.0 shortly.

> NullPointerException when dumping sstable with null value for timestamp column
> --
>
> Key: CASSANDRA-14200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> We have an sstable whose schema has a column of type timestamp and it's not 
> part of primary key. When dumping the sstable using sstabledump there is NPE 
> like this:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
> at java.util.Calendar.setTime(Calendar.java:1770)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
> at java.text.DateFormat.format(DateFormat.java:345)
> at 
> org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
> at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
> The reason is that we use a null Date when there is no value for this column:
> {code}
> public Date deserialize(ByteBuffer bytes)
> {
> return bytes.remaining() == 0 ? null : new 
> Date(ByteBufferUtil.toLong(bytes));
> }
> {code}
> It seems that we should not deserialize columns with null values.






[jira] [Updated] (CASSANDRA-14199) exception when dumping sstable with frozen collection of UUID

2018-01-31 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-14199:
---
Status: Patch Available  (was: Open)

> exception when dumping sstable with frozen collection of UUID
> -
>
> Key: CASSANDRA-14199
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14199
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> When dumping (sstabledump) sstable with frozen collection of UUID, there is 
> exception like this:
> {code:java}
> Exception in thread "main" org.apache.cassandra.serializers.MarshalException: 
> UUID should be 16 or 0 bytes (24)
> at 
> org.apache.cassandra.serializers.UUIDSerializer.validate(UUIDSerializer.java:43)
> at 
> org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:440)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:374)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:278)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:213)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at 
> org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:102)
> at 
> org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
>  
> *Steps to reproduce:*
> {code:java}
> cqlsh> create TABLE stresscql.sstabledump_test(userid text PRIMARY KEY, c1 
> list<uuid>, c2 frozen<list<uuid>>, c3 set<text>, c4 frozen<set<text>>, c5 
> map<text, text>, c6 frozen<map<text, text>>);
> cqlsh> insert INTO stresscql.sstabledump_test (userid, c1, c2, c3, c4, c5, 
> c6) VALUES ( 'id', [6947e8c0-02fa-11e8-87e1-fb0d0e20b5c4], 
> [6947e8c0-02fa-11e8-87e1-fb0d0e20b5c4], {'set', 'user'}, {'view', 'over'}, 
> {'good': 'hello', 'root': 'text'}, {'driver': 'java', 'note': 'new'});{code}
>  
> *Root cause:*
> Frozen collection is treated as simple column and it's the client's 
> responsibility to parse the data from ByteBuffer. We have this logic in 
> different drivers but sstabledump doesn't have the logic in place. It just 
> treat the whole collection as a single UUID.






[jira] [Updated] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column

2018-01-31 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-14200:
---
Status: Patch Available  (was: Open)

> NullPointerException when dumping sstable with null value for timestamp column
> --
>
> Key: CASSANDRA-14200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> We have an sstable whose schema has a column of type timestamp and it's not 
> part of primary key. When dumping the sstable using sstabledump there is NPE 
> like this:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
> at java.util.Calendar.setTime(Calendar.java:1770)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
> at java.text.DateFormat.format(DateFormat.java:345)
> at 
> org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
> at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
> The reason is that we use a null Date when there is no value for this column:
> {code}
> public Date deserialize(ByteBuffer bytes)
> {
> return bytes.remaining() == 0 ? null : new 
> Date(ByteBufferUtil.toLong(bytes));
> }
> {code}
> It seems that we should not deserialize columns with null values.






[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column

2018-01-29 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344331#comment-16344331
 ] 

Simon Zhou commented on CASSANDRA-14200:


Fix for 3.0 is 
[here|https://github.com/szhou1234/cassandra/commit/701d7cfa7425e595363a07138baa2c5661d9f1cf].
 I'll provide fixes for later versions.

[~cnlwsu] could you take a look at this one as well? After the fix, the output 
of sstabledump looks like this (note the column "terminated_at"):
{code}
"rows" : [
  {
"type" : "row",
"position" : 302,
"clustering" : [ "68dc822e-0481-41d5-8dbd-10bd00703644" ],
"liveness_info" : { "tstamp" : "2018-01-24T14:34:45.716783Z" },
"cells" : [
  { "name" : "moved_at", "value" : "\"2018-01-24 14:34:45.698Z\"" },
  { "name" : "node_uuid", "value" : 
"\"0b045d26-764c-4023-9570-00b8ebe10cca\"" },
  { "name" : "processed_event_uuids", "value" : 
"[\"28d81650-7119-456c-9d1e-216ed8986a55\"]" },
  { "name" : "status", "value" : "0" },
  { "name" : "terminated_at" },
  { "name" : "event_types", "deletion_info" : { "marked_deleted" : 
"2018-01-24T14:34:45.716782Z", "local_delete_time" : "2018-01-24T14:34:45Z" } }
]
  }
]
{code}

> NullPointerException when dumping sstable with null value for timestamp column
> --
>
> Key: CASSANDRA-14200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> We have an sstable whose schema has a column of type timestamp and it's not 
> part of primary key. When dumping the sstable using sstabledump there is NPE 
> like this:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
> at java.util.Calendar.setTime(Calendar.java:1770)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
> at java.text.DateFormat.format(DateFormat.java:345)
> at 
> org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
> at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
> The reason is that we use a null Date when there is no value for this column:
> {code}
> public Date deserialize(ByteBuffer bytes)
> {
> return bytes.remaining() == 0 ? null : new 
> Date(ByteBufferUtil.toLong(bytes));
> }
> {code}
> It seems that we should not deserialize columns with null values.






[jira] [Updated] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column

2018-01-29 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-14200:
---
Summary: NullPointerException when dumping sstable with null value for 
timestamp column  (was: NullPointerException when dumping sstable)

> NullPointerException when dumping sstable with null value for timestamp column
> --
>
> Key: CASSANDRA-14200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> We have an sstable whose schema has a column of type timestamp and it's not 
> part of primary key. When dumping the sstable using sstabledump there is NPE 
> like this:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
> at java.util.Calendar.setTime(Calendar.java:1770)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
> at java.text.DateFormat.format(DateFormat.java:345)
> at 
> org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
> at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
> The reason is that we use a null Date when there is no value for this column:
> {code}
> public Date deserialize(ByteBuffer bytes)
> {
> return bytes.remaining() == 0 ? null : new 
> Date(ByteBufferUtil.toLong(bytes));
> }
> {code}
> It seems that we should not deserialize columns with null values.






[jira] [Created] (CASSANDRA-14200) NullPointerException when dumping sstable

2018-01-29 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-14200:
--

 Summary: NullPointerException when dumping sstable
 Key: CASSANDRA-14200
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
 Project: Cassandra
  Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou
 Fix For: 3.0.x


We have an sstable whose schema has a column of type timestamp and it's not 
part of primary key. When dumping the sstable using sstabledump there is NPE 
like this:
{code:java}
Exception in thread "main" java.lang.NullPointerException
at java.util.Calendar.setTime(Calendar.java:1770)
at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
at java.text.DateFormat.format(DateFormat.java:345)
at 
org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
at 
org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
at 
org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
at 
org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
at 
org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}

The reason is that we use a null Date when there is no value for this column:
{code}
public Date deserialize(ByteBuffer bytes)
{
    return bytes.remaining() == 0 ? null : new Date(ByteBufferUtil.toLong(bytes));
}
{code}

It seems that we should not deserialize columns with null values.
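As a rough, hypothetical illustration of that guard (not the actual TimestampType/JsonTransformer code), the idea is to treat an empty cell value as absent instead of building a null Date and handing it to a formatter:
{code:java}
import java.nio.ByteBuffer;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch of the null-guard idea; not the actual Cassandra code.
final class TimestampDumpSketch
{
    static String toJsonTimestamp(ByteBuffer bytes)
    {
        // An empty buffer means "no value": report the cell as absent rather
        // than formatting a null Date, which is what throws the NPE above.
        if (bytes == null || !bytes.hasRemaining())
            return null;

        Date date = new Date(bytes.getLong(bytes.position()));
        return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").format(date);
    }

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.allocate(0);
        ByteBuffer value = ByteBuffer.allocate(8).putLong(0, 1516804485698L);
        System.out.println(toJsonTimestamp(empty));  // null, no exception
        System.out.println(toJsonTimestamp(value));  // formatted timestamp
    }
}
{code}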






[jira] [Commented] (CASSANDRA-14199) exception when dumping sstable with frozen collection of UUID

2018-01-29 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344218#comment-16344218
 ] 

Simon Zhou commented on CASSANDRA-14199:


I pushed a fix for 3.0 
[here|https://github.com/szhou1234/cassandra/commit/1a8fff02e93e0acb90e785fdca7f30d9aae54b1a]
 and will provide fixes for newer versions.

[~cnlwsu] could you help review this? For your convenience, this is how the 
output looks after the fix:
{code}
[
  {
"partition" : {
  "key" : [ "id" ],
  "position" : 0
},
"rows" : [
  {
"type" : "row",
"position" : 181,
"liveness_info" : { "tstamp" : "2018-01-29T22:58:49.111820Z" },
"cells" : [
  { "name" : "c2", "frozen" : true, "values" : [ "3", "4" ] },
  { "name" : "c4", "frozen" : true, "values" : [ "over", "view" ] },
  { "name" : "c6", "frozen" : true, "values" : { "driver" : "java", 
"note" : "new" } },
  { "name" : "c1", "deletion_info" : { "marked_deleted" : 
"2018-01-29T22:58:49.111819Z", "local_delete_time" : "2018-01-29T22:58:49Z" } },
  { "name" : "c1", "path" : [ "f7b01890-0547-11e8-817b-adb40ecebcf5" ] 
},
  { "name" : "c1", "path" : [ "f7b01891-0547-11e8-817b-adb40ecebcf5" ] 
},
  { "name" : "c3", "deletion_info" : { "marked_deleted" : 
"2018-01-29T22:58:49.111819Z", "local_delete_time" : "2018-01-29T22:58:49Z" } },
  { "name" : "c3", "path" : [ "set" ] },
  { "name" : "c3", "path" : [ "user" ] },
  { "name" : "c5", "deletion_info" : { "marked_deleted" : 
"2018-01-29T22:58:49.111819Z", "local_delete_time" : "2018-01-29T22:58:49Z" } },
  { "name" : "c5", "path" : [ "good" ] },
  { "name" : "c5", "path" : [ "root" ] }
]
  }
]
  }
]
{code}

Two changes:
- I added a "frozen" field for frozen collections.
- The elements of a frozen collection are printed on one line (rather than one 
line per element, as for unfrozen collections), to better indicate that they 
are immutable.

There may be another, independent issue: for an unfrozen collection there is 
always one output line for "deletion_info", even when the cell doesn't have any 
deletion. In any case, that should be a separate fix if it turns out to be an issue.
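For illustration, a minimal sketch of what emitting such a cell could look like with Jackson's JsonGenerator, following the field names in the sample output above (this is not the actual patch):
{code:java}
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

// Illustrative only: emit a frozen-collection cell with a "frozen" flag and
// all of its values together, mirroring the sample sstabledump output above.
final class FrozenCellSketch
{
    static void writeFrozenCell(JsonGenerator json, String name, List<String> values) throws IOException
    {
        json.writeStartObject();
        json.writeStringField("name", name);
        json.writeBooleanField("frozen", true);
        json.writeFieldName("values");
        json.writeStartArray();
        for (String v : values)
            json.writeString(v);   // the whole frozen value is written in one place
        json.writeEndArray();
        json.writeEndObject();
    }

    public static void main(String[] args) throws IOException
    {
        StringWriter out = new StringWriter();
        JsonGenerator json = new JsonFactory().createGenerator(out);
        writeFrozenCell(json, "c4", Arrays.asList("over", "view"));
        json.close();
        System.out.println(out);  // {"name":"c4","frozen":true,"values":["over","view"]}
    }
}
{code}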

> exception when dumping sstable with frozen collection of UUID
> -
>
> Key: CASSANDRA-14199
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14199
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Major
> Fix For: 3.0.x
>
>
> When dumping (sstabledump) sstable with frozen collection of UUID, there is 
> exception like this:
> {code:java}
> Exception in thread "main" org.apache.cassandra.serializers.MarshalException: 
> UUID should be 16 or 0 bytes (24)
> at 
> org.apache.cassandra.serializers.UUIDSerializer.validate(UUIDSerializer.java:43)
> at 
> org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:440)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:374)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:278)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:213)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at 
> org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:102)
> at 
> org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
>  
> *Steps to reproduce:*
> {code:java}
> cqlsh> create TABLE stresscql.sstabledump_test(userid text PRIMARY KEY, c1 
> list<uuid>, c2 frozen<list<uuid>>, c3 set<text>, c4 frozen<set<text>>, c5 
> map<text, text>, c6 frozen<map<text, text>>);
> cqlsh> insert INTO stresscql.sstabledump_test (userid, c1, c2, c3, c4, c5, 
> c6) VALUES ( 'id', [6947e8c0

[jira] [Created] (CASSANDRA-14199) exception when dumping sstable with frozen collection of UUID

2018-01-29 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-14199:
--

 Summary: exception when dumping sstable with frozen collection of 
UUID
 Key: CASSANDRA-14199
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14199
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Simon Zhou
Assignee: Simon Zhou
 Fix For: 3.0.x


When dumping (sstabledump) an sstable with a frozen collection of UUID, there 
is an exception like this:
{code:java}
Exception in thread "main" org.apache.cassandra.serializers.MarshalException: 
UUID should be 16 or 0 bytes (24)
at 
org.apache.cassandra.serializers.UUIDSerializer.validate(UUIDSerializer.java:43)
at 
org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
at 
org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:440)
at 
org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:374)
at 
org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:278)
at 
org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:213)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at 
org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:102)
at 
org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
 

*Steps to reproduce:*
{code:java}
cqlsh> create TABLE stresscql.sstabledump_test(userid text PRIMARY KEY, c1 
list<uuid>, c2 frozen<list<uuid>>, c3 set<text>, c4 frozen<set<text>>, c5 
map<text, text>, c6 frozen<map<text, text>>);
cqlsh> insert INTO stresscql.sstabledump_test (userid, c1, c2, c3, c4, c5, c6) 
VALUES ( 'id', [6947e8c0-02fa-11e8-87e1-fb0d0e20b5c4], 
[6947e8c0-02fa-11e8-87e1-fb0d0e20b5c4], {'set', 'user'}, {'view', 'over'}, 
{'good': 'hello', 'root': 'text'}, {'driver': 'java', 'note': 'new'});{code}
 

*Root cause:*

A frozen collection is treated as a simple column, and it's the client's 
responsibility to parse the data from the ByteBuffer. We have this logic in the 
different drivers, but sstabledump doesn't have it in place. It just treats the 
whole collection as a single UUID.
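
To see why the error reports 24 bytes, here is a minimal, standalone sketch of 
how a client could decode a frozen list<uuid> cell value, assuming the 32-bit 
element count followed by 32-bit length-prefixed elements that the native 
protocol (v3+) uses for collections. This is only an illustration, not the 
sstabledump patch: a one-element frozen list<uuid> is 4 + 4 + 16 = 24 bytes, 
which is exactly the buffer size the MarshalException complains about when it 
is validated as a single UUID.
{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative only; assumes the 32-bit count / 32-bit element-length layout.
public class FrozenListUuidDecoder
{
    public static List<UUID> decode(ByteBuffer cellValue)
    {
        ByteBuffer input = cellValue.duplicate();
        int count = input.getInt();                  // number of elements
        List<UUID> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++)
        {
            int length = input.getInt();             // per-element length prefix
            ByteBuffer element = input.slice();
            element.limit(length);
            input.position(input.position() + length);
            // A serialized uuid is 16 raw bytes, most significant long first.
            result.add(new UUID(element.getLong(), element.getLong()));
        }
        return result;
    }
}
{code}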



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-13877) Potential concurrency issue with CDC size calculation

2017-09-18 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou resolved CASSANDRA-13877.

Resolution: Invalid

Normally, if you have multiple writers (producers), the add operation may use a 
stale local copy even if the variable is volatile. Since we have a single 
producer here, it's safe to use volatile, as quoted from the book:
{code}
You can use volatile variables only when all the following criteria are met:
• Writes to the variable do not depend on its current value, or you can ensure
that only a single thread ever updates the value;
...
{code}

Thanks for commenting. I'm closing this ticket.
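
For reference, here is a minimal sketch (not the Cassandra code itself) of the 
difference: a volatile read-modify-write can lose updates with multiple 
writers, while AtomicLong cannot, which is why the single-producer assumption 
matters here.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class SizeTrackerSketch
{
    // Unsafe with multiple writers: "size += delta" is a read-modify-write,
    // so two threads can read the same stale value and one update is lost.
    // With a single producer, only one thread ever writes, so volatile is ok.
    private volatile long volatileSize = 0;

    // Safe with any number of writers: the addition happens atomically.
    private final AtomicLong atomicSize = new AtomicLong();

    void addUnsafe(long delta) { volatileSize += delta; }
    void addSafe(long delta)   { atomicSize.addAndGet(delta); }
}
{code}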

> Potential concurrency issue with CDC size calculation
> -
>
> Key: CASSANDRA-13877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13877
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>
> We're backporting CDC feature and bug fixes to 3.0. There is potential 
> visibility issue with two variables {{CDCSizeTracker.sizeInProgress}} and 
> {{DirectorySizeCalculator.size}} . They're declared as volatile however there 
> are cases that when assigning new values to them, the new values depend on 
> the current value. For example:
> https://github.com/apache/cassandra/blob/e9da85723a8dd40872c4bca087a03b655bd2cacb/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L285
> https://github.com/apache/cassandra/blob/e9da85723a8dd40872c4bca087a03b655bd2cacb/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L297
> In rare cases we'll not be able to calculate CDC data size correctly. We 
> should change these two variables back to AtomicLong, as the simplest fix. 
> Java Concurrency In Practice section 3.1.3 explains well why we shouldn't use 
> volatile in these two cases. I'll provide patch shortly.
> cc [~JoshuaMcKenzie] [~jay.zhuang]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13877) Potential concurrency issue with CDC size calculation

2017-09-14 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13877:
--

 Summary: Potential concurrency issue with CDC size calculation
 Key: CASSANDRA-13877
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13877
 Project: Cassandra
  Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou


We're backporting the CDC feature and bug fixes to 3.0. There is a potential 
visibility issue with two variables, CDCSizeTracker.sizeInProgress and 
DirectorySizeCalculator.size. They're declared as volatile; however, there are 
cases where the new value assigned to them depends on the current value. For 
example:
https://github.com/apache/cassandra/blob/e9da85723a8dd40872c4bca087a03b655bd2cacb/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L285
https://github.com/apache/cassandra/blob/e9da85723a8dd40872c4bca087a03b655bd2cacb/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L297

In rare cases we won't be able to calculate the CDC data size correctly. We 
should change these two variables back to AtomicLong, as the simplest fix. Java 
Concurrency In Practice, section 3.1.3, explains well why we shouldn't use 
volatile in these two cases. I'll provide a patch shortly.

cc [~JoshuaMcKenzie] [~jay.zhuang]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message

2017-08-16 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13323:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Yep, that's the right fix. Thanks for the comment.

> IncomingTcpConnection closed due to one bad message
> ---
>
> Key: CASSANDRA-13323
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.x
>
> Attachments: CASSANDRA-13323-v1.patch
>
>
> We got this exception:
> {code}
> WARN  [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 
> IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
> socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this 
> is likely due to the schema not being fully propagated.  Please wait for 
> schema agreement on table creation.
> at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> {code}
> Also we saw this log in another host indicating it needs to re-connect:
> {code}
> INFO  [HANDSHAKE-/] 2017-02-21 13:37:50,216 
> OutboundTcpConnection.java:515 - Handshaking version with /
> {code}
> The reason is that the node was receiving hinted data for a dropped table. 
> This may happen with other messages as well. On Cassandra side, 
> IncomingTcpConnection shouldn't close on just one bad message, even though it 
> will be restarted soon later by SocketThread in MessagingService.
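
As a purely illustrative sketch of the behavior described above (the names are 
invented and this is not the real IncomingTcpConnection code), a receive loop 
that tolerates one bad message would skip that message and keep the socket open 
rather than closing the whole connection:
{code:java}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical; shows the idea of per-message error handling only.
public class TolerantReceiveLoop
{
    static class UnknownTableException extends IOException
    {
        UnknownTableException(String msg) { super(msg); }
    }

    interface MessageHandler { void handle(byte[] payload) throws UnknownTableException; }

    public static void receive(DataInputStream in, MessageHandler handler) throws IOException
    {
        while (true)
        {
            int length;
            try { length = in.readInt(); }        // size of the next message
            catch (EOFException eof) { return; }  // peer closed the connection
            byte[] payload = new byte[length];
            in.readFully(payload);
            try
            {
                handler.handle(payload);
            }
            catch (UnknownTableException e)
            {
                // e.g. a hint for a table that was just dropped: log and move on.
                System.err.println("Skipping message for unknown table: " + e.getMessage());
            }
        }
    }
}
{code}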



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13387) Metrics for repair

2017-08-14 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125942#comment-16125942
 ] 

Simon Zhou commented on CASSANDRA-13387:


Stefan, thank you so much for code review!

> Metrics for repair
> --
>
> Key: CASSANDRA-13387
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 4.0
>
>
> We're missing metrics for repair, especially for errors. From what I observed 
> now, the exception will be caught by UncaughtExceptionHandler set in 
> CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one 
> example:
> {code}
> ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - 
> Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: Parent repair session with id = 
> 8c85d260-1319-11e7-82a2-25090a89015f has failed.
> at 
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}
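
As a rough illustration only (not the actual patch, and the metric name is an 
assumption), a dedicated repair-error counter registered with the 
Dropwizard/Codahale metrics library that Cassandra already uses could look 
roughly like this, so that such failures are counted separately from the 
generic StorageMetrics.exceptions:
{code:java}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

// Sketch only; the registry wiring and the "Repair.Exceptions" name are
// assumptions for illustration, not the names used by the real patch.
public class RepairMetricsSketch
{
    private static final MetricRegistry registry = new MetricRegistry();

    public static final Counter repairExceptions =
        registry.counter(MetricRegistry.name("Repair", "Exceptions"));

    public static void onRepairError(Throwable t)
    {
        repairExceptions.inc();   // count repair failures explicitly
    }
}
{code}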



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13737) Node start can fail if the base table of a materialized view is not found

2017-08-01 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109856#comment-16109856
 ] 

Simon Zhou commented on CASSANDRA-13737:


We had the same issue on 3.0.14 a couple of days ago. It looks like the MV data 
was somehow corrupted and a restart of any node would get stuck. Even "drop MV" 
from cqlsh doesn't work (on a different node, before the restart) because the 
base table doesn't exist.

> Node start can fail if the base table of a materialized view is not found
> -
>
> Key: CASSANDRA-13737
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13737
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata, Materialized Views
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Node start can fail if the base table of a materialized view is not found, 
> which is something that can happen under certain circumstances. There is a 
> dtest reproducing the problem:
> {code}
> cluster = self.cluster
> cluster.populate(3)
> cluster.start()
> node1, node2, node3 = self.cluster.nodelist()
> session = self.patient_cql_connection(node1, 
> consistency_level=ConsistencyLevel.QUORUM)
> create_ks(session, 'ks', 3)
> session.execute('CREATE TABLE users (username varchar PRIMARY KEY, state 
> varchar)')
> node3.stop(wait_other_notice=True)
> # create a materialized view only in nodes 1 and 2
> session.execute(('CREATE MATERIALIZED VIEW users_by_state AS '
>  'SELECT * FROM users WHERE state IS NOT NULL AND username IS 
> NOT NULL '
>  'PRIMARY KEY (state, username)'))
> node1.stop(wait_other_notice=True)
> node2.stop(wait_other_notice=True)
> # drop the base table only in node 3
> node3.start(wait_for_binary_proto=True)
> session = self.patient_cql_connection(node3, 
> consistency_level=ConsistencyLevel.QUORUM)
> session.execute('DROP TABLE ks.users')
> cluster.stop()
> cluster.start()  # Fails
> {code}
> This is the error during node start:
> {code}
> java.lang.IllegalArgumentException: Unknown CF 
> 958ebc30-76e4-11e7-869a-9d8367a71c76
>   at 
> org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:215) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.view.ViewManager.addView(ViewManager.java:143) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.view.ViewManager.reload(ViewManager.java:113) 
> ~[main/:na]
>   at org.apache.cassandra.schema.Schema.alterKeyspace(Schema.java:618) 
> ~[main/:na]
>   at org.apache.cassandra.schema.Schema.lambda$merge$18(Schema.java:591) 
> ~[main/:na]
>   at 
> java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.lambda$entryConsumer$0(Collections.java:1575)
>  ~[na:1.8.0_131]
>   at java.util.HashMap$EntrySet.forEach(HashMap.java:1043) ~[na:1.8.0_131]
>   at 
> java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.forEach(Collections.java:1580)
>  ~[na:1.8.0_131]
>   at org.apache.cassandra.schema.Schema.merge(Schema.java:591) ~[main/:na]
>   at 
> org.apache.cassandra.schema.Schema.mergeAndAnnounceVersion(Schema.java:564) 
> ~[main/:na]
>   at 
> org.apache.cassandra.schema.MigrationTask$1.response(MigrationTask.java:89) 
> ~[main/:na]
>   at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
>  ~[main/:na]
>   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72) 
> ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_131]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_131]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_131]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_131]
>   at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>  [main/:na]
>   at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13720) Clean up repair code

2017-07-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097004#comment-16097004
 ] 

Simon Zhou commented on CASSANDRA-13720:


This isn't meant to fix anything in particular, but there are places where I 
want to make the code less confusing:

|4.0 |[patch | 
https://github.com/szhou1234/cassandra/commit/b9a410b74f42af7519010dff1fd03372ce38a412]|

[~spo...@gmail.com] Could you please review this patch? It's rebased on my 
patch for CASSANDRA-13387. Thank you.

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 4.0
>
>
> Lots of unused code.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13720) Clean up repair code

2017-07-21 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13720:
---
Status: Patch Available  (was: Open)

> Clean up repair code
> 
>
> Key: CASSANDRA-13720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 4.0
>
>
> Lots of unused code.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13720) Clean up repair code

2017-07-21 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13720:
--

 Summary: Clean up repair code
 Key: CASSANDRA-13720
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13720
 Project: Cassandra
  Issue Type: Improvement
Reporter: Simon Zhou
Assignee: Simon Zhou
 Fix For: 4.0


Lots of unused code.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13387) Metrics for repair

2017-07-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096978#comment-16096978
 ] 

Simon Zhou commented on CASSANDRA-13387:


Just realized that's a public interface and I should keep it. Updated the patch 
and rebased:

|4.0 |[patch | 
https://github.com/szhou1234/cassandra/commit/306ed07aa4a7a572d085c41fd0c1067719505262]|

Btw, the repair code needs cleanup, e.g., there are some unused variables. I'll 
open another ticket for this.

> Metrics for repair
> --
>
> Key: CASSANDRA-13387
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
>
> We're missing metrics for repair, especially for errors. From what I observed 
> now, the exception will be caught by UncaughtExceptionHandler set in 
> CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one 
> example:
> {code}
> ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - 
> Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: Parent repair session with id = 
> 8c85d260-1319-11e7-82a2-25090a89015f has failed.
> at 
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13387) Metrics for repair

2017-07-18 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091917#comment-16091917
 ] 

Simon Zhou commented on CASSANDRA-13387:


Rebased patch:
|4.0 |[patch | 
https://github.com/szhou1234/cassandra/commit/54ad6690fd306fe2b5a93d73808064ab29f1]|

> Metrics for repair
> --
>
> Key: CASSANDRA-13387
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
>
> We're missing metrics for repair, especially for errors. From what I observed 
> now, the exception will be caught by UncaughtExceptionHandler set in 
> CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one 
> example:
> {code}
> ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - 
> Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: Parent repair session with id = 
> 8c85d260-1319-11e7-82a2-25090a89015f has failed.
> at 
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-13679) Add option to customize badness_threshold in dynamic endpoint snitch

2017-07-14 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou resolved CASSANDRA-13679.

Resolution: Not A Problem

Just realized there is a cassandra.yaml option so this ticket is not needed.

> Add option to customize badness_threshold in dynamic endpoint snitch
> 
>
> Key: CASSANDRA-13679
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13679
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Attachments: Screen Shot 2017-07-07 at 5.01.48 PM.png
>
>
> I'm working on tuning dynamic endpoint snitch and looks like the default 
> value (0.1) for Config.dynamic_snitch_badness_threshold is too sensitive and 
> causes traffic imbalance among nodes, especially with my patch for 
> CASSANDRA-13577. So we should:
> 1. Revisit the default value.
> 2. Add an option to allow customize badness_threshold during bootstrap.
> This ticket is to track #2. I attached a screenshot to show that, after 
> increasing badness_threshold from 0.1 to 1.0 by using patch from 
> CASSANDRA-12179, the traffic imbalance is gone.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13679) Add option to customize badness_threshold in dynamic endpoint snitch

2017-07-14 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087527#comment-16087527
 ] 

Simon Zhou commented on CASSANDRA-13679:


Here are the patches. Not sure if we need one for 3.11.

|3.0.x |[patch | 
https://github.com/szhou1234/cassandra/commit/50cc71418d3fc75b1d8225eb1bded95ac1f1bdd7]|
|4.0 |[patch | 
https://github.com/szhou1234/cassandra/commit/a7144f8d50872dc4e5591db73ff770388d410403]|


> Add option to customize badness_threshold in dynamic endpoint snitch
> 
>
> Key: CASSANDRA-13679
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13679
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Attachments: Screen Shot 2017-07-07 at 5.01.48 PM.png
>
>
> I'm working on tuning dynamic endpoint snitch and looks like the default 
> value (0.1) for Config.dynamic_snitch_badness_threshold is too sensitive and 
> causes traffic imbalance among nodes, especially with my patch for 
> CASSANDRA-13577. So we should:
> 1. Revisit the default value.
> 2. Add an option to allow customize badness_threshold during bootstrap.
> This ticket is to track #2. I attached a screenshot to show that, after 
> increasing badness_threshold from 0.1 to 1.0 by using patch from 
> CASSANDRA-12179, the traffic imbalance is gone.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13577) Fix dynamic endpoint snitch for sub-millisecond use case

2017-07-07 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078856#comment-16078856
 ] 

Simon Zhou commented on CASSANDRA-13577:


Now with this patch, badness_threshold looks too sensitive and triggers traffic 
imbalance. I created CASSANDRA-13679 to follow up on that.

> Fix dynamic endpoint snitch for sub-millisecond use case
> 
>
> Key: CASSANDRA-13577
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13577
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.x
>
>
> This is a follow up of https://issues.apache.org/jira/browse/CASSANDRA-6908. 
> After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few 
> production clusters, I observed that the scores for all the endpoints are 
> mostly 0.0. Through debugging, I found this is caused by that these clusters 
> have p50 latency well below 1ms and the network latency is also <0.1ms (round 
> trip). Be noted that we use p50 sampled read latency and millisecond as time 
> unit. That means, if the latency is mostly below 1ms, the score will be 0. 
> This is definitely not something we want. To make DES work for these 
> sub-millisecond use cases, we should change the timeunit to at least 
> microsecond, or even nanosecond. I'll provide a patch soon.
> Evidence of the p50 latency:
> {code}
> nodetool tablehistograms  
> Percentile  SSTables Write Latency  Read LatencyPartition Size
> Cell Count
>   (micros)  (micros)   (bytes)
>   
> 50% 2.00 35.43454.83 20501
>  3
> 75% 2.00 42.51654.95 29521
>  3
> 95% 3.00182.79943.13 61214
>  3
> 98% 4.00263.21   1131.75 73457
>  3
> 99% 4.00315.85   1358.10 88148
>  3
> Min 0.00  9.89 11.8761
>  3
> Max 5.00654.95 129557.75943127
>  3
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13679) Add option to customize badness_threshold in dynamic endpoint snitch

2017-07-07 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13679:
--

 Summary: Add option to customize badness_threshold in dynamic 
endpoint snitch
 Key: CASSANDRA-13679
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13679
 Project: Cassandra
  Issue Type: Improvement
Reporter: Simon Zhou
Assignee: Simon Zhou
 Attachments: Screen Shot 2017-07-07 at 5.01.48 PM.png

I'm working on tuning the dynamic endpoint snitch, and it looks like the default 
value (0.1) for Config.dynamic_snitch_badness_threshold is too sensitive and 
causes traffic imbalance among nodes, especially with my patch for 
CASSANDRA-13577. So we should:
1. Revisit the default value.
2. Add an option to allow customizing badness_threshold during bootstrap.

This ticket is to track #2. I attached a screenshot to show that, after 
increasing badness_threshold from 0.1 to 1.0 using the patch from 
CASSANDRA-12179, the traffic imbalance is gone.
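
For illustration, here is a hedged sketch of how a relative badness threshold 
gates score-based reordering. This is simplified pseudo-logic under my own 
assumptions, not the actual DynamicEndpointSnitch implementation: the point is 
only that with 0.1 a 10% score gap already overrides the static replica order, 
while with 1.0 the gap has to reach 2x.
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Simplified illustration of a badness threshold; not the real snitch code.
public class BadnessThresholdSketch
{
    public static List<String> maybeReorder(List<String> replicas,
                                            Map<String, Double> scores,
                                            double badnessThreshold)
    {
        double best = replicas.stream().mapToDouble(r -> scores.getOrDefault(r, 0.0)).min().orElse(0.0);
        double worst = replicas.stream().mapToDouble(r -> scores.getOrDefault(r, 0.0)).max().orElse(0.0);

        // Only let the dynamic scores override the static order when the
        // worst replica is more than (1 + threshold) times worse than the best.
        if (best > 0 && worst > best * (1 + badnessThreshold))
        {
            List<String> sorted = new ArrayList<>(replicas);
            sorted.sort(Comparator.comparingDouble(r -> scores.getOrDefault(r, 0.0)));
            return sorted;
        }
        return replicas;
    }
}
{code}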




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2017-06-14 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049505#comment-16049505
 ] 

Simon Zhou edited comment on CASSANDRA-10862 at 6/14/17 6:42 PM:
-

[~scv...@gmail.com] Are you still working on this?


was (Author: szhou):
[~chenshen] Are you still working on this?

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>  Labels: lcs
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2017-06-14 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049505#comment-16049505
 ] 

Simon Zhou commented on CASSANDRA-10862:


[~chenshen] Are you still working on this?

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>  Labels: lcs
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13577) Fix dynamic endpoint snitch for sub-millisecond use case

2017-06-07 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042055#comment-16042055
 ] 

Simon Zhou commented on CASSANDRA-13577:


Here are the patches. Not sure if we need one for 3.11.

|3.0.x |[patch | 
https://github.com/szhou1234/cassandra/commit/50a0a081f976d94b2d6f7883e28d4c427baa120c]|
|4.0 |[patch | 
https://github.com/szhou1234/cassandra/commit/73a3ff467a852eec7993efb0133945416bad4e46]|


> Fix dynamic endpoint snitch for sub-millisecond use case
> 
>
> Key: CASSANDRA-13577
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13577
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.x
>
>
> This is a follow up of https://issues.apache.org/jira/browse/CASSANDRA-6908. 
> After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few 
> production clusters, I observed that the scores for all the endpoints are 
> mostly 0.0. Through debugging, I found this is caused by that these clusters 
> have p50 latency well below 1ms and the network latency is also <0.1ms (round 
> trip). Be noted that we use p50 sampled read latency and millisecond as time 
> unit. That means, if the latency is mostly below 1ms, the score will be 0. 
> This is definitely not something we want. To make DES work for these 
> sub-millisecond use cases, we should change the timeunit to at least 
> microsecond, or even nanosecond. I'll provide a patch soon.
> Evidence of the p50 latency:
> {code}
> nodetool tablehistograms  
> Percentile  SSTables Write Latency  Read LatencyPartition Size
> Cell Count
>   (micros)  (micros)   (bytes)
>   
> 50% 2.00 35.43454.83 20501
>  3
> 75% 2.00 42.51654.95 29521
>  3
> 95% 3.00182.79943.13 61214
>  3
> 98% 4.00263.21   1131.75 73457
>  3
> 99% 4.00315.85   1358.10 88148
>  3
> Min 0.00  9.89 11.8761
>  3
> Max 5.00654.95 129557.75943127
>  3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13577) Fix dynamic endpoint snitch for sub-millisecond use case

2017-06-07 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13577:
---
Status: Patch Available  (was: Open)

> Fix dynamic endpoint snitch for sub-millisecond use case
> 
>
> Key: CASSANDRA-13577
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13577
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.x
>
>
> This is a follow up of https://issues.apache.org/jira/browse/CASSANDRA-6908. 
> After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few 
> production clusters, I observed that the scores for all the endpoints are 
> mostly 0.0. Through debugging, I found this is caused by that these clusters 
> have p50 latency well below 1ms and the network latency is also <0.1ms (round 
> trip). Be noted that we use p50 sampled read latency and millisecond as time 
> unit. That means, if the latency is mostly below 1ms, the score will be 0. 
> This is definitely not something we want. To make DES work for these 
> sub-millisecond use cases, we should change the timeunit to at least 
> microsecond, or even nanosecond. I'll provide a patch soon.
> Evidence of the p50 latency:
> {code}
> nodetool tablehistograms  
> Percentile  SSTables Write Latency  Read LatencyPartition Size
> Cell Count
>   (micros)  (micros)   (bytes)
>   
> 50% 2.00 35.43454.83 20501
>  3
> 75% 2.00 42.51654.95 29521
>  3
> 95% 3.00182.79943.13 61214
>  3
> 98% 4.00263.21   1131.75 73457
>  3
> 99% 4.00315.85   1358.10 88148
>  3
> Min 0.00  9.89 11.8761
>  3
> Max 5.00654.95 129557.75943127
>  3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13577) Fix dynamic endpoint snitch for sub-millisecond use case

2017-06-06 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13577:
--

 Summary: Fix dynamic endpoint snitch for sub-millisecond use case
 Key: CASSANDRA-13577
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13577
 Project: Cassandra
  Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou
 Fix For: 3.0.x


This is a follow-up of https://issues.apache.org/jira/browse/CASSANDRA-6908. 
After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few production 
clusters, I observed that the scores for all the endpoints are mostly 0.0. 
Through debugging, I found that this is because these clusters have p50 latency 
well below 1ms and the network latency is also <0.1ms (round trip). Note that we 
use the p50 sampled read latency with millisecond as the time unit. That means 
that if the latency is mostly below 1ms, the score will be 0, which is 
definitely not something we want. To make DES work for these sub-millisecond 
use cases, we should change the time unit to at least microsecond, or even 
nanosecond. I'll provide a patch soon.

Evidence of the p50 latency:
{code}
nodetool tablehistograms  
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             2.00             35.43            454.83             20501                 3
75%             2.00             42.51            654.95             29521                 3
95%             3.00            182.79            943.13             61214                 3
98%             4.00            263.21           1131.75             73457                 3
99%             4.00            315.85           1358.10             88148                 3
Min             0.00              9.89             11.86                61                 3
Max             5.00            654.95         129557.75            943127                 3
{code}
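
A tiny sketch of the truncation problem (illustrative only, not the patch): 
converting a sub-millisecond p50 read latency such as the 454.83 micros above 
to whole milliseconds yields 0, so every endpoint ends up with the same zero 
score, whereas microsecond (or nanosecond) granularity preserves the 
differences between endpoints.
{code:java}
import java.util.concurrent.TimeUnit;

public class LatencyUnitSketch
{
    public static void main(String[] args)
    {
        long p50ReadLatencyNanos = 454_830;   // ~454.83 micros, the p50 above

        // Millisecond granularity truncates everything below 1ms to zero...
        System.out.println(TimeUnit.NANOSECONDS.toMillis(p50ReadLatencyNanos)); // 0

        // ...while microsecond granularity keeps the signal.
        System.out.println(TimeUnit.NANOSECONDS.toMicros(p50ReadLatencyNanos)); // 454
    }
}
{code}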



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load

2017-06-06 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039888#comment-16039888
 ] 

Simon Zhou commented on CASSANDRA-6908:
---

Luckily, we have a JVM option to disable severity when calculating scores 
(CASSANDRA-11737/CASSANDRA-11738). After applying this option in a few 
production clusters, however, I observed that the scores for all the endpoints 
are mostly 0.0. Through debugging, I found that this is because these clusters 
have p50 latency well below 1ms and the network latency is also <0.1ms (round 
trip). Note that we use the p50 sampled read latency with millisecond as the 
time unit. That means that if the latency is mostly below 1ms, the score will 
be 0, which is definitely not something we want. To make DES work for these 
sub-millisecond use cases, we should change the time unit to at least 
microsecond, or even nanosecond. I'll create another ticket for this.

{code}
nodetool tablehistograms  

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             2.00             35.43            545.79             20501                 3
75%             3.00             42.51            654.95             35425                 3
95%             3.00            152.32            943.13             61214                 3
98%             4.00            263.21           1131.75             73457                 3
99%             4.00            315.85           1358.10             88148                 3
Min             0.00              9.89              9.89                61                 3
Max             5.00            785.94          89970.66           1131752                 3
{code}

> Dynamic endpoint snitch destabilizes cluster under heavy load
> -
>
> Key: CASSANDRA-6908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6908
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Bartłomiej Romański
>Assignee: Brandon Williams
> Attachments: 
> 0001-Decouple-IO-scores-and-latency-scores-from-DynamicEn.patch, 
> as-dynamic-snitch-disabled.png
>
>
> We observe that with dynamic snitch disabled our cluster is much more stable 
> than with dynamic snitch enabled.
> We've got a 15 nodes cluster with pretty strong machines (2xE5-2620, 64 GB 
> RAM, 2x480 GB SSD). We mostly do reads (about 300k/s).
> We use Astyanax on client side with TOKEN_AWARE option enabled. It 
> automatically direct read queries to one of the nodes responsible the given 
> token.
> In that case with dynamic snitch disabled Cassandra always handles read 
> locally. With dynamic snitch enabled Cassandra very often decides to proxy 
> the read to some other node. This causes much higher CPU usage and produces 
> much more garbage what results in more often GC pauses (young generation 
> fills up quicker). By "much higher" and "much more" I mean 1.5-2x.
> I'm aware that higher dynamic_snitch_badness_threshold value should solve 
> that issue. The default value is 0.1. I've looked at scores exposed in JMX 
> and the problem is that our values seemed to be completely random. They are 
> between usually 0.5 and 2.0, but changes randomly every time I hit refresh.
> Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something 
> like that, but the result will be similar to simply disabling the dynamic 
> switch at all (that's what we done).
> I've tried to understand what's the logic behind these scores and I'm not 
> sure if I get the idea...
> It's a sum (without any multipliers) of two components:
> - ratio of recent given node latency to recent average node latency
> - something called 'severity', what, if I analyzed the code correctly, is a 
> result of BackgroundActivityMonitor.getIOWait() - it's a ratio of "iowait" 
> CPU time to the whole CPU time as reported in /proc/stats (the ratio is 
> multiplied by 100)
> In our case the second value is something around 0-2% but varies quite 
> heavily every second.
> What's the idea behind simply adding this two values without any multipliers 
> (e.g the second one is in percentage while the first one is not)? Are we sure 
> this is the best possible way of calculating the final score?
> Is there a way too force Cassandra to use (much) longer samples? In our case 
> we probably need that to get stable values. The 'severity' is calculated for 
> each second. The mean latency is calculated based on some magic, hardcoded 
> values (ALPHA = 0.75, WINDOW_SIZE = 100). 
> Am I right that there's no way to tune that without hacking the code?
> I'm aware that there's dynamic_snitch_update_interval_in_ms property in the 
> co

[jira] [Commented] (CASSANDRA-10876) Alter behavior of batch WARN and fail on single partition batches

2017-06-05 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037786#comment-16037786
 ] 

Simon Zhou commented on CASSANDRA-10876:


I intended to backport this (see CASSANDRA-13467) but may need a committer to 
review it.

> Alter behavior of batch WARN and fail on single partition batches
> -
>
> Key: CASSANDRA-10876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10876
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Patrick McFadin
>Assignee: Sylvain Lebresne
>Priority: Minor
>  Labels: lhf
> Fix For: 3.6
>
> Attachments: 10876.txt
>
>
> In an attempt to give operator insight into potentially harmful batch usage, 
> Jiras were created to log WARN or fail on certain batch sizes. This ignores 
> the single partition batch, which doesn't create the same issues as a 
> multi-partition batch. 
> The proposal is to ignore size on single partition batch statements. 
> Reference:
> [CASSANDRA-6487|https://issues.apache.org/jira/browse/CASSANDRA-6487]
> [CASSANDRA-8011|https://issues.apache.org/jira/browse/CASSANDRA-8011]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load

2017-06-01 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034112#comment-16034112
 ] 

Simon Zhou edited comment on CASSANDRA-6908 at 6/2/17 5:43 AM:
---

We hit a similar issue, so I worked out a simple patch (attached) to decouple 
the scores for iowait and sampled read latency. From my observation, there are 
two issues:
1. The iowait score of a node changes frequently, and the gaps among the scores 
of different nodes are usually far beyond the default 1.1 threshold.
2. The (median) latency scores don't vary much, but the differences may still 
be more than 1.1x. Also, some nodes in the local datacenter have 0 latency 
scores. I understand that the nodes in the remote datacenter may not have 
latency data since local_quorum or local_one is being used. The issue for the 
remote datacenter has actually been fixed in CASSANDRA-13074 (we're running 
3.0.13).

These are the numbers I got (formatted) from a two-datacenter cluster (10 nodes 
in each datacenter), with my patch. The IP addresses have been obfuscated.

{code}
szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores
06/01/2017 23:30:36 + org.archive.jmx.Client LatencyScores: {
/node1=0.7832167832167832
/node2=0.0
/node3=1.0
/node4=0.0
/node5=0.0
/node6=0.43356643356643354
/node7=0.4825174825174825
/node8=0.0
/node9=0.8881118881118881
/node10=0.0
/node11=0.9440559440559441
/node12=0.0
/node13=0.0
/node14=0.0
/node15=0.0
/node16=0.0}
szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores
06/01/2017 23:30:45 + org.archive.jmx.Client LatencyScores: {
/node1=0.0
/node2=1.0
/node3=0.0
/node4=0.0
/node5=0.43356643356643354
/node6=0.4825174825174825
/node7=0.0
/node8=0.8881118881118881
/node9=0.0
/node10=0.9440559440559441
/node11=0.0
/node12=0.0
/node13=0.0
/node15=0.0
/node16=0.0
/node17=0.7832167832167832
}
szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
org.apache.cassandra.db:type=DynamicEndpointSnitch IOWaitScores
06/01/2017 23:30:54 + org.archive.jmx.Client IOWaitScores: {
/node1=5.084033489227295
/node2=4.024896621704102
/node3=4.54736852645874
/node4=4.947588920593262
/node5=3.4599156379699707
/node6=4.0653815269470215
/node7=6.989473819732666
/node8=3.371259927749634
/node9=5.800169467926025
/node10=3.2855939865112305
/node11=5.631399154663086
/node12=5.484004974365234
/node13=0.9635525941848755
/node14=1.5043878555297852
/node15=6.481481552124023
/node16=3.751563310623169}
{code}

Yes, we can work around the issue by increasing the badness_threshold. But the 
problems are:
1. The default threshold doesn't work well.
2. iowait (a percentage) is not a good measurement of end-to-end latency, not 
only because it changes frequently from second to second, but also because it's 
a low-level metric that doesn't reflect the whole picture, which should also 
include GC/safepoint pauses, thread scheduling delays, etc.
3. Instead of using the median read latency, could we use p95 latency as a 
better factor when calculating scores? I haven't experimented with this yet; a 
rough sketch follows below.

[~brandon.williams] what do you think? [~kohlisankalp] It looks like we have 
some fixes (or improvements?) in 4.0, but you mentioned in a meeting that DES 
could be improved. I'd also like to get your ideas on this. I can work on this 
if we can agree on something.
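
As mentioned in point 3 above, here is a rough, exploratory sketch of a 
p95-based score input computed from a window of latency samples. It assumes raw 
samples are available; the real snitch keeps an exponentially decaying 
reservoir, so this is not a drop-in replacement, just a way to discuss the 
idea.
{code:java}
import java.util.Arrays;

// Exploratory sketch only: p95 of a sample window as the per-endpoint score
// input instead of the median.
public class P95ScoreSketch
{
    public static double p95(double[] latencySamples)
    {
        double[] sorted = latencySamples.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(0.95 * sorted.length) - 1;
        return sorted[Math.max(index, 0)];
    }
}
{code}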


was (Author: szhou):
We got similar issue and thus I worked out a simple patch (attached) to 
decouple scores for iowait and sampled read latency. From my observation, there 
are two issues:
1. The iowait score of one node changes frequently and the gaps among the 
scores for different nodes are usually far beyond the default 1.1 threshold.
2. The (median) latency scores don't vary too much however some nodes have 0 
latency scores, even with the fix for CASSANDRA-13074 (we're running 3.0.13).

There are the numbers I got (formatted) with my attached patch:
{code}
szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores
06/01/2017 23:30:36 + org.archive.jmx.Client LatencyScores: {
/node1=0.7832167832167832
/node2=0.0
/node3=1.0
/node4=0.0
/node5=0.0
/node6=0.43356643356643354
/node7=0.4825174825174825
/node8=0.0
/node9=0.8881118881118881
/node10=0.0
/node11=0.9440559440559441
/node12=0.0
/node13=0.0
/node14=0.0
/node15=0.0
/node16=0.0}
szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores
06/01/2017 23:30:45 + org.archive.jmx.Client LatencyScores: 
{/10.165.10.5=0.7832167832167832
/node1=0.0
/node2=1.0
/node3=0.0
/node4=0.0
/node5=0.43356643356643354
/node6=0.4825174825174825
/node7=0.0
/node8=0.8881118881118881
/node9=0.0
/node10=0.9440559

[jira] [Commented] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load

2017-06-01 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034112#comment-16034112
 ] 

Simon Zhou commented on CASSANDRA-6908:
---

We hit a similar issue, so I worked out a simple patch (attached) to decouple 
the scores for iowait and sampled read latency. From my observation, there are 
two issues:
1. The iowait score of a node changes frequently, and the gaps among the scores 
of different nodes are usually far beyond the default 1.1 threshold.
2. The (median) latency scores don't vary much; however, some nodes have 0 
latency scores, even with the fix for CASSANDRA-13074 (we're running 3.0.13).

These are the numbers I got (formatted) with my attached patch:
{code}
szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores
06/01/2017 23:30:36 + org.archive.jmx.Client LatencyScores: {
/node1=0.7832167832167832
/node2=0.0
/node3=1.0
/node4=0.0
/node5=0.0
/node6=0.43356643356643354
/node7=0.4825174825174825
/node8=0.0
/node9=0.8881118881118881
/node10=0.0
/node11=0.9440559440559441
/node12=0.0
/node13=0.0
/node14=0.0
/node15=0.0
/node16=0.0}
szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
org.apache.cassandra.db:type=DynamicEndpointSnitch LatencyScores
06/01/2017 23:30:45 + org.archive.jmx.Client LatencyScores: 
{/10.165.10.5=0.7832167832167832
/node1=0.0
/node2=1.0
/node3=0.0
/node4=0.0
/node5=0.43356643356643354
/node6=0.4825174825174825
/node7=0.0
/node8=0.8881118881118881
/node9=0.0
/node10=0.9440559440559441
/node11=0.0
/node12=0.0
/node13=0.0
/node15=0.0
/node16=0.0}
szhou@host:~$ java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
org.apache.cassandra.db:type=DynamicEndpointSnitch IOWaitScores
06/01/2017 23:30:54 + org.archive.jmx.Client IOWaitScores: {
/node1=5.084033489227295
/node2=4.024896621704102
/node3=4.54736852645874
/node4=4.947588920593262
/node5=3.4599156379699707
/node6=4.0653815269470215
/node7=6.989473819732666
/node8=3.371259927749634
/node9=5.800169467926025
/node10=3.2855939865112305
/node11=5.631399154663086
/node12=5.484004974365234
/node13=0.9635525941848755
/node14=1.5043878555297852
/node15=6.481481552124023
/node16=3.751563310623169}
{code}

Yes, we can work around the issue by increasing the badness_threshold. But the 
problems are:
1. The default threshold doesn't work well.
2. iowait (a percentage) is not a good measurement of end-to-end latency, not 
only because it changes frequently from second to second, but also because it's 
a low-level metric that doesn't reflect the whole picture, which should also 
include GC/safepoint pauses, thread scheduling delays, etc.
3. Instead of using the median read latency, could we use p95 latency as a 
better factor when calculating scores? I haven't experimented with this yet.

[~brandon.williams] what do you think? [~kohlisankalp] It looks like we have 
some fixes (or improvements?) in 4.0, but you mentioned in a meeting that DES 
could be improved. I'd also like to get your ideas on this. I can work on this 
if we can agree on something.

> Dynamic endpoint snitch destabilizes cluster under heavy load
> -
>
> Key: CASSANDRA-6908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6908
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Bartłomiej Romański
>Assignee: Brandon Williams
> Attachments: 
> 0001-Decouple-IO-scores-and-latency-scores-from-DynamicEn.patch, 
> as-dynamic-snitch-disabled.png
>
>
> We observe that with dynamic snitch disabled our cluster is much more stable 
> than with dynamic snitch enabled.
> We've got a 15 nodes cluster with pretty strong machines (2xE5-2620, 64 GB 
> RAM, 2x480 GB SSD). We mostly do reads (about 300k/s).
> We use Astyanax on client side with TOKEN_AWARE option enabled. It 
> automatically direct read queries to one of the nodes responsible the given 
> token.
> In that case with dynamic snitch disabled Cassandra always handles read 
> locally. With dynamic snitch enabled Cassandra very often decides to proxy 
> the read to some other node. This causes much higher CPU usage and produces 
> much more garbage what results in more often GC pauses (young generation 
> fills up quicker). By "much higher" and "much more" I mean 1.5-2x.
> I'm aware that higher dynamic_snitch_badness_threshold value should solve 
> that issue. The default value is 0.1. I've looked at scores exposed in JMX 
> and the problem is that our values seemed to be completely random. They are 
> between usually 0.5 and 2.0, but changes randomly every time I hit refresh.
> Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something 
> like that, but the result will be similar to simply disabling th

[jira] [Updated] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load

2017-06-01 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-6908:
--
Attachment: 0001-Decouple-IO-scores-and-latency-scores-from-DynamicEn.patch

> Dynamic endpoint snitch destabilizes cluster under heavy load
> -
>
> Key: CASSANDRA-6908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6908
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Bartłomiej Romański
>Assignee: Brandon Williams
> Attachments: 
> 0001-Decouple-IO-scores-and-latency-scores-from-DynamicEn.patch, 
> as-dynamic-snitch-disabled.png
>
>
> We observe that with dynamic snitch disabled our cluster is much more stable 
> than with dynamic snitch enabled.
> We've got a 15 nodes cluster with pretty strong machines (2xE5-2620, 64 GB 
> RAM, 2x480 GB SSD). We mostly do reads (about 300k/s).
> We use Astyanax on client side with TOKEN_AWARE option enabled. It 
> automatically direct read queries to one of the nodes responsible the given 
> token.
> In that case with dynamic snitch disabled Cassandra always handles read 
> locally. With dynamic snitch enabled Cassandra very often decides to proxy 
> the read to some other node. This causes much higher CPU usage and produces 
> much more garbage what results in more often GC pauses (young generation 
> fills up quicker). By "much higher" and "much more" I mean 1.5-2x.
> I'm aware that higher dynamic_snitch_badness_threshold value should solve 
> that issue. The default value is 0.1. I've looked at scores exposed in JMX 
> and the problem is that our values seemed to be completely random. They are 
> between usually 0.5 and 2.0, but changes randomly every time I hit refresh.
> Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something 
> like that, but the result will be similar to simply disabling the dynamic 
> switch at all (that's what we done).
> I've tried to understand what's the logic behind these scores and I'm not 
> sure if I get the idea...
> It's a sum (without any multipliers) of two components:
> - ratio of recent given node latency to recent average node latency
> - something called 'severity', what, if I analyzed the code correctly, is a 
> result of BackgroundActivityMonitor.getIOWait() - it's a ratio of "iowait" 
> CPU time to the whole CPU time as reported in /proc/stats (the ratio is 
> multiplied by 100)
> In our case the second value is something around 0-2% but varies quite 
> heavily every second.
> What's the idea behind simply adding this two values without any multipliers 
> (e.g the second one is in percentage while the first one is not)? Are we sure 
> this is the best possible way of calculating the final score?
> Is there a way too force Cassandra to use (much) longer samples? In our case 
> we probably need that to get stable values. The 'severity' is calculated for 
> each second. The mean latency is calculated based on some magic, hardcoded 
> values (ALPHA = 0.75, WINDOW_SIZE = 100). 
> Am I right that there's no way to tune that without hacking the code?
> I'm aware that there's dynamic_snitch_update_interval_in_ms property in the 
> config file, but that only determines how often the scores are recalculated 
> not how long samples are taken. Is that correct?
> To sum up, It would be really nice to have more control over dynamic snitch 
> behavior or at least have the official option to disable it described in the 
> default config file (it took me some time to discover that we can just 
> disable it instead of hacking with dynamic_snitch_badness_threshold=1000).
> Currently for some scenarios (like ours - optimized cluster, token aware 
> client, heavy load) it causes more harm than good.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13555) Thread leak during repair

2017-05-31 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032401#comment-16032401
 ] 

Simon Zhou commented on CASSANDRA-13555:


Thanks [~tjake] for the comment. I'll be working on the patch, but I'm not sure 
whether that is the best fix. Reasons:
1. The "executor" is created in RepairRunnable and runs all RepairJob's for a 
given keyspace. It's not a single RepairSession instance's responsibility to 
stop the "executor", nor does it have a reference to it.
2. The bigger problem is: why do we handle "node down" in RepairSession at all? 
IMHO it should be handled at a higher level. That means, once an endpoint is 
down, we should stop all RepairRunnable's. Sure, there could be further 
improvements, e.g., only stopping the affected RepairSession's (token ranges). 
But we are not doing this today and it deserves a separate change.

What do you think? I know there are bigger changes coming in 4.0, but I don't 
want a band-aid fix that just makes things messy.

> Thread leak during repair
> -
>
> Key: CASSANDRA-13555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13555
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>
> The symptom is similar to what happened in [CASSANDRA-13204 | 
> https://issues.apache.org/jira/browse/CASSANDRA-13204] that the thread 
> waiting forever doing nothing. This one happened during "nodetool repair -pr 
> -seq -j 1" in production but I can easily simulate the problem with just 
> "nodetool repair" in dev environment (CCM). I'm trying to explain what 
> happened with 3.0.13 code base.
> 1. One node is down while doing repair. This is the error I saw in production:
> {code}
> ERROR [GossipTasks:1] 2017-05-19 15:00:10,545 RepairSession.java:334 - 
> [repair #bc9a3cd1-3ca3-11e7-a44a-e30923ac9336] session completed with the 
> following error
> java.io.IOException: Endpoint /10.185.43.15 died
> at 
> org.apache.cassandra.repair.RepairSession.convict(RepairSession.java:333) 
> ~[apache-cassandra-3.0.11.jar:3.0.11]
> at 
> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306) 
> [apache-cassandra-3.0.11.jar:3.0.11]
> at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:766) 
> [apache-cassandra-3.0.11.jar:3.0.11]
> at org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:66) 
> [apache-cassandra-3.0.11.jar:3.0.11]
> at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:181) 
> [apache-cassandra-3.0.11.jar:3.0.11]
> at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>  [apache-cassandra-3.0.11.jar:3.0.11]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_121]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_121]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_121]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  [apache-cassandra-3.0.11.jar:3.0.11]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {code}
> 2. At this moment the repair coordinator hasn't received the response 
> (MerkleTrees) for the node that was marked down. This means, RepairJob#run 
> will never return because it waits for validations to finish:
> {code}
> // Wait for validation to complete
> Futures.getUnchecked(validations);
> {code}
> Be noted that all RepairJob's (as Runnable) run on a shared executor created 
> in RepairRunnable#runMayThrow, while all snapshot, validation and sync'ing 
> happen on a per-RepairSession "taskExecutor". The RepairJob#run will only 
> return when it receives MerkleTrees (or null) from all endpoints for a given 
> column family and token range.
> As evidence of the thread leak, below is from the thread dump. I can also get 
> the same stack trace when simulating the same issue in dev environment.
> {code}
> "Repair#129:56" #406373 daemon prio=5 os_prio=0 tid=0x7fc495028400 
> nid=0x1a77d waiting on condition [0x7fc02153]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0002d7c00198> (a 
> com.google.common.util.concurrent.Abstract

[jira] [Created] (CASSANDRA-13555) Thread leak during repair

2017-05-25 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13555:
--

 Summary: Thread leak during repair
 Key: CASSANDRA-13555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13555
 Project: Cassandra
  Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou


The symptom is similar to what happened in [CASSANDRA-13204 | 
https://issues.apache.org/jira/browse/CASSANDRA-13204]: a thread waits forever 
doing nothing. This one happened during "nodetool repair -pr -seq -j 1" in 
production, but I can easily simulate the problem with just "nodetool repair" in 
a dev environment (CCM). Below I try to explain what happened based on the 
3.0.13 code base.

1. One node is down while doing repair. This is the error I saw in production:

{code}
ERROR [GossipTasks:1] 2017-05-19 15:00:10,545 RepairSession.java:334 - [repair 
#bc9a3cd1-3ca3-11e7-a44a-e30923ac9336] session completed with the following 
error
java.io.IOException: Endpoint /10.185.43.15 died
at 
org.apache.cassandra.repair.RepairSession.convict(RepairSession.java:333) 
~[apache-cassandra-3.0.11.jar:3.0.11]
at 
org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306) 
[apache-cassandra-3.0.11.jar:3.0.11]
at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:766) 
[apache-cassandra-3.0.11.jar:3.0.11]
at org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:66) 
[apache-cassandra-3.0.11.jar:3.0.11]
at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:181) 
[apache-cassandra-3.0.11.jar:3.0.11]
at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
 [apache-cassandra-3.0.11.jar:3.0.11]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_121]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_121]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_121]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_121]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_121]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_121]
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
 [apache-cassandra-3.0.11.jar:3.0.11]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
{code}

2. At this moment the repair coordinator hasn't received the response 
(MerkleTrees) from the node that was marked down. This means RepairJob#run will 
never return, because it waits for the validations to finish:

{code}
// Wait for validation to complete
Futures.getUnchecked(validations);
{code}

Note that all RepairJob's (as Runnables) run on a shared executor created in 
RepairRunnable#runMayThrow, while all snapshotting, validation and syncing happen 
on a per-RepairSession "taskExecutor". RepairJob#run will only return once it has 
received MerkleTrees (or null) from all endpoints for a given column family 
and token range.
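
To illustrate why the thread never recovers, here is a standalone sketch (plain 
Guava, not Cassandra code): a thread calling Futures.getUnchecked on a future 
that nobody ever completes parks forever, which is exactly the state the 
"Repair#129:56" thread below is stuck in.
{code}
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.SettableFuture;

public class StuckValidationSketch
{
    public static void main(String[] args) throws Exception
    {
        // Stands in for the validation response that never arrives once the peer is marked down.
        SettableFuture<Object> validations = SettableFuture.create();

        Thread repairJob = new Thread(() -> Futures.getUnchecked(validations), "Repair#129:56");
        repairJob.start();

        Thread.sleep(1000);
        // Prints WAITING: the thread is parked inside AbstractFuture and will never make progress.
        System.out.println(repairJob.getName() + " state: " + repairJob.getState());
        // The JVM will not exit on its own because this non-daemon thread is parked
        // forever -- the same leak that keeps the repair executor threads alive.
    }
}
{code}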

As evidence of the thread leak, below is an excerpt from the thread dump. I can 
also get the same stack trace when simulating the issue in a dev environment.

{code}
"Repair#129:56" #406373 daemon prio=5 os_prio=0 tid=0x7fc495028400 
nid=0x1a77d waiting on condition [0x7fc02153]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0002d7c00198> (a 
com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(Named

[jira] [Comment Edited] (CASSANDRA-13049) Too many open files during bootstrapping

2017-05-12 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008810#comment-16008810
 ] 

Simon Zhou edited comment on CASSANDRA-13049 at 5/12/17 10:19 PM:
--

I wrote some [micro-benchmark code | 
https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java].
 To my surprise, memory mapping is very efficient even on small files. Here are 
the results on an idle server. For each file size (1k, 10k, 100k, 1m, 10m), 
there are 2000 files. Notes:
1. I should have disabled the page cache, but these numbers are from the first 
run of the test on that server.
2. The buffer size is 64KB. I tried 4KB and it shows similar results, but 4KB 
is not an optimal buffer size.

Having said that, we can stick with mmap for efficient IO while looking into 
configuration tuning to reduce the number of sstables being streamed.
Benchmark             (bufferSize)             (filePath)  (useDirectBuffer)  Mode  Cnt   Score   Error  Units
MmapPerf.readChannel         65536    /home/szhou/1kfiles              false  avgt    4   0.044 ± 0.051   s/op
MmapPerf.readChannel         65536    /home/szhou/1kfiles               true  avgt    4   0.064 ± 0.015   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles              false  avgt    4   0.050 ± 0.060   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles               true  avgt    4   0.072 ± 0.019   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles              false  avgt    4   0.143 ± 0.060   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles               true  avgt    4   0.166 ± 0.021   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles              false  avgt    4   1.051 ± 0.801   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles               true  avgt    4   1.287 ± 0.220   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles              false  avgt    4   9.696 ± 2.207   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles               true  avgt    4  13.754 ± 1.379   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles              false  avgt    4   0.017 ± 0.007   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles               true  avgt    4   0.017 ± 0.005   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles              false  avgt    4   0.016 ± 0.004   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles               true  avgt    4   0.017 ± 0.006   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles              false  avgt    4   0.023 ± 0.004   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles               true  avgt    4   0.026 ± 0.006   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles              false  avgt    4   0.129 ± 0.017   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles               true  avgt    4   0.132 ± 0.068   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles              false  avgt    4   1.313 ± 0.262   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles               true  avgt    4   1.274 ± 0.482   s/op
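
For reference, the two access patterns being compared boil down to roughly the 
following (a simplified sketch, not the exact benchmark code from the link 
above; the real benchmark likely does bulk reads from the mapped buffer):
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapVsChannelSketch
{
    // readChannel: explicit buffer reads through the channel (64KB buffer as in note 2)
    static long readChannel(Path file, int bufferSize) throws IOException
    {
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ))
        {
            ByteBuffer buf = ByteBuffer.allocate(bufferSize);
            while (ch.read(buf) != -1)
            {
                buf.flip();
                total += buf.remaining();
                buf.clear();
            }
        }
        return total;
    }

    // readMapping: map the whole file once and walk the mapped buffer
    static long readMapping(Path file) throws IOException
    {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ))
        {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long total = 0;
            while (map.hasRemaining())
            {
                map.get();
                total++;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path file = Paths.get(args[0]);
        System.out.println("channel bytes: " + readChannel(file, 64 * 1024));
        System.out.println("mapped bytes:  " + readMapping(file));
    }
}
{code}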


was (Author: szhou):
I wrote some [micro benchmark code | 
https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java].
 To my surprise memory mapping is very efficient even on small files. Here is 
the result on an idle server. For each file size (1k, 10k, 100k, 1m, 10m), 
there are 2000 files. I should have disabled page cache but this result is for 
the first time I ran the test on that server. Having said that, we can stick 
with mmap for efficient IO while seeking for configuration tuning to reduce the 
number of sstables being streamed.

Benchmark (bufferSize) (filePath)  (useDirectBuffer)  
Mode  Cnt   Score   Error  Units
MmapPerf.readChannel 65536/home/szhou/1kfiles  false  
avgt4   0.044 ± 0.051   s/op
MmapPerf.readChannel 65536/home/szhou/1kfiles   true  
avgt4   0.064 ± 0.015   s/op
MmapPerf.readChannel 65536   /home/szhou/10kfiles  false  
avgt4   0.050 ± 0.060   s/op
MmapPerf.readChannel 65536   /home/szhou/10kfiles   true  
avgt4   0.072 ± 0.019   s/op
MmapPerf.readChannel 65536  /home/szhou/100kfiles  false  
avgt4   0.143 ± 0.060   s/op
MmapPerf.readChannel 65536  /home/szhou/100kfiles   true  
avgt4   0.166 ± 0.021   s/op
MmapPerf.readChannel 65536/home/szhou/1mfiles  false  
avgt4   1.051 ± 0.801   s/op
MmapPerf.readChannel 65536/home/szhou/1mfiles   true  
avgt4   1.287 ± 0.220   s/op
MmapPerf.readChannel 65536   /home/szhou/10mf

[jira] [Comment Edited] (CASSANDRA-13049) Too many open files during bootstrapping

2017-05-12 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008810#comment-16008810
 ] 

Simon Zhou edited comment on CASSANDRA-13049 at 5/12/17 10:17 PM:
--

I wrote some [micro benchmark code | 
https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java].
 To my surprise memory mapping is very efficient even on small files. Here is 
the result on an idle server. For each file size (1k, 10k, 100k, 1m, 10m), 
there are 2000 files. I should have disabled page cache but this result is for 
the first time I ran the test on that server. Having said that, we can stick 
with mmap for efficient IO while seeking for configuration tuning to reduce the 
number of sstables being streamed.

Benchmark             (bufferSize)             (filePath)  (useDirectBuffer)  Mode  Cnt   Score   Error  Units
MmapPerf.readChannel         65536    /home/szhou/1kfiles              false  avgt    4   0.044 ± 0.051   s/op
MmapPerf.readChannel         65536    /home/szhou/1kfiles               true  avgt    4   0.064 ± 0.015   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles              false  avgt    4   0.050 ± 0.060   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles               true  avgt    4   0.072 ± 0.019   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles              false  avgt    4   0.143 ± 0.060   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles               true  avgt    4   0.166 ± 0.021   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles              false  avgt    4   1.051 ± 0.801   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles               true  avgt    4   1.287 ± 0.220   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles              false  avgt    4   9.696 ± 2.207   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles               true  avgt    4  13.754 ± 1.379   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles              false  avgt    4   0.017 ± 0.007   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles               true  avgt    4   0.017 ± 0.005   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles              false  avgt    4   0.016 ± 0.004   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles               true  avgt    4   0.017 ± 0.006   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles              false  avgt    4   0.023 ± 0.004   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles               true  avgt    4   0.026 ± 0.006   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles              false  avgt    4   0.129 ± 0.017   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles               true  avgt    4   0.132 ± 0.068   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles              false  avgt    4   1.313 ± 0.262   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles               true  avgt    4   1.274 ± 0.482   s/op


was (Author: szhou):
I wrote some micro [benchmark code | 
https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java].
 To my surprise memory mapping is very efficient even on small files. Here is 
the result on an idle server. For each file size (1k, 10k, 100k, 1m, 10m), 
there are 2000 files. I should have disabled page cache but this result is for 
the first time I ran the test on that server. Having said that, we can stick 
with mmap for efficient IO while seeking for configuration tuning to reduce the 
number of sstables being streamed.

Benchmark (bufferSize) (filePath)  (useDirectBuffer)  
Mode  Cnt   Score   Error  Units
MmapPerf.readChannel 65536/home/szhou/1kfiles  false  
avgt4   0.044 ± 0.051   s/op
MmapPerf.readChannel 65536/home/szhou/1kfiles   true  
avgt4   0.064 ± 0.015   s/op
MmapPerf.readChannel 65536   /home/szhou/10kfiles  false  
avgt4   0.050 ± 0.060   s/op
MmapPerf.readChannel 65536   /home/szhou/10kfiles   true  
avgt4   0.072 ± 0.019   s/op
MmapPerf.readChannel 65536  /home/szhou/100kfiles  false  
avgt4   0.143 ± 0.060   s/op
MmapPerf.readChannel 65536  /home/szhou/100kfiles   true  
avgt4   0.166 ± 0.021   s/op
MmapPerf.readChannel 65536/home/szhou/1mfiles  false  
avgt4   1.051 ± 0.801   s/op
MmapPerf.readChannel 65536/home/szhou/1mfiles   true  
avgt4   1.287 ± 0.220   s/op
MmapPerf.readChannel 65536   /home/szhou/10mfiles  false  
avgt4   9.696 ± 2.207   s/op
MmapPerf.readChannel 65536   /home/szhou/10mfiles   

[jira] [Commented] (CASSANDRA-13049) Too many open files during bootstrapping

2017-05-12 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008810#comment-16008810
 ] 

Simon Zhou commented on CASSANDRA-13049:


I wrote some micro [benchmark code | 
https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java].
 To my surprise memory mapping is very efficient even on small files. Here is 
the result on an idle server. For each file size (1k, 10k, 100k, 1m, 10m), 
there are 2000 files. I should have disabled page cache but this result is for 
the first time I ran the test on that server. Having said that, we can stick 
with mmap for efficient IO while seeking for configuration tuning to reduce the 
number of sstables being streamed.

Benchmark             (bufferSize)             (filePath)  (useDirectBuffer)  Mode  Cnt   Score   Error  Units
MmapPerf.readChannel         65536    /home/szhou/1kfiles              false  avgt    4   0.044 ± 0.051   s/op
MmapPerf.readChannel         65536    /home/szhou/1kfiles               true  avgt    4   0.064 ± 0.015   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles              false  avgt    4   0.050 ± 0.060   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles               true  avgt    4   0.072 ± 0.019   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles              false  avgt    4   0.143 ± 0.060   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles               true  avgt    4   0.166 ± 0.021   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles              false  avgt    4   1.051 ± 0.801   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles               true  avgt    4   1.287 ± 0.220   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles              false  avgt    4   9.696 ± 2.207   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles               true  avgt    4  13.754 ± 1.379   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles              false  avgt    4   0.017 ± 0.007   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles               true  avgt    4   0.017 ± 0.005   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles              false  avgt    4   0.016 ± 0.004   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles               true  avgt    4   0.017 ± 0.006   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles              false  avgt    4   0.023 ± 0.004   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles               true  avgt    4   0.026 ± 0.006   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles              false  avgt    4   0.129 ± 0.017   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles               true  avgt    4   0.132 ± 0.068   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles              false  avgt    4   1.313 ± 0.262   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles               true  avgt    4   1.274 ± 0.482   s/op

> Too many open files during bootstrapping
> 
>
> Key: CASSANDRA-13049
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13049
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>
> We just upgraded from 2.2.5 to 3.0.10 and got issue during bootstrapping. So 
> likely this is something made worse along with improving IO performance in 
> Cassandra 3.
> On our side, the issue is that we have lots of small sstables and thus when 
> bootstrapping a new node, it receives lots of files during streaming and 
> Cassandra keeps all of them open for an unpredictable amount of time. 
> Eventually we hit "Too many open files" error and around that time, I can see 
> ~1M open files through lsof and almost all of them are *-Data.db and 
> *-Index.db. Definitely we should use a better compaction strategy to reduce 
> the number of sstables but I see a few possible improvements in Cassandra:
> 1. We use memory map when reading data from sstables. Every time we create a 
> new memory map, there is one more file descriptor open. Memory map improves 
> IO performance when dealing with large files, do we want to set a file size 
> threshold when doing this?
> 2. Whenever we finished receiving a file from peer, we create a 
> SSTableReader/BigTableReader, which includes opening the data file and index 
> file, and keep them open until some time later (unpredictable). See 
> StreamReceiveTask#L110, BigTableWriter#openFinal and 
> SSTableReader#InstanceTidier. Is it better to lazily open the data/index 
> files or close them more often to reclaim the file descriptors?
> I searched all known issue in JIRA and looks like this is a new issue in 
> Cassa

[jira] [Created] (CASSANDRA-13491) Emit metrics for JVM safepoint pause

2017-05-03 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13491:
--

 Summary: Emit metrics for JVM safepoint pause
 Key: CASSANDRA-13491
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13491
 Project: Cassandra
  Issue Type: New Feature
Reporter: Simon Zhou


GC pauses are not the only source of latency from the JVM. In one of our recent 
production issues, the GC metrics looked good (some pauses >200ms, the longest 
500ms), but the GC logs show periodic pauses like this:
{code}
2017-04-26T01:51:29.420+: 352535.998: Total time for which application 
threads were stopped: 19.8835870 seconds, Stopping threads took: 19.7842073 
seconds
{code}

This huge delay is most likely a JVM malfunction, but it caused some requests to 
time out. So I'm suggesting we add support for safepoint pause metrics for better 
observability. Two problems though:
1. This depends on the JVM. Some JVMs may not expose these internal MBeans. This 
is actually the same situation as for the existing GCInspector.
2. Hotspot exposes HotspotRuntime as an internal MBean from which we can get 
safepoint pauses. However, there is no notification support for it. I got the 
error "MBean sun.management:type=HotspotRuntime does not implement 
javax.management.NotificationBroadcaster" when trying to register a listener. 
This means we will need to poll the safepoint pauses from HotspotRuntime 
periodically (a rough sketch of such polling follows below).

Reference:
http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html

Anyone think we should support this?
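
Since there is no notification support, periodic polling seems to be the only 
option. A rough sketch of what that could look like (this relies on JDK-internal 
Hotspot classes; the exact class and method names are my assumption and may 
differ across JVM vendors and JDK versions):
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import sun.management.HotspotRuntimeMBean;
import sun.management.ManagementFactoryHelper;

public class SafepointPollerSketch
{
    public static void main(String[] args)
    {
        // JDK-internal API: Hotspot only, and may change between JDK versions.
        HotspotRuntimeMBean runtime = ManagementFactoryHelper.getHotspotRuntimeMBean();

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long[] lastTime = { runtime.getTotalSafepointTime() };   // accumulated stopped time, ms (assumption)
        long[] lastCount = { runtime.getSafepointCount() };

        scheduler.scheduleAtFixedRate(() -> {
            long time = runtime.getTotalSafepointTime();
            long count = runtime.getSafepointCount();
            long deltaTime = time - lastTime[0];
            long deltaCount = count - lastCount[0];
            lastTime[0] = time;
            lastCount[0] = count;
            // In Cassandra this delta would feed a metric instead of stdout.
            System.out.printf("safepoints=%d, pause=%dms in the last second%n", deltaCount, deltaTime);
        }, 1, 1, TimeUnit.SECONDS);
    }
}
{code}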



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13387) Metrics for repair

2017-04-26 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985756#comment-15985756
 ] 

Simon Zhou commented on CASSANDRA-13387:


Thanks for the comments. As [~bdeggleston] suggested, I'll probably create a 
ticket after investigation. Here are the updated patches:
|3.0.x |[patch | 
https://github.com/szhou1234/cassandra/commit/56c9950b29d233e71bb6a5e2e1d1f3f714c3d723]|
|3.11 |[patch | 
https://github.com/szhou1234/cassandra/commit/a136176e9798ceaa7efb6b062acf60f51786f4d1]|
|4.0 |[patch | 
https://github.com/szhou1234/cassandra/commit/54803838709308f364d4abd50ba995cd9caa61f4]|

Note that the 4.0 patch is slightly different from the 3.0 one due to merge conflicts.

> Metrics for repair
> --
>
> Key: CASSANDRA-13387
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
>
> We're missing metrics for repair, especially for errors. From what I observed 
> now, the exception will be caught by UncaughtExceptionHandler set in 
> CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one 
> example:
> {code}
> ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - 
> Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: Parent repair session with id = 
> 8c85d260-1319-11e7-82a2-25090a89015f has failed.
> at 
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair

2017-04-25 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983837#comment-15983837
 ] 

Simon Zhou commented on CASSANDRA-13397:


[~pauloricardomg], in case you haven't done so, are you going to merge the fix 
to trunk?

> Return value of CountDownLatch.await() not being checked in Repair
> --
>
> Key: CASSANDRA-13397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
> Attachments: CASSANDRA-13397-v1.patch
>
>
> While looking into repair code, I realize that we should check return value 
> of CountDownLatch.await(). Most of the places that we don't check the return 
> value, nothing bad would happen due to other protection. However, 
> ActiveRepairService#prepareForRepair should have the check. Code to reproduce:
> {code}
> public static void testLatch() throws InterruptedException {
> CountDownLatch latch = new CountDownLatch(2);
> latch.countDown();
> new Thread(() -> {
> try {
> Thread.sleep(1200);
> } catch (InterruptedException e) {
> System.err.println("interrupted");
> }
> latch.countDown();
> System.out.println("counted down");
> }).start();
> latch.await(1, TimeUnit.SECONDS);
> if (latch.getCount() > 0) {
> System.err.println("failed");
> } else {
> System.out.println("success");
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13387) Metrics for repair

2017-04-25 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13387:
---
Status: Patch Available  (was: Open)

> Metrics for repair
> --
>
> Key: CASSANDRA-13387
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
>
> We're missing metrics for repair, especially for errors. From what I observed 
> now, the exception will be caught by UncaughtExceptionHandler set in 
> CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one 
> example:
> {code}
> ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - 
> Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: Parent repair session with id = 
> 8c85d260-1319-11e7-82a2-25090a89015f has failed.
> at 
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13387) Metrics for repair

2017-04-25 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983831#comment-15983831
 ] 

Simon Zhou commented on CASSANDRA-13387:


Patch for 3.0.x is [here | 
https://github.com/szhou1234/cassandra/commit/7d7f55d71623ac9cc4912833b5f4b2562d6263fc].
Exception metrics are emitted at the keyspace level (RepairRunnable). We could 
emit them at a finer granularity, but that would mean more exceptions, especially 
for primary range repair. For monitoring purposes, I think keyspace-level metrics 
are enough, but let me know if you have a different opinion. Once the initial 
review passes, I'll work on a patch for trunk.
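
For context, a keyspace-level error counter in the Dropwizard metrics style could 
look roughly like this (the class and metric names here are illustrative 
assumptions, not the ones used in the patch):
{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class RepairErrorMetricsSketch
{
    private final MetricRegistry registry = new MetricRegistry();

    // One counter per keyspace, e.g. "Repair.my_keyspace.Errors".
    public Counter errorCounterFor(String keyspace)
    {
        return registry.counter(MetricRegistry.name("Repair", keyspace, "Errors"));
    }

    public void onRepairFailure(String keyspace, Throwable t)
    {
        // Count the failure at keyspace granularity; the exception itself can be logged separately.
        errorCounterFor(keyspace).inc();
    }
}
{code}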

> Metrics for repair
> --
>
> Key: CASSANDRA-13387
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
>
> We're missing metrics for repair, especially for errors. From what I observed 
> now, the exception will be caught by UncaughtExceptionHandler set in 
> CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one 
> example:
> {code}
> ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - 
> Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: Parent repair session with id = 
> 8c85d260-1319-11e7-82a2-25090a89015f has failed.
> at 
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13387) Metrics for repair

2017-04-25 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983831#comment-15983831
 ] 

Simon Zhou edited comment on CASSANDRA-13387 at 4/25/17 11:33 PM:
--

Patch for 3.0.x is [here | 
https://github.com/szhou1234/cassandra/commit/7d7f55d71623ac9cc4912833b5f4b2562d6263fc].
Exception metrics are emitted at the keyspace level (RepairRunnable). We could 
emit them at a finer granularity, but that would mean more exceptions, especially 
for primary range repair. For monitoring purposes, I think keyspace-level metrics 
are enough, but let me know if you have a different opinion. Once the initial 
review passes, I'll work on a patch for trunk.


was (Author: szhou):
Patch for 3.0.x is [ here | 
https://github.com/szhou1234/cassandra/commit/7d7f55d71623ac9cc4912833b5f4b2562d6263fc].
 Exception metrics are emitted on keyspace level (RepairRunnable). We could 
emit them on a finer granularity but that means more exceptions, especially for 
primary range repair. For monitoring purpose, I think keyspace level metrics 
are enough but let me know if you have different opinion. Once initial review 
passes, I'll work on a patch for trunk.

> Metrics for repair
> --
>
> Key: CASSANDRA-13387
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
>
> We're missing metrics for repair, especially for errors. From what I observed 
> now, the exception will be caught by UncaughtExceptionHandler set in 
> CassandraDaemon and is categorized as StorageMetrics.exceptions. This is one 
> example:
> {code}
> ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - 
> Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: Parent repair session with id = 
> 8c85d260-1319-11e7-82a2-25090a89015f has failed.
> at 
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-25 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983423#comment-15983423
 ] 

Simon Zhou commented on CASSANDRA-13467:


Thanks [~slebresne]. If there is a workaround that gives both behaviors below at 
the same time, I wouldn't need this backport:
1. Cassandra shouldn't log a warning or throw an exception for batched 
statements on the same partition.
2. Cassandra should still log a warning or throw an exception for batched 
statements on different partitions.

If you just bump the thresholds in cassandra.yaml, you will be unable to detect 
the problem in #2 above. Am I misunderstanding something?

> [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single 
> partition batches
> -
>
> Key: CASSANDRA-13467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
>
> Would anyone think this backport may cause problem? We're running Cassandra 
> 3.0 and hit this problem. There are some other people would like this 
> backport (see the last few comments from CASSANDRA-10876).
> I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979516#comment-15979516
 ] 

Simon Zhou commented on CASSANDRA-13467:


[~slebresne] do you want to take a look at this backport? The code is from your 
original commit in 3.6.

> [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single 
> partition batches
> -
>
> Key: CASSANDRA-13467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
>
> Would anyone think this backport may cause problem? We're running Cassandra 
> 3.0 and hit this problem. There are some other people would like this 
> backport (see the last few comments from CASSANDRA-10876).
> I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356
 ] 

Simon Zhou edited comment on CASSANDRA-13467 at 4/21/17 10:06 PM:
--

This is the same patch from [3.6 | 
https://github.com/pcmanus/cassandra/commit/284eb4f49fade13f8dfcec9ff0f33aa19963c788]
 with slight change:

| 3.0 | 
[patch|https://github.com/szhou1234/cassandra/commit/2c61388e3032a18e32adbfdc30ab92908aef]



was (Author: szhou):
This is the same patch from [3.6 | 
https://github.com/pcmanus/cassandra/commit/284eb4f49fade13f8dfcec9ff0f33aa19963c788]
 with slight change:

| 3.0 | 
[patch|https://github.com/szhou1234/cassandra/commit/b5783abc294564cb1d248f5ceee62a2924113060]


> [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single 
> partition batches
> -
>
> Key: CASSANDRA-13467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
>
> Would anyone think this backport may cause problem? We're running Cassandra 
> 3.0 and hit this problem. There are some other people would like this 
> backport (see the last few comments from CASSANDRA-10876).
> I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-21 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13467:
---
Status: Patch Available  (was: Open)

> [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single 
> partition batches
> -
>
> Key: CASSANDRA-13467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
>
> Would anyone think this backport may cause problem? We're running Cassandra 
> 3.0 and hit this problem. There are some other people would like this 
> backport (see the last few comments from CASSANDRA-10876).
> I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356
 ] 

Simon Zhou edited comment on CASSANDRA-13467 at 4/21/17 10:03 PM:
--

This is the same patch from [3.6 | 
https://github.com/pcmanus/cassandra/commit/284eb4f49fade13f8dfcec9ff0f33aa19963c788]
 with slight change:

| 3.0 | 
[patch|https://github.com/szhou1234/cassandra/commit/b5783abc294564cb1d248f5ceee62a2924113060]



was (Author: szhou):
This is the same patch from CASSANDRA-10876:

| 3.0 | 
[patch|https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269]


> [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single 
> partition batches
> -
>
> Key: CASSANDRA-13467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
>
> Would anyone think this backport may cause problem? We're running Cassandra 
> 3.0 and hit this problem. There are some other people would like this 
> backport (see the last few comments from CASSANDRA-10876).
> I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356
 ] 

Simon Zhou edited comment on CASSANDRA-13467 at 4/21/17 8:57 PM:
-

This is the same patch from CASSANDRA-10876:

| 3.0 | 
[patch|https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269]



was (Author: szhou):
This is the same patch from CASSANDRA-10876:

| 3.0 | patch 
[https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269
 ] |

> [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single 
> partition batches
> -
>
> Key: CASSANDRA-13467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
>
> Would anyone think this backport may cause problem? We're running Cassandra 
> 3.0 and hit this problem. There are some other people would like this 
> backport (see the last few comments from CASSANDRA-10876).
> I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356
 ] 

Simon Zhou edited comment on CASSANDRA-13467 at 4/21/17 8:55 PM:
-

This is the same patch from CASSANDRA-10876:

| 3.0 | patch 
[https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269
 ] |


was (Author: szhou):
This is the same patch from CASSANDRA-10876:

| 3.0 | [ patch | 
https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269
 ] |

> [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single 
> partition batches
> -
>
> Key: CASSANDRA-13467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
>
> Would anyone think this backport may cause problem? We're running Cassandra 
> 3.0 and hit this problem. There are some other people would like this 
> backport (see the last few comments from CASSANDRA-10876).
> I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-21 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979356#comment-15979356
 ] 

Simon Zhou commented on CASSANDRA-13467:


This is the same patch from CASSANDRA-10876:

| 3.0 | [ patch | 
https://github.com/szhou1234/cassandra/commit/1718d6ef950cf0d0bc98aea68297937362a5f269
 ] |

> [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single 
> partition batches
> -
>
> Key: CASSANDRA-13467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
>
> Would anyone think this backport may cause problem? We're running Cassandra 
> 3.0 and hit this problem. There are some other people would like this 
> backport (see the last few comments from CASSANDRA-10876).
> I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13467) [Backport CASSANDRA-10876]: Alter behavior of batch WARN and fail on single partition batches

2017-04-21 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13467:
--

 Summary: [Backport CASSANDRA-10876]: Alter behavior of batch WARN 
and fail on single partition batches
 Key: CASSANDRA-13467
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13467
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Simon Zhou
Assignee: Simon Zhou
Priority: Minor
 Fix For: 3.0.x


Does anyone think this backport may cause problems? We're running Cassandra 3.0 
and hit this problem. There are some other people who would like this backport 
(see the last few comments on CASSANDRA-10876).

I'll provide the patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair

2017-04-20 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977929#comment-15977929
 ] 

Simon Zhou commented on CASSANDRA-13397:


Thank you [~pauloricardomg]!

> Return value of CountDownLatch.await() not being checked in Repair
> --
>
> Key: CASSANDRA-13397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.x
>
> Attachments: CASSANDRA-13397-v1.patch
>
>
> While looking into repair code, I realize that we should check return value 
> of CountDownLatch.await(). Most of the places that we don't check the return 
> value, nothing bad would happen due to other protection. However, 
> ActiveRepairService#prepareForRepair should have the check. Code to reproduce:
> {code}
> public static void testLatch() throws InterruptedException {
> CountDownLatch latch = new CountDownLatch(2);
> latch.countDown();
> new Thread(() -> {
> try {
> Thread.sleep(1200);
> } catch (InterruptedException e) {
> System.err.println("interrupted");
> }
> latch.countDown();
> System.out.println("counted down");
> }).start();
> latch.await(1, TimeUnit.SECONDS);
> if (latch.getCount() > 0) {
> System.err.println("failed");
> } else {
> System.out.println("success");
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair

2017-03-31 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13397:
---
 Priority: Minor  (was: Major)
Fix Version/s: 3.0.13

> Return value of CountDownLatch.await() not being checked in Repair
> --
>
> Key: CASSANDRA-13397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Fix For: 3.0.13
>
> Attachments: CASSANDRA-13397-v1.patch
>
>
> While looking into repair code, I realize that we should check return value 
> of CountDownLatch.await(). Most of the places that we don't check the return 
> value, nothing bad would happen due to other protection. However, 
> ActiveRepairService#prepareForRepair should have the check. Code to reproduce:
> {code}
> public static void testLatch() throws InterruptedException {
> CountDownLatch latch = new CountDownLatch(2);
> latch.countDown();
> new Thread(() -> {
> try {
> Thread.sleep(1200);
> } catch (InterruptedException e) {
> System.err.println("interrupted");
> }
> latch.countDown();
> System.out.println("counted down");
> }).start();
> latch.await(1, TimeUnit.SECONDS);
> if (latch.getCount() > 0) {
> System.err.println("failed");
> } else {
> System.out.println("success");
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair

2017-03-31 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13397:
---
Attachment: CASSANDRA-13397-v1.patch

The attached patch includes the fix and a minor improvement (bail out early if 
there is any unavailable neighbor). [~krummas] could you help review this patch?

> Return value of CountDownLatch.await() not being checked in Repair
> --
>
> Key: CASSANDRA-13397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Attachments: CASSANDRA-13397-v1.patch
>
>
> While looking into repair code, I realize that we should check return value 
> of CountDownLatch.await(). Most of the places that we don't check the return 
> value, nothing bad would happen due to other protection. However, 
> ActiveRepairService#prepareForRepair should have the check. Code to reproduce:
> {code}
> public static void testLatch() throws InterruptedException {
> CountDownLatch latch = new CountDownLatch(2);
> latch.countDown();
> new Thread(() -> {
> try {
> Thread.sleep(1200);
> } catch (InterruptedException e) {
> System.err.println("interrupted");
> }
> latch.countDown();
> System.out.println("counted down");
> }).start();
> latch.await(1, TimeUnit.SECONDS);
> if (latch.getCount() > 0) {
> System.err.println("failed");
> } else {
> System.out.println("success");
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair

2017-03-31 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13397:
---
Status: Patch Available  (was: Open)

> Return value of CountDownLatch.await() not being checked in Repair
> --
>
> Key: CASSANDRA-13397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Attachments: CASSANDRA-13397-v1.patch
>
>
> While looking into the repair code, I realized that we should check the return 
> value of CountDownLatch.await(). In most of the places where we don't check the 
> return value, nothing bad happens thanks to other protections. However, 
> ActiveRepairService#prepareForRepair should have the check. Code to reproduce:
> {code}
> public static void testLatch() throws InterruptedException {
> CountDownLatch latch = new CountDownLatch(2);
> latch.countDown();
> new Thread(() -> {
> try {
> Thread.sleep(1200);
> } catch (InterruptedException e) {
> System.err.println("interrupted");
> }
> latch.countDown();
> System.out.println("counted down");
> }).start();
> latch.await(1, TimeUnit.SECONDS);
> if (latch.getCount() > 0) {
> System.err.println("failed");
> } else {
> System.out.println("success");
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked

2017-03-31 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13397:
---
Description: 
While looking into the repair code, I realized that we should check the return value of 
CountDownLatch.await(). In most of the places where we don't check the return 
value, nothing bad happens thanks to other protections. However, 
ActiveRepairService#prepareForRepair should have the check. Code to reproduce:
{code}
public static void testLatch() throws InterruptedException {
CountDownLatch latch = new CountDownLatch(2);
latch.countDown();

new Thread(() -> {
try {
Thread.sleep(1200);
} catch (InterruptedException e) {
System.err.println("interrupted");
}
latch.countDown();
System.out.println("counted down");
}).start();


latch.await(1, TimeUnit.SECONDS);
if (latch.getCount() > 0) {
System.err.println("failed");
} else {
System.out.println("success");
}
}
{code}

  was:
While looking into the repair code, I realized that we should check the return value of 
CountDownLatch.await(). However, there are some places where we don't check it, and 
some of them may cause bad behavior, for example in 
ActiveRepairService#prepareForRepair and StorageProxy#describeSchemaVersions. I 
haven't tracked down the version that introduced this bug, but 
StorageProxy#describeSchemaVersions has had it since at least 2010. Code to 
reproduce:
{code}
public static void testLatch() throws InterruptedException {
CountDownLatch latch = new CountDownLatch(2);
latch.countDown();

new Thread(() -> {
try {
Thread.sleep(1200);
} catch (InterruptedException e) {
System.err.println("interrupted");
}
latch.countDown();
System.out.println("counted down");
}).start();


latch.await(1, TimeUnit.SECONDS);
if (latch.getCount() > 0) {
System.err.println("failed");
} else {
System.out.println("success");
}
}
{code}


> Return value of CountDownLatch.await() not being checked
> 
>
> Key: CASSANDRA-13397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>
> While looking into the repair code, I realized that we should check the return 
> value of CountDownLatch.await(). In most of the places where we don't check the 
> return value, nothing bad happens thanks to other protections. However, 
> ActiveRepairService#prepareForRepair should have the check. Code to reproduce:
> {code}
> public static void testLatch() throws InterruptedException {
> CountDownLatch latch = new CountDownLatch(2);
> latch.countDown();
> new Thread(() -> {
> try {
> Thread.sleep(1200);
> } catch (InterruptedException e) {
> System.err.println("interrupted");
> }
> latch.countDown();
> System.out.println("counted down");
> }).start();
> latch.await(1, TimeUnit.SECONDS);
> if (latch.getCount() > 0) {
> System.err.println("failed");
> } else {
> System.out.println("success");
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair

2017-03-31 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13397:
---
Summary: Return value of CountDownLatch.await() not being checked in Repair 
 (was: Return value of CountDownLatch.await() not being checked)

> Return value of CountDownLatch.await() not being checked in Repair
> --
>
> Key: CASSANDRA-13397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>
> While looking into the repair code, I realized that we should check the return 
> value of CountDownLatch.await(). In most of the places where we don't check the 
> return value, nothing bad happens thanks to other protections. However, 
> ActiveRepairService#prepareForRepair should have the check. Code to reproduce:
> {code}
> public static void testLatch() throws InterruptedException {
> CountDownLatch latch = new CountDownLatch(2);
> latch.countDown();
> new Thread(() -> {
> try {
> Thread.sleep(1200);
> } catch (InterruptedException e) {
> System.err.println("interrupted");
> }
> latch.countDown();
> System.out.println("counted down");
> }).start();
> latch.await(1, TimeUnit.SECONDS);
> if (latch.getCount() > 0) {
> System.err.println("failed");
> } else {
> System.out.println("success");
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked

2017-03-31 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13397:
--

 Summary: Return value of CountDownLatch.await() not being checked
 Key: CASSANDRA-13397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13397
 Project: Cassandra
  Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou


While looking into the repair code, I realized that we should check the return value of 
CountDownLatch.await(). However, there are some places where we don't check it, and 
some of them may cause bad behavior, for example in 
ActiveRepairService#prepareForRepair and StorageProxy#describeSchemaVersions. I 
haven't tracked down the version that introduced this bug, but 
StorageProxy#describeSchemaVersions has had it since at least 2010. Code to 
reproduce:
{code}
public static void testLatch() throws InterruptedException {
CountDownLatch latch = new CountDownLatch(2);
latch.countDown();

new Thread(() -> {
try {
Thread.sleep(1200);
} catch (InterruptedException e) {
System.err.println("interrupted");
}
latch.countDown();
System.out.println("counted down");
}).start();


latch.await(1, TimeUnit.SECONDS);
if (latch.getCount() > 0) {
System.err.println("failed");
} else {
System.out.println("success");
}
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13387) Metrics for repair

2017-03-28 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13387:
--

 Summary: Metrics for repair
 Key: CASSANDRA-13387
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
 Project: Cassandra
  Issue Type: Improvement
Reporter: Simon Zhou
Assignee: Simon Zhou
Priority: Minor


We're missing metrics for repair, especially for errors. From what I have 
observed so far, the exception gets caught by the UncaughtExceptionHandler set in 
CassandraDaemon and is counted under StorageMetrics.exceptions. This is one 
example:

{code}
ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - 
Exception in thread Thread[AntiEntropyStage:1,5,main]
java.lang.RuntimeException: Parent repair session with id = 
8c85d260-1319-11e7-82a2-25090a89015f has failed.
at 
org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-3.0.10.jar:3.0.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_121]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_121]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
{code}
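A minimal sketch of the kind of counters this asks for, using the Dropwizard 
metrics library Cassandra already ships with (the metric names and the bare 
registry below are illustrative; a real patch would go through Cassandra's own 
metrics plumbing):
{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class RepairMetricsSketch
{
    private static final MetricRegistry registry = new MetricRegistry();

    // Hypothetical metric names, just to show the shape of the change.
    public static final Counter repairExceptions = registry.counter("Repair.Exceptions");
    public static final Counter failedSessions = registry.counter("Repair.FailedSessions");

    public static void markFailedSession()
    {
        repairExceptions.inc();
        failedSessions.inc();
    }
}
{code}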



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13358) AlterViewStatement.checkAccess can throw exceptions

2017-03-26 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942480#comment-15942480
 ] 

Simon Zhou commented on CASSANDRA-13358:


I didn't see the "if" check that you intended to add. Also, the Cassandra coding 
style is to put the "catch" clause on a new line.

> AlterViewStatement.checkAccess can throw exceptions
> ---
>
> Key: CASSANDRA-13358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13358
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Hao Zhong
>Assignee: Hao Zhong
> Attachments: cassandra.patch
>
>
> The AlterViewStatement.checkAccess method has code lines as follow:
> {code:title=AlterViewStatement.java|borderStyle=solid}
>   if (baseTable != null)
> state.hasColumnFamilyAccess(keyspace(), baseTable.name, 
> Permission.ALTER);
> {code}
> These lines can throw InvalidRequestException. Indeed, 
> DropTableStatement.checkAccess has a similar problem, and was fixed in 
> CASSANDRA-6687. The fixed code is as follow:
> {code:title=DropTableStatement.java|borderStyle=solid}
>  try
> {
> state.hasColumnFamilyAccess(keyspace(), columnFamily(), 
> Permission.DROP);
> }
> catch (InvalidRequestException e)
> {
> if (!ifExists)
> throw e;
> }
> {code}
> Please fix the problem as CASSANDRA-6687 did.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13343) Wrong class name for LoggerFactory.getLogger

2017-03-16 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13343:
---
Status: Patch Available  (was: Open)

[~pauloricardomg] do you mind taking a quick look? It's just a one-line change.
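The change is essentially the following one-liner (sketch; see the attached patch 
for the actual diff):
{code}
// before: logs under RepairSession by mistake
private static Logger logger = LoggerFactory.getLogger(RepairSession.class);
// after: use the enclosing class
private static Logger logger = LoggerFactory.getLogger(AnticompactionTask.class);
{code}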

> Wrong class name for LoggerFactory.getLogger
> 
>
> Key: CASSANDRA-13343
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13343
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Trivial
> Fix For: 3.0.13
>
> Attachments: CASSANDRA-13343-v1.patch
>
>
> We have the below code in AnticompactionTask.java. The class passed to getLogger is wrong; it should be AnticompactionTask.class.
> {code}
> private static Logger logger = LoggerFactory.getLogger(RepairSession.class);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13343) Wrong class name for LoggerFactory.getLogger

2017-03-16 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13343:
---
Attachment: CASSANDRA-13343-v1.patch

> Wrong class name for LoggerFactory.getLogger
> 
>
> Key: CASSANDRA-13343
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13343
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Trivial
> Fix For: 3.0.13
>
> Attachments: CASSANDRA-13343-v1.patch
>
>
> We have the below code in AnticompactionTask.java. The class passed to getLogger is wrong; it should be AnticompactionTask.class.
> {code}
> private static Logger logger = LoggerFactory.getLogger(RepairSession.class);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13343) Wrong class name for LoggerFactory.getLogger

2017-03-16 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13343:
--

 Summary: Wrong class name for LoggerFactory.getLogger
 Key: CASSANDRA-13343
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13343
 Project: Cassandra
  Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou
Priority: Trivial
 Fix For: 3.0.13


We have the below code in AnticompactionTask.java. The class passed to getLogger is wrong; it should be AnticompactionTask.class.
{code}
private static Logger logger = LoggerFactory.getLogger(RepairSession.class);
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message

2017-03-13 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923619#comment-15923619
 ] 

Simon Zhou commented on CASSANDRA-13323:


Thanks [~slebresne] for the comment. For hinted handoff of a dropped table, the 
UnknownColumnFamilyException is already handled in 
HintMessage#Serializer#deserialize. Even though a HintMessage will still be 
returned, its internal data (Hint) is null and thus will be ignored in 
HintVerbHandler. So UnknownColumnFamilyException just causes some overhead 
(deserialization, etc.) on the receiver side of hinted handoff. At this point I 
tend to think hinted handoff is unrelated to IncomingTcpConnection being closed, 
but I'll double check.

The stack trace I posted in this ticket is actually for a paxos commit. 
Unfortunately CommitSerializer doesn't take the message size into 
consideration, so I cannot just catch UnknownColumnFamilyException and skip 
the remaining bytes from DataInputPlus. To fix that, we would have to update the 
protocol a bit (maybe introduce MessagingService.VERSION_3xx). Do you think it's 
worth the effort? I've lost the original logs so I cannot confirm the scope of 
this issue. One of the cons of a binary protocol is that it's hard to maintain 
backward compatibility.
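Conceptually, the fix is to drop the single bad message inside the receive loop 
instead of letting the exception escape and close the socket. A rough sketch 
(illustrative pseudocode only, not the actual IncomingTcpConnection code):
{code}
while (true)
{
    try
    {
        receiveMessage(in, version);
    }
    catch (UnknownColumnFamilyException e)
    {
        // Drop just this message and keep the connection alive. This only works
        // when the serialized size is known, so the remaining payload bytes can
        // be skipped from the input stream; that is exactly what the paxos
        // Commit message is missing today.
        logger.warn("Skipping message for unknown table", e);
    }
}
{code}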

> IncomingTcpConnection closed due to one bad message
> ---
>
> Key: CASSANDRA-13323
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.13
>
> Attachments: CASSANDRA-13323-v1.patch
>
>
> We got this exception:
> {code}
> WARN  [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 
> IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
> socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this 
> is likely due to the schema not being fully propagated.  Please wait for 
> schema agreement on table creation.
> at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> {code}
> Also we saw this log in another host indicating it needs to re-connect:
> {code}
> INFO  [HANDSHAKE-/] 2017-02-21 13:37:50,216 
> OutboundTcpConnection.java:515 - Handshaking version with /
> {code}
> The reason is that the node was receiving hinted data for a dropped table. 
> This may happen with other messages as well. On the Cassandra side, 
> IncomingTcpConnection shouldn't close on just one bad message, even though it 
> will be restarted soon afterwards by the SocketThread in MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message

2017-03-10 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13323:
---
Description: 
We got this exception:
{code}
WARN  [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 
IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is 
likely due to the schema not being fully propagated.  Please wait for schema 
agreement on table creation.
at 
org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
{code}

Also we saw this log in another host indicating it needs to re-connect:
{code}
INFO  [HANDSHAKE-/] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 
- Handshaking version with /
{code}

The reason is that the node was receiving hinted data for a dropped table. This 
may happen with other messages as well. On the Cassandra side, 
IncomingTcpConnection shouldn't close on just one bad message, even though it 
will be restarted soon afterwards by the SocketThread in MessagingService.

  was:
We got this exception:
{code}
WARN  [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 
IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is 
likely due to the schema not being fully propagated.  Please wait for schema 
agreement on table creation.
at 
org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
{code}

Also we saw this log in another host indicating it needs to re-connect:
{code}
INFO  [HANDSHAKE-/] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 
- Handshaking version with /
{code}

The reason is that another node was sending hinted data to this node. However 
the hinted data was for a table that had been dropped. This may happen with 
other messages as well. On the Cassandra side, IncomingTcpConnection shouldn't 
close on just one bad message, even though it will be restarted soon afterwards by 
the SocketThread in MessagingService.


> IncomingTcpConnection closed due to one bad message
> ---
>
> Key: CASSANDRA-13323
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>  

[jira] [Updated] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message

2017-03-10 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13323:
---
Status: Patch Available  (was: Open)

> IncomingTcpConnection closed due to one bad message
> ---
>
> Key: CASSANDRA-13323
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.13
>
> Attachments: CASSANDRA-13323-v1.patch
>
>
> We got this exception:
> {code}
> WARN  [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 
> IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
> socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this 
> is likely due to the schema not being fully propagated.  Please wait for 
> schema agreement on table creation.
> at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> {code}
> Also we saw this log in another host indicating it needs to re-connect:
> {code}
> INFO  [HANDSHAKE-/] 2017-02-21 13:37:50,216 
> OutboundTcpConnection.java:515 - Handshaking version with /
> {code}
> The reason is that another node was sending hinted data to this node. However 
> the hinted data was for a table that had been dropped. This may happen with 
> other messages as well. On the Cassandra side, IncomingTcpConnection shouldn't 
> close on just one bad message, even though it will be restarted soon afterwards by 
> the SocketThread in MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message

2017-03-10 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13323:
---
Attachment: CASSANDRA-13323-v1.patch

> IncomingTcpConnection closed due to one bad message
> ---
>
> Key: CASSANDRA-13323
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.13
>
> Attachments: CASSANDRA-13323-v1.patch
>
>
> We got this exception:
> {code}
> WARN  [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 
> IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
> socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this 
> is likely due to the schema not being fully propagated.  Please wait for 
> schema agreement on table creation.
> at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> {code}
> Also we saw this log in another host indicating it needs to re-connect:
> {code}
> INFO  [HANDSHAKE-/] 2017-02-21 13:37:50,216 
> OutboundTcpConnection.java:515 - Handshaking version with /
> {code}
> The reason is that another node was sending hinted data to this node. However 
> the hinted data was for a table that had been dropped. This may happen with 
> other messages as well. On the Cassandra side, IncomingTcpConnection shouldn't 
> close on just one bad message, even though it will be restarted soon afterwards by 
> the SocketThread in MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message

2017-03-10 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13323:
--

 Summary: IncomingTcpConnection closed due to one bad message
 Key: CASSANDRA-13323
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
 Project: Cassandra
  Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou
 Fix For: 3.0.13


We got this exception:
{code}
WARN  [MessagingService-Incoming-/] 2017-02-14 17:33:33,177 
IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is 
likely due to the schema not being fully propagated.  Please wait for schema 
agreement on table creation.
at 
org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
 ~[apache-cassandra-3.0.10.jar:3.0.10]
{code}

Also we saw this log in another host indicating it needs to re-connect:
{code}
INFO  [HANDSHAKE-/] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 
- Handshaking version with /
{code}

The reason is that another node was sending hinted data to this node. However 
the hinted data was for a table that had been dropped. This may happen with 
other messages as well. On the Cassandra side, IncomingTcpConnection shouldn't 
close on just one bad message, even though it will be restarted soon afterwards by 
the SocketThread in MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13261) Improve speculative retry to avoid being overloaded

2017-02-23 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13261:
---
Status: Patch Available  (was: Open)

> Improve speculative retry to avoid being overloaded
> ---
>
> Key: CASSANDRA-13261
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13261
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Attachments: CASSANDRA-13261-v1.patch
>
>
> In CASSANDRA-13009 it was suggested that I separate the 2nd part of my patch 
> out as an improvement.
> This is to avoid Cassandra being overloaded when using CUSTOM speculative 
> retry parameter. Steps to reason/repro this with 3.0.10:
> 1. Use custom speculative retry threshold like this:
> cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms';
> 2. SpeculatingReadExecutor will be used, according to this piece of code in 
> AbstractReadExecutor:
> {code}
> if (retry.equals(SpeculativeRetryParam.ALWAYS))
> return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, 
> consistencyLevel, targetReplicas);
> else // PERCENTILE or CUSTOM.
> return new SpeculatingReadExecutor(keyspace, cfs, command, 
> consistencyLevel, targetReplicas);
> {code}
> 3. When RF=3 and LOCAL_QUORUM is used, the below code (from 
> SpeculatingReadExecutor#maybeTryAdditionalReplicas) won't be able to protect 
> Cassandra from being overloaded, even though the inline comment suggests such 
> intention:
> {code}
> // no latency information, or we're overloaded
> if (cfs.sampleLatencyNanos > 
> TimeUnit.MILLISECONDS.toNanos(command.getTimeout()))
> return;
> {code}
> The reason is that cfs.sampleLatencyNanos is assigned retryPolicy.threshold(), 
> which is 10ms in step #1 above, at line 405 of ColumnFamilyStore. However, the 
> timeout is often the default 5000ms.
> As the name suggests, sampleLatencyNanos should be used to keep sampled 
> latency, not something configured "statically". My proposal:
> a. Introduce option -Dcassandra.overload.threshold to allow customizing 
> overload threshold. The default threshold would be 
> DatabaseDescriptor.getRangeRpcTimeout().
> b. Assign sampled P99 latency to cfs.sampleLatencyNanos. For overload 
> detection, we just compare cfs.sampleLatencyNanos with the customizable 
> threshold above.
> c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) for waiting time 
> before retry (see line 282 of AbstractReadExecutor). This is the value from 
> table setting (PERCENTILE or CUSTOM).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13261) Improve speculative retry to avoid being overloaded

2017-02-23 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13261:
---
Attachment: CASSANDRA-13261-v1.patch

I'm not sure what the next release for 3.0.* will be, and 3.0.11 was just merged 
to trunk. The attached patch is for trunk, but I'd like to have this improvement 
included in the next 3.0.* release.

[~tjake] Maybe you can help review this patch since you have some context from 
CASSANDRA-13009?

> Improve speculative retry to avoid being overloaded
> ---
>
> Key: CASSANDRA-13261
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13261
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Attachments: CASSANDRA-13261-v1.patch
>
>
> In CASSANDRA-13009 it was suggested that I separate the 2nd part of my patch 
> out as an improvement.
> This is to avoid Cassandra being overloaded when using CUSTOM speculative 
> retry parameter. Steps to reason/repro this with 3.0.10:
> 1. Use custom speculative retry threshold like this:
> cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms';
> 2. SpeculatingReadExecutor will be used, according to this piece of code in 
> AbstractReadExecutor:
> {code}
> if (retry.equals(SpeculativeRetryParam.ALWAYS))
> return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, 
> consistencyLevel, targetReplicas);
> else // PERCENTILE or CUSTOM.
> return new SpeculatingReadExecutor(keyspace, cfs, command, 
> consistencyLevel, targetReplicas);
> {code}
> 3. When RF=3 and LOCAL_QUORUM is used, the below code (from 
> SpeculatingReadExecutor#maybeTryAdditionalReplicas) won't be able to protect 
> Cassandra from being overloaded, even though the inline comment suggests such 
> intention:
> {code}
> // no latency information, or we're overloaded
> if (cfs.sampleLatencyNanos > 
> TimeUnit.MILLISECONDS.toNanos(command.getTimeout()))
> return;
> {code}
> The reason is that cfs.sampleLatencyNanos is assigned retryPolicy.threshold(), 
> which is 10ms in step #1 above, at line 405 of ColumnFamilyStore. However, the 
> timeout is often the default 5000ms.
> As the name suggests, sampleLatencyNanos should be used to keep sampled 
> latency, not something configured "statically". My proposal:
> a. Introduce option -Dcassandra.overload.threshold to allow customizing 
> overload threshold. The default threshold would be 
> DatabaseDescriptor.getRangeRpcTimeout().
> b. Assign sampled P99 latency to cfs.sampleLatencyNanos. For overload 
> detection, we just compare cfs.sampleLatencyNanos with the customizable 
> threshold above.
> c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) for waiting time 
> before retry (see line 282 of AbstractReadExecutor). This is the value from 
> table setting (PERCENTILE or CUSTOM).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13261) Improve speculative retry to avoid being overloaded

2017-02-23 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13261:
--

 Summary: Improve speculative retry to avoid being overloaded
 Key: CASSANDRA-13261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13261
 Project: Cassandra
  Issue Type: Improvement
Reporter: Simon Zhou
Assignee: Simon Zhou


In CASSANDRA-13009 it was suggested that I separate the 2nd part of my patch out 
as an improvement.

This is to avoid Cassandra being overloaded when using CUSTOM speculative retry 
parameter. Steps to reason/repro this with 3.0.10:
1. Use custom speculative retry threshold like this:
cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms';

2. SpeculatingReadExecutor will be used, according to this piece of code in 
AbstractReadExecutor:
{code}
if (retry.equals(SpeculativeRetryParam.ALWAYS))
return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, 
consistencyLevel, targetReplicas);
else // PERCENTILE or CUSTOM.
return new SpeculatingReadExecutor(keyspace, cfs, command, 
consistencyLevel, targetReplicas);
{code}

3. When RF=3 and LOCAL_QUORUM is used, the below code (from 
SpeculatingReadExecutor#maybeTryAdditionalReplicas) won't be able to protect 
Cassandra from being overloaded, even though the inline comment suggests such 
intention:

{code}
// no latency information, or we're overloaded
if (cfs.sampleLatencyNanos > 
TimeUnit.MILLISECONDS.toNanos(command.getTimeout()))
return;
{code}

The reason is that cfs.sampleLatencyNanos is assigned retryPolicy.threshold(), 
which is 10ms in step #1 above, at line 405 of ColumnFamilyStore. However, the 
timeout is often the default 5000ms.

As the name suggests, sampleLatencyNanos should be used to keep sampled 
latency, not something configured "statically". My proposal:
a. Introduce option -Dcassandra.overload.threshold to allow customizing 
overload threshold. The default threshold would be 
DatabaseDescriptor.getRangeRpcTimeout().
b. Assign sampled P99 latency to cfs.sampleLatencyNanos. For overload 
detection, we just compare cfs.sampleLatencyNanos with the customizable 
threshold above.
c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) for waiting time 
before retry (see line 282 of AbstractReadExecutor). This is the value from 
table setting (PERCENTILE or CUSTOM).
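A minimal sketch of the overload check proposed in (a) and (b) above (the system 
property name follows the proposal; the surrounding fields are illustrative):
{code}
// (a) configurable overload threshold, defaulting to the range RPC timeout (ms)
long overloadThresholdNanos = TimeUnit.MILLISECONDS.toNanos(
    Long.getLong("cassandra.overload.threshold", DatabaseDescriptor.getRangeRpcTimeout()));

// (b) cfs.sampleLatencyNanos holds the sampled p99 read latency
if (cfs.sampleLatencyNanos > overloadThresholdNanos)
    return; // overloaded: skip the speculative retry

// (c) otherwise wait retryDelayNanos (the PERCENTILE/CUSTOM table setting) before retrying
{code}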



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13009) Speculative retry bugs

2017-02-02 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850545#comment-15850545
 ] 

Simon Zhou edited comment on CASSANDRA-13009 at 2/2/17 9:31 PM:


[~tjake], sure. I abstracted out the time unit fix as v2 patch. For the 
improvement, I'll create a separate ticket to track that. Thanks for the review!


was (Author: szhou):
[~tjake], sure. I abstracted out the time unit fix as v2 patch. For the 
improvement, should I create a separate ticket to track that? Not sure about 
the best practice here. Also thanks for the review!

> Speculative retry bugs
> --
>
> Key: CASSANDRA-13009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13009
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.11
>
> Attachments: CASSANDRA-13009-v1.patch, CASSANDRA-13009-v2.patch
>
>
> There are a few issues with speculative retry:
> 1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10):
> The left hand side is in nanos, as the name suggests, while the right hand 
> side is in millis.
> {code}
> sampleLatencyNanos = DatabaseDescriptor.getReadRpcTimeout() / 2;
> {code}
> Here coordinatorReadLatency is already in nanos and we shouldn't multiply the 
> value by 1000. This was a regression in 8896a70 when we switched metrics 
> libraries, and the two libraries use different time units.
> {code}
> sampleLatencyNanos = (long) 
> (metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold())
>  * 1000d);
> {code}
> 2. Confusing overload protection and retry delay. As the name 
> "sampleLatencyNanos" suggests, it should be used to keep the actually sampled 
> read latency. However, we assign it the retry threshold in the case of 
> CUSTOM. Then we compare the retry threshold with read timeout (defaults to 
> 5000ms). This means, if we use speculative_retry=10ms for the table, we won't 
> be able to avoid being overloaded. We should compare the actual read latency 
> with the read timeout for overload protection. See line 450 of 
> ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java.
> My proposals are:
> a. We use sampled p99 delay and compare it with a customizable threshold 
> (-Dcassandra.overload.threshold) for overload detection.
> b. Introduce another variable retryDelayNanos for waiting time before retry. 
> This is the value from table setting (PERCENTILE or CUSTOM).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13009) Speculative retry bugs

2017-02-02 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13009:
---
Attachment: CASSANDRA-13009-v2.patch

[~tjake], sure. I abstracted out the time unit fix as v2 patch. For the 
improvement, should I create a separate ticket to track that? Not sure about 
the best practice here. Also thanks for the review!
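For reference, the time-unit fix amounts to the following (sketch of the assumed 
form; the actual change is in the attached v2 patch):
{code}
// Spot 1: getReadRpcTimeout() returns milliseconds, so convert explicitly
// before storing into the nanosecond field.
sampleLatencyNanos = TimeUnit.MILLISECONDS.toNanos(DatabaseDescriptor.getReadRpcTimeout() / 2);

// Spot 2: the snapshot value is already in nanoseconds with the new metrics
// library, so the stray "* 1000d" multiplier is dropped.
sampleLatencyNanos = (long) metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold());
{code}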

> Speculative retry bugs
> --
>
> Key: CASSANDRA-13009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13009
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.11
>
> Attachments: CASSANDRA-13009-v1.patch, CASSANDRA-13009-v2.patch
>
>
> There are a few issues with speculative retry:
> 1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10):
> The left hand side is in nanos, as the name suggests, while the right hand 
> side is in millis.
> {code}
> sampleLatencyNanos = DatabaseDescriptor.getReadRpcTimeout() / 2;
> {code}
> Here coordinatorReadLatency is already in nanos and we shouldn't multiply the 
> value by 1000. This was a regression in 8896a70 when we switched metrics 
> libraries, and the two libraries use different time units.
> {code}
> sampleLatencyNanos = (long) 
> (metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold())
>  * 1000d);
> {code}
> 2. Confusing overload protection and retry delay. As the name 
> "sampleLatencyNanos" suggests, it should be used to keep the actually sampled 
> read latency. However, we assign it the retry threshold in the case of 
> CUSTOM. Then we compare the retry threshold with read timeout (defaults to 
> 5000ms). This means, if we use speculative_retry=10ms for the table, we won't 
> be able to avoid being overloaded. We should compare the actual read latency 
> with the read timeout for overload protection. See line 450 of 
> ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java.
> My proposals are:
> a. We use sampled p99 delay and compare it with a customizable threshold 
> (-Dcassandra.overload.threshold) for overload detection.
> b. Introduce another variable retryDelayNanos for waiting time before retry. 
> This is the value from table setting (PERCENTILE or CUSTOM).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13009) Speculative retry bugs

2017-01-20 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832727#comment-15832727
 ] 

Simon Zhou commented on CASSANDRA-13009:


[~tjake], do you mind reviewing this patch? Thanks.

> Speculative retry bugs
> --
>
> Key: CASSANDRA-13009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13009
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.11
>
> Attachments: CASSANDRA-13009-v1.patch
>
>
> There are a few issues with speculative retry:
> 1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10):
> The left hand side is in nanos, as the name suggests, while the right hand 
> side is in millis.
> {code}
> sampleLatencyNanos = DatabaseDescriptor.getReadRpcTimeout() / 2;
> {code}
> Here coordinatorReadLatency is already in nanos and we shouldn't multiply the 
> value by 1000. This was a regression in 8896a70 when we switched metrics 
> libraries, and the two libraries use different time units.
> {code}
> sampleLatencyNanos = (long) 
> (metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold())
>  * 1000d);
> {code}
> 2. Confusing overload protection and retry delay. As the name 
> "sampleLatencyNanos" suggests, it should be used to keep the actually sampled 
> read latency. However, we assign it the retry threshold in the case of 
> CUSTOM. Then we compare the retry threshold with read timeout (defaults to 
> 5000ms). This means, if we use speculative_retry=10ms for the table, we won't 
> be able to avoid being overloaded. We should compare the actual read latency 
> with the read timeout for overload protection. See line 450 of 
> ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java.
> My proposals are:
> a. We use sampled p99 delay and compare it with a customizable threshold 
> (-Dcassandra.overload.threshold) for overload detection.
> b. Introduce another variable retryDelayNanos for waiting time before retry. 
> This is the value from table setting (PERCENTILE or CUSTOM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13009) Speculative retry bugs

2017-01-20 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13009:
---
Reviewer:   (was: T Jake Luciani)

> Speculative retry bugs
> --
>
> Key: CASSANDRA-13009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13009
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Simon Zhou
>Assignee: Simon Zhou
> Fix For: 3.0.11
>
> Attachments: CASSANDRA-13009-v1.patch
>
>
> There are a few issues with speculative retry:
> 1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10):
> The left hand side is in nanos, as the name suggests, while the right hand 
> side is in millis.
> {code}
> sampleLatencyNanos = DatabaseDescriptor.getReadRpcTimeout() / 2;
> {code}
> Here coordinatorReadLatency is already in nanos and we shouldn't multiply the 
> value by 1000. This was a regression in 8896a70 when we switched metrics 
> libraries, and the two libraries use different time units.
> {code}
> sampleLatencyNanos = (long) 
> (metric.coordinatorReadLatency.getSnapshot().getValue(retryPolicy.threshold())
>  * 1000d);
> {code}
> 2. Confusing overload protection and retry delay. As the name 
> "sampleLatencyNanos" suggests, it should be used to keep the actually sampled 
> read latency. However, we assign it the retry threshold in the case of 
> CUSTOM. Then we compare the retry threshold with read timeout (defaults to 
> 5000ms). This means, if we use speculative_retry=10ms for the table, we won't 
> be able to avoid being overloaded. We should compare the actual read latency 
> with the read timeout for overload protection. See line 450 of 
> ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java.
> My proposals are:
> a. We use sampled p99 delay and compare it with a customizable threshold 
> (-Dcassandra.overload.threshold) for overload detection.
> b. Introduce another variable retryDelayNanos for waiting time before retry. 
> This is the value from table setting (PERCENTILE or CUSTOM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-13106) Unnecessary assertion

2017-01-09 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou resolved CASSANDRA-13106.

Resolution: Cannot Reproduce

Hi Benedict,
Sorry for the spam, and thanks for the reply. I just thought this was a quick 
fix. Closing this ticket as I lost the original stack trace and thus cannot 
confirm that the CPU utilization issue was caused by 
Predicates.alwaysTrue().

> Unnecessary assertion
> -
>
> Key: CASSANDRA-13106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13106
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Attachments: CASSANDRA-13106.patch
>
>
> We had over 70 thousand sstables and it was slow to bootstrap a new node; the 
> CPU utilization of Cassandra's main thread was nearly 100%. So 
> we took a few stack traces and found that the main thread was busy running 
> this line in Tracker.java:
> {code}
> assert Iterables.all(removed, remove);
> {code}
> Not exactly sure whether this line causes the CPU utilization/bootstrapping 
> issue, but this line is redundant because the Predicate we pass in is 
> Predicates.alwaysTrue(), which means the assertion always 
> returns true. So I propose to remove that line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-13106) Unnecessary assertion

2017-01-05 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803106#comment-15803106
 ] 

Simon Zhou commented on CASSANDRA-13106:


Ah, got your point! I just wanted to get the issue fixed by opening this ticket.

> Unnecessary assertion
> -
>
> Key: CASSANDRA-13106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13106
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Attachments: CASSANDRA-13106.patch
>
>
> We had over 70 thousand sstables and it was slow to bootstrap a new node; the 
> CPU utilization of Cassandra's main thread was nearly 100%. So 
> we took a few stack traces and found that the main thread was busy running 
> this line in Tracker.java:
> {code}
> assert Iterables.all(removed, remove);
> {code}
> Not exactly sure whether this line causes the CPU utilization/bootstrapping 
> issue, but this line is redundant because the Predicate we pass in is 
> Predicates.alwaysTrue(), which means the assertion always 
> returns true. So I propose to remove that line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-13106) Unnecessary assertion

2017-01-05 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803059#comment-15803059
 ] 

Simon Zhou commented on CASSANDRA-13106:


Thanks [~dbrosius]. Yes, I do have the -ea flag, but doesn't that only cause ~5% 
performance degradation?

> Unnecessary assertion
> -
>
> Key: CASSANDRA-13106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13106
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Attachments: CASSANDRA-13106.patch
>
>
> We had over 70 thousand sstables and it was slow to bootstrap a new node; the 
> CPU utilization of Cassandra's main thread was nearly 100%. So 
> we took a few stack traces and found that the main thread was busy running 
> this line in Tracker.java:
> {code}
> assert Iterables.all(removed, remove);
> {code}
> Not exactly sure whether this line causes the CPU utilization/bootstrapping 
> issue, but this line is redundant because the Predicate we pass in is 
> Predicates.alwaysTrue(), which means the assertion always 
> returns true. So I propose to remove that line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-13106) Unnecessary assertion

2017-01-05 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802990#comment-15802990
 ] 

Simon Zhou commented on CASSANDRA-13106:


[~benedict], could you take a look at this one line patch?

> Unnecessary assertion
> -
>
> Key: CASSANDRA-13106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13106
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Attachments: CASSANDRA-13106.patch
>
>
> We had over 70 thousand sstables and it was slow to bootstrap a new node; the 
> CPU utilization of Cassandra's main thread was nearly 100%. So 
> we took a few stack traces and found that the main thread was busy running 
> this line in Tracker.java:
> {code}
> assert Iterables.all(removed, remove);
> {code}
> Not exactly sure whether this line causes the CPU utilization/bootstrapping 
> issue, but this line is redundant because the Predicate we pass in is 
> Predicates.alwaysTrue(), which means the assertion always 
> returns true. So I propose to remove that line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13106) Unnecessary assertion

2017-01-05 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13106:
---
Attachment: CASSANDRA-13106.patch

> Unnecessary assertion
> -
>
> Key: CASSANDRA-13106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13106
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Attachments: CASSANDRA-13106.patch
>
>
> We had over 70 thousand sstables and it was slow to bootstrap a new node; the 
> CPU utilization of Cassandra's main thread was nearly 100%. So 
> we took a few stack traces and found that the main thread was busy running 
> this line in Tracker.java:
> {code}
> assert Iterables.all(removed, remove);
> {code}
> Not exactly sure whether this line causes the CPU utilization/bootstrapping 
> issue, but this line is redundant because the Predicate we pass in is 
> Predicates.alwaysTrue(), which means the assertion always 
> returns true. So I propose to remove that line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13106) Unnecessary assertion

2017-01-05 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13106:
---
Attachment: (was: 0001-Remove-unnecessary-assertion.patch)

> Unnecessary assertion
> -
>
> Key: CASSANDRA-13106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13106
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Attachments: CASSANDRA-13106.patch
>
>
> We had over 70 thousand sstables and it was slow to bootstrap a new node; the 
> CPU utilization of Cassandra's main thread was nearly 100%. So 
> we took a few stack traces and found that the main thread was busy running 
> this line in Tracker.java:
> {code}
> assert Iterables.all(removed, remove);
> {code}
> Not exactly sure whether this line causes the CPU utilization/bootstrapping 
> issue, but this line is redundant because the Predicate we pass in is 
> Predicates.alwaysTrue(), which means the assertion always 
> returns true. So I propose to remove that line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-13106) Unnecessary assert

2017-01-05 Thread Simon Zhou (JIRA)
Simon Zhou created CASSANDRA-13106:
--

 Summary: Unnecessary assert
 Key: CASSANDRA-13106
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13106
 Project: Cassandra
  Issue Type: Improvement
Reporter: Simon Zhou
Assignee: Simon Zhou
Priority: Minor
 Attachments: 0001-Remove-unnecessary-assertion.patch

We had over 70 thousand sstables and it was slow to bootstrap a new node; the 
CPU utilization of Cassandra's main thread was nearly 100%. So we took a few 
stack traces and found that the main thread was busy running this line in 
Tracker.java:
{code}
assert Iterables.all(removed, remove);
{code}

Not exactly sure whether this line causes the CPU utilization/bootstrapping issue, 
but this line is redundant because the Predicate we pass in is 
Predicates.alwaysTrue(), which means the assertion always 
returns true. So I propose to remove that line.
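A standalone snippet showing why the line is pure overhead: the predicate can 
never be false, but the assert still walks the whole collection when assertions 
are enabled (Guava; the numbers here are only for illustration):
{code}
import com.google.common.base.Predicates;
import com.google.common.collect.Iterables;

import java.util.ArrayList;
import java.util.List;

public class AlwaysTrueAssertDemo
{
    public static void main(String[] args)
    {
        List<Integer> removed = new ArrayList<>();
        for (int i = 0; i < 70_000; i++)
            removed.add(i);

        // Always true by construction, so it can never fire; with -ea it still
        // costs a full iteration over the collection every time it runs.
        assert Iterables.all(removed, Predicates.<Integer>alwaysTrue());
        System.out.println("assertion passed (as it always will)");
    }
}
{code}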



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13106) Unnecessary assertion

2017-01-05 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-13106:
---
Attachment: 0001-Remove-unnecessary-assertion.patch

> Unnecessary assertion
> -
>
> Key: CASSANDRA-13106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13106
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Minor
> Attachments: 0001-Remove-unnecessary-assertion.patch
>
>
> We had over 70 thousand sstables and it was slow to bootstrap a new node; the 
> CPU utilization of Cassandra's main thread was nearly 100%. So 
> we took a few stack traces and found that the main thread was busy running 
> this line in Tracker.java:
> {code}
> assert Iterables.all(removed, remove);
> {code}
> Not exactly sure whether this line causes the CPU utilization/bootstrapping 
> issue, but this line is redundant because the Predicate we pass in is 
> Predicates.alwaysTrue(), which means the assertion always 
> returns true. So I propose to remove that line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

