[ 
https://issues.apache.org/jira/browse/CASSANDRA-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981114#comment-14981114
 ] 

Paulo Motta commented on CASSANDRA-9912:
----------------------------------------

Created a simple 
[dtest|https://github.com/pauloricardomg/cassandra-dtest/blob/7ef7520206d2ebc355f5ae4d0e64dba64481e057/topology_test.py#L32]
 reproducing the issue. When a node is decommissioned, its tokens 
are wiped from {{TokenMetadata}} but not from the system keyspace, so 
{{SizeEstimatesRecorder}} tries to fetch the primary ranges for the 
decommissioned local node's tokens, which are no longer in the ring.
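
To illustrate the failure mode, here is a minimal sketch, not the actual {{TokenMetadata}} code. The class and method names are hypothetical, and the ring is modelled as a plain sorted list of longs. It assumes, going by the stack trace in the report, that the predecessor lookup asserts that the requested token is present in the ring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a token ring lookup (not Cassandra code).
class RingLookupSketch {

    // Returns the token that precedes `token` in the sorted ring,
    // wrapping around from the first token to the last one.
    // Fails if `token` is not part of the ring.
    static long predecessor(List<Long> sortedRing, long token) {
        int index = Collections.binarySearch(sortedRing, token);
        if (index < 0) {
            throw new AssertionError(token + " not found in " + sortedRing);
        }
        return index == 0
                ? sortedRing.get(sortedRing.size() - 1)
                : sortedRing.get(index - 1);
    }

    public static void main(String[] args) {
        // Tokens currently in the ring (stand-in for TokenMetadata).
        List<Long> ring = new ArrayList<>(
                Arrays.asList(Long.MIN_VALUE, -9223372036854775798L, 10L));

        // Token the local node still has saved (stand-in for the system keyspace).
        long savedLocalToken = Long.MIN_VALUE;

        // Decommission removes the local token from the ring only.
        ring.remove(Long.valueOf(savedLocalToken));

        try {
            predecessor(ring, savedLocalToken);
        } catch (AssertionError e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

The lookup only fails once the saved token and the ring disagree, which would be consistent with the error appearing only after a decommission.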

The simple fix is to skip {{SizeEstimatesRecorder}} whenever the node is not a 
member of the ring. This generalizes the check added by CASSANDRA-9034, which 
skips {{SizeEstimatesRecorder}} when the node has never joined the ring. To 
guard against a regression of CASSANDRA-9034, I also added a 
[dtest|https://github.com/pauloricardomg/cassandra-dtest/blob/7ef7520206d2ebc355f5ae4d0e64dba64481e057/topology_test.py#L17]
 that reproduces that issue.
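
The shape of such a guard can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: {{Ring}}, {{Recorder}}, {{isMember}} and the {{runs}} counter are hypothetical stand-ins, and the real check lives in Cassandra's own classes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a "skip when not a ring member" guard (not Cassandra code).
class RecorderGuardSketch {

    // Stand-in for ring membership tracking.
    static final class Ring {
        private final Set<String> members = new HashSet<>();

        void join(String endpoint) { members.add(endpoint); }
        void leave(String endpoint) { members.remove(endpoint); }
        boolean isMember(String endpoint) { return members.contains(endpoint); }
    }

    // Stand-in for a periodic task that needs the local node's ranges.
    static final class Recorder implements Runnable {
        private final Ring ring;
        private final String localEndpoint;
        int runs = 0; // counts executions that got past the guard

        Recorder(Ring ring, String localEndpoint) {
            this.ring = ring;
            this.localEndpoint = localEndpoint;
        }

        @Override
        public void run() {
            // One membership check covers both cases: a node that has
            // never joined and a node that has been decommissioned.
            if (!ring.isMember(localEndpoint)) {
                return;
            }
            runs++; // the real task would compute and record estimates here
        }
    }

    public static void main(String[] args) {
        Ring ring = new Ring();
        Recorder recorder = new Recorder(ring, "127.0.0.1");

        recorder.run();              // never joined: skipped
        ring.join("127.0.0.1");
        recorder.run();              // member: runs
        ring.leave("127.0.0.1");
        recorder.run();              // decommissioned: skipped

        System.out.println("runs = " + recorder.runs);
    }
}
```

Checking membership, rather than only "has ever joined", is what lets a single condition handle both the CASSANDRA-9034 case and the decommission case.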

I also removed some related dead code from {{StorageService}} and 
{{SystemKeyspace}}.

Test results will be available shortly:

||2.1||2.2||3.0||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-9912]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-9912]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-9912]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-9912]|[PR|https://github.com/riptano/cassandra-dtest/pull/637]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-9912-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-9912-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-9912-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9912-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-9912-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-9912-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-9912-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9912-dtest/lastCompletedBuild/testReport/]|

> SizeEstimatesRecorder has assertions after decommission sometimes
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-9912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9912
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Paulo Motta
>             Fix For: 2.1.12
>
>
> Doing some testing with 2.1.8 adding and decommissioning nodes.  Sometimes 
> after decommissioning the following starts being thrown by the 
> SizeEstimatesRecorder.
> {noformat}
> java.lang.AssertionError: -9223372036854775808 not found in 
> -9223372036854775798, 10
> at 
> org.apache.cassandra.locator.TokenMetadata.getPredecessor(TokenMetadata.java:683)
>  ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
> at 
> org.apache.cassandra.locator.TokenMetadata.getPrimaryRangesFor(TokenMetadata.java:627)
>  ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
> at 
> org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:68)
>  ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
> at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>  ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_40]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_40]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_40]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_40]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_40]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_40]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
