[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002757#comment-15002757 ] Ariel Weisberg commented on CASSANDRA-10485: +1 LGTM > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001310#comment-15001310 ] Paulo Motta commented on CASSANDRA-10485: - bq. If we submit a hint to an endpoint that left, when will the hint be cleaned up and discarded? Is there a race there? The race window is very small, the node needs to be removed exactly between getting the ID from TokenMetadata and the hint being actually written by the HintsManager. If that happens, on 2.1 and 2.2 the hint will expire by ttl after gc_grace_seconds. On 3.0+, a hint file might be created for the removed node, and needs to be removed manually or via nodetool truncatehints. Fixed minor nit and removed {{isMemberJoining}} assertion. Tests were resubmitted and seem OK. ||2.1||2.2||3.0||trunk|| |[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-ultimate]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-ultimate-testall/lastCompletedBuild/testReport/]| |[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-ultimate-dtest/lastCompletedBuild/testReport/]| > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000754#comment-15000754 ] Ariel Weisberg commented on CASSANDRA-10485: [There is an extra ')' in this trace statement|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-ultimate#diff-71f06c193f5b5e270cf8ac695164f43aR2492] Test look good near as I can tell. Why is isMemberJoining better than just using getHostID being null as an indicator of whether the hint should be written? > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000710#comment-15000710 ] Ariel Weisberg commented on CASSANDRA-10485: If we submit a hint to an endpoint that left, when will the hint be cleaned up and discarded? Is there a race there? > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999531#comment-14999531 ] Paulo Motta commented on CASSANDRA-10485: - Thanks for the review. Addressed issues in new (simpler) implementation: * Discard hints for endpoints with null host ids (logging debug message), avoiding races between multiple {{TokenMetadata}} accesses. * Moved improved version of null host id assertion to {{TokenMetadata.getHostId}}: {{assert hostId != null || !isMemberOrJoining(endpoint)}}. * On 2.2-, changed signature of {{HintedHandOffManager.hintFor}} to accept a {{Pair}}, avoiding an extra possibly racy access to {{TokenMetadata}} * Applied similar logic to new hints implementation on 3.0+ ||2.1||2.2||3.0||trunk|| |[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-ultimate]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-ultimate-testall/lastCompletedBuild/testReport/]| |[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-ultimate-dtest/lastCompletedBuild/testReport/]| > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998915#comment-14998915 ] Ariel Weisberg commented on CASSANDRA-10485: How is the isMemberJoining() check atomic with getHostId() [here|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-final#diff-71f06c193f5b5e270cf8ac695164f43aR1015]? It's also doing endpointForHostId a little bit later. I get why this works in 3.0+. 3.0+ only checks once to see if it is going to write the hint and doesn't do the dance resolving stuff from TMD multiple times thus risking seeing an inconsistent version. > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997633#comment-14997633 ] Paulo Motta commented on CASSANDRA-10485: - I just found a bug on the previous version, where a node can be removed from TMD just before setting the new pending ranges, so the problem will persist. After thinking this through with a fresh mind, the solution is rather simple, I think I was over-complicating. Pending ranges are basically composed of "normal" endpoints, moving endpoints and bootstrapping endpoints. Moving endpoints are also "normal" endpoints. So, what we actually want to check before submitting a hint, is if the node is a normal endpoint or bootstrapping endpoint. If the node is neither a normal/moving or bootstrapping endpoint, we don't want to submit hints to it, simple as that. So, I added a new method {{TokenMetadata.isMemberOrJoining}} to check that before submitting a hint, thus avoiding getting a null host id on hint submission. The two reports of this bug on CASSANDRA-6335 and CASSANDRA-10233, are when a node is replaced or when bootstrapping fails. When a node is replaced, it was a "normal" endpoint, but then it was replaced and it was removed from the ring, so we shouldn't submit a hint to it. When a new node is down after a failed bootstrap, it is removed from the ring, so we shouldn't submit a hint to it. Actually, with CASSANDRA-8838, there's a possibility of resuming a failed bootstrap, so we should not remove the bootstrapping node from the ring for a quarantine period, but we should handle this in a separate ticket. Submitted a new branch with the proposed solution. Sorry for the confusion on this. ||2.1||2.2||3.0||trunk|| |[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-final]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-final]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-final]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-final]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-final-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-final-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-final-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-final-testall/lastCompletedBuild/testReport/]| |[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-final-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-final-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-final-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-final-dtest/lastCompletedBuild/testReport/]| > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994779#comment-14994779 ] Paulo Motta commented on CASSANDRA-10485: - I implemented an alternative approach which is a bit cleaner and more deterministic. The basic idea is to have a new method {{TokenMetadata.isMemberOrPending()}}, and only submit hints to endpoints that are ring members or pending membership, thus, avoiding fetching null host IDs for removed pending endpoints while the new pending ranges are being calculated. In order to support the {{TokenMetadata.isMemberOrPending()}} method, the {{TokenMetadata}} maintains a new {{livePendingEndpoints}} set which is populated every time new pending ranges are set. When endpoints are removed from {{TokenMetadata}} via the {{removeEndpoint}} method, they're also removed from the {{livePendingEndpoints}} set, so {{TokenMetadata.isMemberOrPending()}} returns false if the endpoint is evicted from the ring. Since both {{removeEndpoint}} and {{setPendingRanges}} update this set, they share a write lock. {{TokenMetadata.isMemberOrPending()}} also uses a read lock, similar to other methods {{isMember()}} or {{getHostId()}}. Merging the solution from 2.1 to 2.2/3.0 was a bit tricky because the pending ranges calculation was extracted from the {{PendingRangeCalculatorService}} to {{TokenMetadata}} within a read lock, so I had to separate the actual calculation (within a read lock) to the actual assignment of the {{pendingRanges}} via the {{setPendingRanges}} method, which uses a write lock. On 3.0, the hints submission part is slightly different (even simpler) due to the new hints implementation. It's still not ideal but I guess better than the previous approach. I will add a link from this ticket to CASSANDRA-6061 so we can take this ticket into account when refactoring the {{TokenMetadata}}. Below are the new branches and test results: ||2.1||2.2||3.0||trunk|| |[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-v3]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-v3]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-v3]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-v3]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-v3-testall/lastCompletedBuild/testReport/]| |[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-v3-dtest/lastCompletedBuild/testReport/]| > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cass
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990471#comment-14990471 ] Ariel Weisberg commented on CASSANDRA-10485: Can you update the comment HintedHandoffManager.hintFor so that it says that getEndpointForHostId can return null in regular operation and not just test? Wouldn't want that check to get removed. Another thing that is weird is that we convert from InetAddr to host id and then HintedHandoffManager converts that back into an InetAddr (although maybe not the same one?). I am not sure how big these things are or how often we rebuild them, but constructing a copy of every map as we iterate even when we aren't going to modify it is more work. If it's not too much churn it's fine I just want to make sure it doesn't end up biting us. I looked at the 3.0 code and it looks good. > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988539#comment-14988539 ] Paulo Motta commented on CASSANDRA-10485: - Yeah, definitely {{TokenMetadata}} needs some love. Perhaps this is a good time to provide some input/requirements for a {{TokenMetadata}} rewrite on CASSANDRA-6061. > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988473#comment-14988473 ] Ariel Weisberg commented on CASSANDRA-10485: That fixes this bug, but the design of TMD is still error prone IMO. It presents a materialized view of a state of the world and it would be easiest to reason about if it moved from one consistent state to another from the perspective of readers. I am lacking in knowledge of the ways in which TMD is used so I can't say if everyone uses it safely. We know hints didn't use it safely which makes me wonder how many others also aren't going to handle it being in an inconsistent state. Even if we get it internally consistent if things make decisions based on a stale version those need to be harmless or they need to eventually check and make sure they didn't do something that was invalidated. I'll dig deeper into it tomorrow. If there is someone else who can chime in as to whether this is a valid concern that would be great. > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988450#comment-14988450 ] Paulo Motta commented on CASSANDRA-10485: - bq. HashMultiMap uses good old HashMap internally. So it's not safe to update it and read at the same time. I think they meant that if the map is immutable it is safe to have multiple concurrent readers. Thanks for clarifying this. Updated patch to perform removal of pending endpoint by working on a copy and then replacing the reference atomically. bq. I think you either need to make the entire thing atomic and make TMD COW and propagate the TMD the entire way. Or alternatively optimistic and if the data isn't there just abort dropping the hint instead of asserting that it is an error. You could assert instead that pendingEndpointsFor() doesn't include the missing endpoint. Implemented the optimistic approach of discarding hints to endpoints with null ID and log a debug message (this should happen seldom with immediate removal of the endpoint from pending ranges). Also moved assertion to {{TokenMetadata.getHostId()}}, so we guarantee that a host id can only be null if the endpoint is not part of the ring (or is not bootstrapping). I needed to do an slight adaptation to the patch on 3.0 on {{StorageProxy.submitHint}} due to the new hints implementation. Tests with new implementation available below: ||2.1||2.2||3.0||trunk|| |[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-testall/lastCompletedBuild/testReport/]| |[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-dtest/lastCompletedBuild/testReport/]| > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985401#comment-14985401 ] Ariel Weisberg commented on CASSANDRA-10485: Having reviewed this further the entire approach of presenting a mutable TMD to readers is a little iffy. pendingRangesFor() is not isolated from removeEndpoint() so remove endpoint is tearing apart the datastructure while readers of TMD are viewing an inconsistent state. So you could call pendingEndpointsFor() and think you are supposed to submit a hint, but when the time comes to submit the hint the endpoint is already removed from the TMD. I think you either need to make the entire thing atomic and make TMD COW and propagate the TMD the entire way. Or alternatively optimistic and if the data isn't there just abort dropping the hint instead of asserting that it is an error. You could assert instead that pendingEndpointsFor() doesn't include the missing endpoint. > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983701#comment-14983701 ] Ariel Weisberg commented on CASSANDRA-10485: HashMultiMap uses good old HashMap internally. So it's not safe to update it and read at the same time. I think they meant that if the map is immutable it is safe to have multiple concurrent readers. I'm still getting situated on how users of TMD work with this state. Will have more later. > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975122#comment-14975122 ] Paulo Motta commented on CASSANDRA-10485: - It seems pending endpoints are removed from the {{TokenMetadata}} before the new pending ranges are calculated by {{StorageService}}: {code:title=StorageService.java|borderStyle=solid} public void onRemove(InetAddress endpoint) { tokenMetadata.removeEndpoint(endpoint); PendingRangeCalculatorService.instance.update(); } {code} So, there's a window where nodes can be > Missing host ID on hinted handoff write > --- > > Key: CASSANDRA-10485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10485 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > > when I restart one of them I receive the error "Missing host ID": > {noformat} > WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.AssertionError: Missing host ID for 63.251.156.141 > at > org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.3.jar:2.1.3] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.3.jar:2.1.3] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {noformat} > If I made nodetool status, the problematic node has ID: > {noformat} > UN 10.10.10.12 1.3 TB 1 ? > 4d5c8fd2-a909-4f09-a23c-4cd6040f338a rack3 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)