[ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988450#comment-14988450 ]
Paulo Motta commented on CASSANDRA-10485:
-----------------------------------------

bq. HashMultiMap uses good old HashMap internally. So it's not safe to update it and read at the same time. I think they meant that if the map is immutable it is safe to have multiple concurrent readers.

Thanks for clarifying this. I updated the patch so that removing a pending endpoint works on a copy of the map, and the reference is then replaced atomically.

bq. I think you either need to make the entire thing atomic and make TMD COW and propagate the TMD the entire way. Or alternatively optimistic and if the data isn't there just abort dropping the hint instead of asserting that it is an error. You could assert instead that pendingEndpointsFor() doesn't include the missing endpoint.

I implemented the optimistic approach: hints to endpoints with a null host ID are discarded and a debug message is logged. This should rarely happen now that the endpoint is removed from pending ranges immediately. I also moved the assertion to {{TokenMetadata.getHostId()}}, so we guarantee that a host ID can only be null if the endpoint is neither part of the ring nor bootstrapping.

I needed to make a slight adaptation to {{StorageProxy.submitHint}} in the 3.0 patch due to the new hints implementation.
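A minimal Java sketch of the two ideas discussed above, under stated assumptions: {{PendingEndpointsSketch}}, its methods, and the use of plain {{String}} endpoints are all hypothetical stand-ins, not Cassandra's {{TokenMetadata}} or {{StorageProxy}} API. Pending endpoints are held in an immutable snapshot that writers replace by copying, modifying the copy, and publishing it with {{compareAndSet}}. A hint whose target has no host ID is dropped with a debug-level log instead of failing an assertion.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for TokenMetadata + hint submission (illustration only).
class PendingEndpointsSketch {
    private static final Logger logger = Logger.getLogger(PendingEndpointsSketch.class.getName());

    // Host IDs of known endpoints (assumption: a concurrent map is enough for this sketch).
    private final Map<String, UUID> hostIds = new ConcurrentHashMap<>();

    // Immutable snapshot of pending (bootstrapping) endpoints; it is never mutated in place,
    // so any number of readers can iterate it while a writer prepares the next version.
    private final AtomicReference<Set<String>> pending =
            new AtomicReference<>(Collections.<String>emptySet());

    void setHostId(String endpoint, UUID id) { hostIds.put(endpoint, id); }

    void addPending(String endpoint) { updatePending(endpoint, true); }

    void removePending(String endpoint) { updatePending(endpoint, false); }

    // Copy-on-write: copy the current snapshot, change the copy, then swap the reference.
    // If another writer published first, compareAndSet fails and we retry on the new snapshot.
    private void updatePending(String endpoint, boolean add) {
        while (true) {
            Set<String> current = pending.get();
            Set<String> copy = new HashSet<>(current);
            if (add) {
                copy.add(endpoint);
            } else {
                copy.remove(endpoint);
            }
            if (pending.compareAndSet(current, Collections.unmodifiableSet(copy))) {
                return;
            }
        }
    }

    Set<String> pendingEndpoints() {
        return pending.get();
    }

    // Hypothetical check mirroring the intent of the patch: a null host ID is only legal for
    // an endpoint that is not pending. Java assertions only run when the JVM is started with -ea.
    UUID getHostId(String endpoint) {
        UUID id = hostIds.get(endpoint);
        assert id != null || !pending.get().contains(endpoint) : "Missing host ID for " + endpoint;
        return id;
    }

    // Optimistic approach: if the host ID is gone, drop the hint and log instead of erroring.
    boolean submitHint(String target) {
        UUID hostId = getHostId(target);
        if (hostId == null) {
            logger.fine("Discarding hint for " + target + ": no host ID (endpoint left the ring)");
            return false;
        }
        // A real implementation would write the hint keyed by hostId here.
        return true;
    }
}
```

The design trades one set copy per topology change for lock-free reads: a reader that grabbed a snapshot keeps a consistent view even if the endpoint is removed concurrently, and a later {{submitHint}} simply returns {{false}} for an endpoint whose host ID has disappeared.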
Tests with the new implementation are available below:
||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-dtest/lastCompletedBuild/testReport/]|

> Missing host ID on hinted handoff write
> ---------------------------------------
>
> Key: CASSANDRA-10485
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10485
> Project: Cassandra
> Issue Type: Bug
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> When I restart one of them I receive the error "Missing host ID":
> {noformat}
> WARN [SharedPool-Worker-1] 2015-10-08 13:15:33,882 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.AssertionError: Missing host ID for 63.251.156.141
>     at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) ~[apache-cassandra-2.1.3.jar:2.1.3]
>     at org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) ~[apache-cassandra-2.1.3.jar:2.1.3]
>     at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) ~[apache-cassandra-2.1.3.jar:2.1.3]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>     at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.3.jar:2.1.3]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.3.jar:2.1.3]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {noformat}
> If I run nodetool status, the problematic node has an ID:
> {noformat}
> UN  10.10.10.12  1.3 TB  1  ?  4d5c8fd2-a909-4f09-a23c-4cd6040f338a  rack3
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)