[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988450#comment-14988450
 ] 

Paulo Motta commented on CASSANDRA-10485:
-----------------------------------------

bq. HashMultiMap uses good old HashMap internally. So it's not safe to update 
it and read at the same time. I think they meant that if the map is immutable 
it is safe to have multiple concurrent readers.

Thanks for clarifying this. Updated patch to perform removal of pending 
endpoint by working on a copy and then replacing the reference atomically.

bq. I think you either need to make the entire thing atomic and make TMD COW 
and propagate the TMD the entire way. Or alternatively optimistic and if the 
data isn't there just abort dropping the hint instead of asserting that it is 
an error. You could assert instead that pendingEndpointsFor() doesn't include 
the missing endpoint.

Implemented the optimistic approach of discarding hints to endpoints with null 
ID and log a debug message (this should happen seldom with immediate removal of 
the endpoint from pending ranges). Also moved assertion to 
{{TokenMetadata.getHostId()}}, so we guarantee that a host id can only be null 
if the endpoint is not part of the ring (or is not bootstrapping).

I needed to do an slight adaptation to the patch on 3.0 on 
{{StorageProxy.submitHint}} due to the new hints implementation. Tests with new 
implementation available below:

||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-dtest/lastCompletedBuild/testReport/]|

> Missing host ID on hinted handoff write
> ---------------------------------------
>
>                 Key: CASSANDRA-10485
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10485
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>             Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> when I restart one of them I receive the error "Missing host ID":
> {noformat}
> WARN  [SharedPool-Worker-1] 2015-10-08 13:15:33,882 
> AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.AssertionError: Missing host ID for 63.251.156.141
>         at 
> org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978)
>  ~[apache-cassandra-2.1.3.jar:2.1.3]
>         at 
> org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950)
>  ~[apache-cassandra-2.1.3.jar:2.1.3]
>         at 
> org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235)
>  ~[apache-cassandra-2.1.3.jar:2.1.3]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_60]
>         at 
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>  ~[apache-cassandra-2.1.3.jar:2.1.3]
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-2.1.3.jar:2.1.3]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {noformat}
> If I made nodetool status, the problematic node has ID:
> {noformat}
> UN  10.10.10.12  1.3 TB     1       ?       
> 4d5c8fd2-a909-4f09-a23c-4cd6040f338a  rack3
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to