[ 
https://issues.apache.org/jira/browse/CASSANDRA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147054#comment-13147054
 ] 

Jonas Borgström commented on CASSANDRA-3466:
--------------------------------------------

> I haven't been able to reproduce the assertion errors, but I did find what is 
> preventing hint delivery in some cases

Brandon, did you verify that removing those lines of code actually fixes hint 
delivery? 

Instead of changing the code, I did a quick experiment: I ran "nodetool 
flush" on the node holding the hints and then restarted the other node, but 
that was not enough to trigger hint delivery:

{code}
Node1 notices that node2 is back up:
  INFO 14:41:50,752 Node /127.0.0.2 has restarted, now UP
  INFO 14:41:50,752 InetAddress /127.0.0.2 is now UP
  INFO 14:41:50,753 Node /127.0.0.2 state jump to normal
But no hints are delivered...

nodetool flush is used to make sure hints hit the disk on node1:

  INFO 14:42:32,675 Enqueuing flush of Memtable-Versions@1503666327(83/103 serialized/live bytes, 3 ops)
  INFO 14:42:32,675 Writing Memtable-Versions@1503666327(83/103 serialized/live bytes, 3 ops)
  INFO 14:42:32,681 Completed flushing /tmp/node1/data/data/system/Versions-h-1-Data.db (247 bytes)
  INFO 14:42:32,682 Enqueuing flush of Memtable-HintsColumnFamily@737188401(177/221 serialized/live bytes, 1 ops)
  INFO 14:42:32,682 Writing Memtable-HintsColumnFamily@737188401(177/221 serialized/live bytes, 1 ops)
  INFO 14:42:32,688 Completed flushing /tmp/node1/data/data/system/HintsColumnFamily-h-1-Data.db (277 bytes)
  INFO 14:42:32,691 Enqueuing flush of Memtable-bar@1831941861(17/21 serialized/live bytes, 1 ops)
  INFO 14:42:32,691 Writing Memtable-bar@1831941861(17/21 serialized/live bytes, 1 ops)
  INFO 14:42:32,694 Completed flushing /tmp/node1/data/data/foo/bar-h-1-Data.db (68 bytes)

Node2 is restarted once more to check if this will trigger hints delivery:
  INFO 14:42:54,650 InetAddress /127.0.0.2 is now dead.
  INFO 14:43:02,628 Node /127.0.0.2 has restarted, now UP
  INFO 14:43:02,629 InetAddress /127.0.0.2 is now UP
  INFO 14:43:02,629 Node /127.0.0.2 state jump to normal

Still nothing...  Restarting node 1 will deliver the hints within a few seconds 
though...
{code}
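
The manual check above (a node comes back UP, but no hint delivery follows) could be automated with a small log scan. This is only a rough sketch: it assumes log lines shaped like the excerpt above ("INFO HH:MM:SS,mmm message"). The function name ups_without_handoff and the handoff_marker parameter are made up for illustration. handoff_marker must be set to whatever text your Cassandra version logs when it delivers hints; that text does not appear in the excerpt, so it is an assumption here.

```python
import re

# Shape of the log lines in the excerpt: "  INFO 14:41:50,752 <message>"
LINE_RE = re.compile(r"^\s*INFO (\d\d):(\d\d):(\d\d),\d+ (.*)$")
# Gossip message seen in the excerpt when a peer is marked alive
UP_RE = re.compile(r"InetAddress (/\S+) is now UP")


def _seconds(h, m, s):
    # Time of day in seconds; a run that crosses midnight is not handled.
    return int(h) * 3600 + int(m) * 60 + int(s)


def ups_without_handoff(lines, handoff_marker, window_seconds=60):
    """Return (time_in_seconds, endpoint) for every "is now UP" event that is
    not followed, within window_seconds, by a line that contains both
    handoff_marker and the endpoint address.

    handoff_marker is hypothetical: pass the text your node actually logs
    when it delivers hints.
    """
    entries = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            entries.append((_seconds(*m.group(1, 2, 3)), m.group(4)))

    missing = []
    for i, (ts, msg) in enumerate(entries):
        up = UP_RE.match(msg)
        if not up:
            continue
        endpoint = up.group(1)
        delivered = any(
            handoff_marker in later_msg and endpoint in later_msg
            for later_ts, later_msg in entries[i + 1:]
            if later_ts - ts <= window_seconds
        )
        if not delivered:
            missing.append((ts, endpoint))
    return missing
```

Run against node1's log, a non-empty result would flag the behaviour described here: the peer is marked UP but no delivery line for it shows up within the window.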

Reproducing the assertion error is a bit tricky, but after letting my 
two-node test cluster perform hint delivery for each other a few times I 
was able to reproduce it once more. Is there anything special you would like me 
to test?



                
> Hinted handoff not working after rolling upgrade from 0.8.7 to 1.0.2
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-3466
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3466
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Jonas Borgström
>            Assignee: Brandon Williams
>              Labels: hintedhandoff
>             Fix For: 1.0.3
>
>
> While testing rolling upgrades from 0.8.7 to 1.0.2 on a test cluster I've 
> noticed that hinted hand-off didn't always work properly. Hints generated on 
> an upgraded node do not seem to be delivered to other newly upgraded nodes 
> once they rejoin the ring. The only way I've found to get a node to deliver 
> its hints is to restart it.
> Here are some steps to reproduce this issue:
> 1. Install cassandra 0.8.7 on node1 and node2 using default settings.
> 2. Create keyspace foo with {replication_factor: 2}. Create column family bar
> 3. Shutdown node2 
> 4. Insert data into bar and verify that HintsColumnFamily on node2 contains 
> hints
> 5. Start node2 and verify that hinted handoff is performed and 
> HintsColumnFamily becomes empty again.
> 6. Upgrade and restart node1
> 7. Shutdown node2 
> 8. Insert data into bar and verify that HintsColumnFamily on node2 contains 
> hints
> 9. Upgrade and start node2
> 10. Notice that hinted handoff is *not* performed when "node2" comes back. 
> (It is performed only if node1 is restarted.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

