[ https://issues.apache.org/jira/browse/CASSANDRA-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450649#comment-13450649 ]
Brandon Williams commented on CASSANDRA-4559:
---------------------------------------------

I'm pretty sure any gossip-related issues were fixed with the last patch on CASSANDRA-4383. But now there are other problems, with data occasionally going missing. It's not easy to reproduce, since it seems to depend on what shuffle decides to do, which is random, but here is the simplest example I've distilled from the logs with a small number of vnodes.

Given nodes A through C with the following tokens (which were evenly split single tokens, but now are contiguous two-token ranges after upgrading):

A - 0,5
B - 1,2
C - 3,4

shuffle attempts to move them to:

A - 1,3
B - 0,5
C - 2,4

The first step begins as follows:

A begins taking 1 from B
B begins taking 5 from A
C begins taking 2 from B

At this point, one interesting thing is that pendingRanges on all nodes look like this:

A: (4,1]
B: (5,5]
C: (2,2]

B and C both have the entire ring pending, which may or may not be significant.

Now A correctly requests (0,1] from B, B correctly requests (4,5] from A, and C correctly requests (1,2] from B. Next, A takes (2,3] from C, B takes (5,0] from A, but C already owns (3,4] and does nothing.

At this point shuffle is complete, but there is at least one key missing from each node, which is odd because it means C's first and only step was faulty. Other shuffle combinations with these same tokens/layout can succeed, so I'm not entirely sure what makes the failures special, but I'll keep looking into it.

> implement token relocation
> --------------------------
>
>                 Key: CASSANDRA-4559
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4559
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core, Tools
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Eric Evans
>            Assignee: Eric Evans
>              Labels: vnodes
>             Fix For: 1.2.0 beta 1
>
>
> Whatever the specifics of a _shuffle_ (see CASSANDRA-4443), it will be
> necessary to relocate a range from one node to another.
>
> _Edit0: Linked in new patch containing tests._
> ----
> h3. Patches
> ||Compare||Raw diff||Description||
> |[010_refactor_range_move|https://github.com/acunu/cassandra/compare/top-bases/p/4443/010_refactor_range_move...p/4443/010_refactor_range_move]|[010_refactor_range_move.patch|https://github.com/acunu/cassandra/compare/top-bases/p/4443/010_refactor_range_move...p/4443/010_refactor_range_move.diff]|No Description|
> |[020_calculate_pending|https://github.com/acunu/cassandra/compare/top-bases/p/4443/020_calculate_pending...p/4443/020_calculate_pending]|[020_calculate_pending.patch|https://github.com/acunu/cassandra/compare/top-bases/p/4443/020_calculate_pending...p/4443/020_calculate_pending.diff]|No Description|
> |[030_relocate_token|https://github.com/acunu/cassandra/compare/top-bases/p/4443/030_relocate_token...p/4443/030_relocate_token]|[030_relocate_token.patch|https://github.com/acunu/cassandra/compare/top-bases/p/4443/030_relocate_token...p/4443/030_relocate_token.diff]|No Description|
> |[040_tests|https://github.com/acunu/cassandra/compare/top-bases/p/4443/040_tests...p/4443/040_tests]|[040_tests.patch|https://github.com/acunu/cassandra/compare/top-bases/p/4443/040_tests...p/4443/040_tests.diff]|No Description|
> ----
> _Note: These are branches managed with TopGit. If you are applying the patch
> output manually, you will either need to filter the TopGit metadata files
> (i.e. {{wget -O - <url> | filterdiff -x*.topdeps -x*.topmsg | patch -p1}}),
> or remove them afterward ({{rm .topmsg .topdeps}})._

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
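
For anyone following the token example in the comment, here is a minimal sketch that derives which ranges have to change hands between the two layouts. It is not Cassandra code: the function names `primary_ranges` and `transfers` are hypothetical, and it assumes a replication factor of 1, that each range (previous token, token] is owned by the node holding its end token, and that the set of tokens is identical before and after the shuffle.

```python
def primary_ranges(layout):
    """Map each (start, end] range to the node holding its end token.

    `layout` maps node name -> list of tokens. Assumes replication factor 1.
    """
    owner_of = {tok: node for node, toks in layout.items() for tok in toks}
    ring = sorted(owner_of)
    # ring[i - 1] wraps to the last token when i == 0, which closes the ring.
    return {(ring[i - 1], tok): owner_of[tok] for i, tok in enumerate(ring)}


def transfers(before, after):
    """List (range, old_owner, new_owner) for every range that changes owner.

    Assumes both layouts contain exactly the same tokens.
    """
    old, new = primary_ranges(before), primary_ranges(after)
    return sorted((rng, old[rng], new[rng]) for rng in new if old[rng] != new[rng])


# Layouts taken from the comment above.
before = {"A": [0, 5], "B": [1, 2], "C": [3, 4]}
after = {"A": [1, 3], "B": [0, 5], "C": [2, 4]}

for (start, end), src, dst in transfers(before, after):
    print(f"({start},{end}] moves from {src} to {dst}")
```

Under those assumptions the sketch lists the same five moves described in the comment, namely (0,1] from B to A, (1,2] from B to C, (2,3] from C to A, (4,5] from A to B and (5,0] from A to B, and it leaves (3,4] with C. It says nothing about pendingRanges or streaming, so it only shows what a correct shuffle should end up transferring, not where the faulty step occurs.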