[ 
https://issues.apache.org/jira/browse/CASSANDRA-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450649#comment-13450649
 ] 

Brandon Williams commented on CASSANDRA-4559:
---------------------------------------------

I'm pretty sure any gossip-related issues were fixed with the last patch on 
CASSANDRA-4383.  But now there are other problems, with data occasionally 
going missing.  It's not easy to reproduce, since it seems to depend on what 
shuffle decides to do, which is random, but here is the simplest example I've 
distilled from the logs with a small number of vnodes.

Given nodes A through C with the following tokens (which were evenly split 
single tokens, but now are contiguous two-token ranges after upgrading):

A - 0,5
B - 1,2
C - 3,4
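
To make the starting layout concrete, here is a minimal Python sketch (not 
Cassandra code) that derives each node's primary ranges from a token 
assignment.  It assumes the usual (previous_token, token] ownership 
convention on a six-token ring and ignores replication; the names 
primary_ranges and current are hypothetical.

```python
def primary_ranges(assignment):
    """Map each node to the (left, right] ranges ending at its tokens.

    Assumption: a token owns the range from the previous token on the ring
    (exclusive) up to itself (inclusive), wrapping at the lowest token.
    """
    owner_of = {tok: node for node, toks in assignment.items() for tok in toks}
    ring = sorted(owner_of)
    ranges = {node: [] for node in assignment}
    for i, tok in enumerate(ring):
        prev = ring[i - 1]  # for i == 0 this wraps to the highest token
        ranges[owner_of[tok]].append((prev, tok))
    return ranges


current = {"A": [0, 5], "B": [1, 2], "C": [3, 4]}
print(primary_ranges(current))
```

Under those assumptions A ends up with (5,0] and (4,5], B with (0,1] and 
(1,2], and C with (2,3] and (3,4], i.e. each node holds one contiguous 
stretch of the ring.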

shuffle attempts to move them to:

A - 1,3
B - 0,5
C - 2,4  

The first step begins as follows:

A begins taking 1 from B
B begins taking 5 from A
C begins taking 2 from B

At this point, one interesting thing is that pendingRanges on all nodes look 
like this:

A: (4,1]
B: (5,5]
C: (2,2]

B and C both have the entire ring pending, which may or may not be significant.
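
As a rough illustration of why (5,5] and (2,2] read as the whole ring, here 
is a small Python sketch of membership in a wrapping (left, right] range.  
It assumes that a range whose endpoints are equal wraps all the way around, 
which is what the observation above implies; contains is a hypothetical 
helper, not the Cassandra implementation.

```python
def contains(rng, token):
    """Return True if token lies in the ring range (left, right]."""
    left, right = rng
    if left >= right:
        # Wrapping range: everything after left, plus everything up to right.
        # When left == right this is every token on the ring.
        return token > left or token <= right
    return left < token <= right


ring = range(6)
pending = {"A": (4, 1), "B": (5, 5), "C": (2, 2)}
for node, rng in pending.items():
    print(node, [t for t in ring if contains(rng, t)])
```

With these inputs A's pending range covers only tokens 5, 0 and 1, while B's 
and C's cover all six tokens.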

Now A correctly requests (0,1] from B, B correctly requests (4,5] from A, and C 
correctly requests (1,2] from B.

Next, A takes (2,3] from C, B takes (5,0] from A, but C already owns (3,4] and 
does nothing.  
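
For reference, the set of transfers described above can be recomputed from 
the before and after token assignments.  This is only a sketch under the 
same (previous_token, token] assumption; it does not model the order in 
which shuffle schedules the moves, and relocation_plan is a hypothetical 
name.

```python
def relocation_plan(current, target):
    """List (new_owner, old_owner, (left, right]) for every token that moves."""
    cur_owner = {t: n for n, ts in current.items() for t in ts}
    new_owner = {t: n for n, ts in target.items() for t in ts}
    ring = sorted(cur_owner)
    plan = []
    for i, tok in enumerate(ring):
        src, dst = cur_owner[tok], new_owner[tok]
        if src != dst:
            # The range that follows the token is (previous token, token].
            plan.append((dst, src, (ring[i - 1], tok)))
    return plan


current = {"A": [0, 5], "B": [1, 2], "C": [3, 4]}
target = {"A": [1, 3], "B": [0, 5], "C": [2, 4]}
for dst, src, (left, right) in relocation_plan(current, target):
    print(f"{dst} requests ({left},{right}] from {src}")
```

The five transfers it lists are the same ones seen in the logs, and token 4 
produces no entry because C keeps it.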

At this point shuffle is complete, but there is at least one key missing from 
each node, which is odd because it means C's first and only step was faulty.  
Other shuffle combinations with these same tokens/layout can succeed, so I'm 
not entirely sure what makes the failures special, but I'll keep looking into 
it.
                
> implement token relocation
> --------------------------
>
>                 Key: CASSANDRA-4559
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4559
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core, Tools
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Eric Evans
>            Assignee: Eric Evans
>              Labels: vnodes
>             Fix For: 1.2.0 beta 1
>
>
> Whatever the specifics of a _shuffle_ (see CASSANDRA-4443), it will be 
> necessary to relocate a range from one node to another.
> _Edit0: Linked in new patch containing tests._
> ----
> h3. Patches
> ||Compare||Raw diff||Description||
> |[010_refactor_range_move|https://github.com/acunu/cassandra/compare/top-bases/p/4443/010_refactor_range_move...p/4443/010_refactor_range_move]|[010_refactor_range_move.patch|https://github.com/acunu/cassandra/compare/top-bases/p/4443/010_refactor_range_move...p/4443/010_refactor_range_move.diff]|No Description|
> |[020_calculate_pending|https://github.com/acunu/cassandra/compare/top-bases/p/4443/020_calculate_pending...p/4443/020_calculate_pending]|[020_calculate_pending.patch|https://github.com/acunu/cassandra/compare/top-bases/p/4443/020_calculate_pending...p/4443/020_calculate_pending.diff]|No Description|
> |[030_relocate_token|https://github.com/acunu/cassandra/compare/top-bases/p/4443/030_relocate_token...p/4443/030_relocate_token]|[030_relocate_token.patch|https://github.com/acunu/cassandra/compare/top-bases/p/4443/030_relocate_token...p/4443/030_relocate_token.diff]|No Description|
> |[040_tests|https://github.com/acunu/cassandra/compare/top-bases/p/4443/040_tests...p/4443/040_tests]|[040_tests.patch|https://github.com/acunu/cassandra/compare/top-bases/p/4443/040_tests...p/4443/040_tests.diff]|No Description|
> ----
> _Note: These are branches managed with TopGit. If you are applying the patch 
> output manually, you will either need to filter the TopGit metadata files 
> (i.e. {{wget -O - <url> | filterdiff -x*.topdeps -x*.topmsg | patch -p1}}), 
> or remove them afterward ({{rm .topmsg .topdeps}})._

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
