[ https://issues.apache.org/jira/browse/CASSANDRA-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891414#comment-13891414 ]

Ryan McGuire commented on CASSANDRA-6648:
-----------------------------------------

Both 6648-v3.txt and 6648-v3-1.2.txt are +1 from me.

Tested with the following:

{code}
ccm create bootstrap_bug
ccm populate -n 3
ccm start
ccm node1 stress -n 10000

# Bootstrap a new node:
ccm add -b node4 -t 127.0.0.4:9160 -l 127.0.0.4:7000 -j 7400 --binary-itf 127.0.0.4:9042
ccm node4 start

# Query data from the new node:
ccm node4 cqlsh

cqlsh>  select * from "Keyspace1"."Standard1" limit 10;
{code}

Pre-patch, both branches errored out with "Bad Request: Keyspace Keyspace1 does not exist".
With the patches applied, the query returns the data.

> Race condition during node bootstrapping
> ----------------------------------------
>
>                 Key: CASSANDRA-6648
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6648
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Critical
>         Attachments: 6648-v2.txt, 6648-v3-1.2.txt, 6648-v3.txt, CASSANDRA-6648.patch
>
>
> When bootstrapping a new node, data is "missing" as if the new node didn't 
> actually bootstrap, which I tracked down to the following scenario:
> 1) The new node joins the token ring and waits for the schema to settle before
> actually bootstrapping.
> 2) The schema check somehow passes and the node starts bootstrapping.
> 3) Bootstrapping doesn't find the keyspace/column family (ks/cf) it should have
> received from the other node.
> 4) Queries at this point cause NPEs until, later, they "recover", but the data
> is still missing.
> The problem seems to be caused by a race condition between the migration 
> manager and the bootstrapper, with the former running after the latter.
> I think this is supposed to protect against such scenarios:
> {noformat}
>             while (!MigrationManager.isReadyForBootstrap())
>             {
>                 setMode(Mode.JOINING, "waiting for schema information to complete", true);
>                 Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
>             }
> {noformat}
> But the MigrationManager.isReadyForBootstrap() implementation is quite fragile
> and doesn't account for "slow" schema propagation.
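To make the fragility concrete, here is a minimal sketch of a stricter readiness check, written for illustration only. It assumes, hypothetically, that the joining node can count the schema migration tasks it has submitted but not yet applied, and can read its current schema version (modelled here as a `UUID`). `SchemaReadiness`, `migrationSubmitted`, `migrationApplied` and `requiredStablePolls` are invented names; they are not Cassandra's `MigrationManager` API and are not taken from the attached patches. The check reports ready only when nothing is in flight and the version has stayed unchanged for several consecutive polls, so a single lucky poll can no longer let bootstrap start early.

```java
import java.util.UUID;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical readiness tracker (NOT Cassandra's MigrationManager API).
 * Schema is treated as settled only when no migration tasks are in flight
 * AND the local schema version has not changed for several consecutive polls.
 */
final class SchemaReadiness {
    private final int requiredStablePolls;
    private int inFlightMigrations;                 // pulls submitted but not yet applied
    private UUID schemaVersion = UUID.randomUUID(); // stand-in for the local schema version
    private UUID lastSeenVersion;                   // version observed on the previous poll
    private int stablePolls;                        // consecutive polls with no change

    SchemaReadiness(int requiredStablePolls) {
        if (requiredStablePolls < 1) {
            throw new IllegalArgumentException("requiredStablePolls must be >= 1");
        }
        this.requiredStablePolls = requiredStablePolls;
    }

    // Called when a schema pull from another node is requested.
    synchronized void migrationSubmitted() {
        inFlightMigrations++;
    }

    // Called once the pulled keyspace/column family definitions are applied locally.
    synchronized void migrationApplied(UUID newVersion) {
        schemaVersion = newVersion;
        inFlightMigrations--;
    }

    /** One poll; synchronized so it sees a consistent (count, version) pair. */
    synchronized boolean isReadyForBootstrap() {
        if (inFlightMigrations > 0 || !schemaVersion.equals(lastSeenVersion)) {
            lastSeenVersion = schemaVersion; // something changed: restart the window
            stablePolls = 0;
            return false;
        }
        return ++stablePolls >= requiredStablePolls;
    }

    public static void main(String[] args) throws InterruptedException {
        SchemaReadiness readiness = new SchemaReadiness(3);
        readiness.migrationSubmitted();                      // schema pull requested
        System.out.println(readiness.isReadyForBootstrap()); // false: pull still in flight
        readiness.migrationApplied(UUID.randomUUID());       // definitions arrive late

        int failedPolls = 0;
        // Same shape as the polling loop quoted above, with a shorter sleep.
        while (!readiness.isReadyForBootstrap()) {
            failedPolls++;
            TimeUnit.MILLISECONDS.sleep(10);
        }
        System.out.println("ready after " + failedPolls + " failed polls");
    }
}
```

Running `main` prints `false` and then `ready after 3 failed polls`. The stability window narrows the race rather than closing it: no purely local check can wait for a schema change the node has not yet learned about.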



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)