[ 
https://issues.apache.org/jira/browse/CASSANDRA-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated CASSANDRA-18091:
-------------------------------
    Description: 
When we performed a full-stop upgrade from release 4.0.6 to 4.2 trunk (fdc88a), 
we executed the same read command before and after the upgrade and found that 
some column updates were lost after the upgrade.

This inconsistency is likely related to timing and cannot be reproduced 
deterministically. Although it is non-deterministic, the failure *occurs 
reliably*, *once in every ten runs* of the following command sequence.
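
As a rough guide for anyone trying to reproduce this, the sketch below estimates how many runs of the sequence are needed before the failure is likely to show up at least once. It assumes the reported rate of one in ten is a per-run probability and that runs are independent; neither assumption is verified here, and the function name `runs_needed` is only illustrative.

```python
import math


def runs_needed(p_fail_per_run: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - p)^n >= confidence.

    Assumes each run fails independently with probability p_fail_per_run.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail_per_run))


# With the reported rate of one failure in ten runs:
print(runs_needed(0.1, 0.95))  # prints 29
```

Under those assumptions, roughly 29 attempts give a 95% chance of seeing the lost update at least once.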

Steps to reproduce:

1. Set up a 4.0.6 3-node cluster. (N0: seed node, N1, N2). Default 
configurations, set num_tokens to 256.

2. Execute the following cqlsh commands:

 
{code:java}
CREATE KEYSPACE  uuid5f86250110a247d48c481c5579cb2ea1 WITH REPLICATION = { 
'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE TABLE  uuid5f86250110a247d48c481c5579cb2ea1.kattmG (kattmG TEXT,VldG 
TEXT,Ajqm TEXT,lTQSQ INT,EVzlSKrbkUGyhJshH TEXT, PRIMARY KEY (lTQSQ ));
DELETE FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG WHERE lTQSQ = 2;
ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP kattmG ;
INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (Ajqm, 
EVzlSKrbkUGyhJshH, lTQSQ) VALUES 
('nZJzNjYnXOwPLpVoFSVwxcvznsDFBYqmlprrVXYJQLzYvYkrmfEsiuAcCtggypnxIkIevRHyPQGOWrIZNObJ','RaAhbVKUQzgJaupaupKPVnNLLYDaZEaMyFteVwhLePqZwikuBEsVDxTuTqBfkFYmeMMsOFXjVkObZduPfAFsLzuYlrgpYsPPxDNQCRzzPaEdWHARnnWbAFAUUnbYnvEESeHDRHSkEhSnoREprrHWasYLMSocIYiMGQXjzsaKptqbtPgrztIdpQLgDAZOPfhJIblmwTFAWiFbzrbTkFwJGP',1693380861);
DELETE FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG WHERE lTQSQ = 
1693380861;
INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (lTQSQ) VALUES (2);
ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP VldG ;
INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (lTQSQ) VALUES 
(1693380861);
INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (Ajqm, lTQSQ) VALUES 
('ldrsHa',1693380861);
ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP Ajqm ;{code}
3. Execute a `SELECT` command at N0 and get the result:
{code:java}
SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;

 ltqsq
------------
 1693380861
          2
(2 rows){code}
4. Perform a full-stop upgrade to 4.2 trunk (fdc88a). (Set num_tokens to 256.)
   1. Drain N0, stop N0.
   2. Drain N1, stop N1.
   3. Drain N2, stop N2.
   4. Start up N0.
   5. Start up N1.
   6. Start up N2.
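
The ordering above can be sketched as a small plan generator, which may help when scripting repeated attempts. This is only an illustration: the action names `drain`, `stop` and `start` are placeholders, and the report does not say how the nodes are stopped or started or at which point the binaries are replaced (the drain step would typically be `nodetool drain`).

```python
def full_stop_upgrade_plan(nodes):
    """Return (node, action) pairs for a full-stop upgrade.

    Every node is drained and stopped, in order, before any node is started.
    """
    plan = []
    for node in nodes:
        plan.append((node, "drain"))
        plan.append((node, "stop"))
    # All nodes are down at this point; the new version is assumed to be
    # installed here, before the first node starts again.
    for node in nodes:
        plan.append((node, "start"))
    return plan


for node, action in full_stop_upgrade_plan(["N0", "N1", "N2"]):
    print(f"{action} {node}")
```

The property that matters is that no node starts on the new version while another node is still running the old one, which is what distinguishes this from a rolling upgrade.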

5. Execute the same `SELECT` command at N0 and get one of the following results:
{code:java}
SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;


 ltqsq
-------
     2
(1 rows){code}
Or
{code:java}
SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;


 ltqsq
-------

(0 rows){code}
 

The result after the upgrade is inconsistent with the result from the old version.
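
To make the comparison concrete, the following sketch parses the cqlsh output shown in steps 3 and 5 and reports which primary keys disappeared. It is a minimal helper written for this report only; the function names are hypothetical and it assumes a single integer column, as in the `SELECT lTQSQ` query above.

```python
import re


def parse_cqlsh_rows(output):
    """Collect the integer values from a single-column cqlsh result table."""
    rows = set()
    for line in output.splitlines():
        text = line.strip()
        # Data rows are bare integers; the header, the dashed rule and the
        # "(N rows)" footer do not match and are skipped.
        if re.fullmatch(r"-?\d+", text):
            rows.add(int(text))
    return rows


def lost_rows(before, after):
    """Keys present before the upgrade but missing afterwards."""
    return parse_cqlsh_rows(before) - parse_cqlsh_rows(after)


BEFORE = """
 ltqsq
------------
 1693380861
          2
(2 rows)
"""

AFTER_ONE_ROW = """
 ltqsq
-------
     2
(1 rows)
"""

AFTER_ZERO_ROWS = """
 ltqsq
-------

(0 rows)
"""

print(sorted(lost_rows(BEFORE, AFTER_ONE_ROW)))    # prints [1693380861]
print(sorted(lost_rows(BEFORE, AFTER_ZERO_ROWS)))  # prints [2, 1693380861]
```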

Is this an expected behaviour during the full-stop upgrade?

 

  was:
When we performed a full-stop upgrade from release 4.0.6 to 4.2 trunk (fdc88a), 
we executed the same read command before and after the upgrade and found that 
some column updates were lost after the upgrade.

This inconsistency is likely related to timing. We cannot reproduce this case 
{*}deterministically{*}. But if we keep executing the same following command 
sequence, this bug can be triggered every ten times.


Steps to reproduce:

1. Set up a 4.0.6 3-node cluster. (N0: seed node, N1, N2). Default 
configurations, set num_tokens to 256.

2. Execute the following cqlsh command

 
{code:java}
CREATE KEYSPACE  uuid5f86250110a247d48c481c5579cb2ea1 WITH REPLICATION = { 
'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE TABLE  uuid5f86250110a247d48c481c5579cb2ea1.kattmG (kattmG TEXT,VldG 
TEXT,Ajqm TEXT,lTQSQ INT,EVzlSKrbkUGyhJshH TEXT, PRIMARY KEY (lTQSQ ));
CREATE KEYSPACE IF NOT EXISTS uuidc5b00efccc77471888e6e8e233998779 WITH 
REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
DELETE FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG WHERE lTQSQ = 2;
ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP kattmG ;
INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (Ajqm, 
EVzlSKrbkUGyhJshH, lTQSQ) VALUES 
('nZJzNjYnXOwPLpVoFSVwxcvznsDFBYqmlprrVXYJQLzYvYkrmfEsiuAcCtggypnxIkIevRHyPQGOWrIZNObJ','RaAhbVKUQzgJaupaupKPVnNLLYDaZEaMyFteVwhLePqZwikuBEsVDxTuTqBfkFYmeMMsOFXjVkObZduPfAFsLzuYlrgpYsPPxDNQCRzzPaEdWHARnnWbAFAUUnbYnvEESeHDRHSkEhSnoREprrHWasYLMSocIYiMGQXjzsaKptqbtPgrztIdpQLgDAZOPfhJIblmwTFAWiFbzrbTkFwJGP',1693380861);
DELETE FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG WHERE lTQSQ = 
1693380861;
INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (lTQSQ) VALUES (2);
ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP VldG ;
INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (lTQSQ) VALUES 
(1693380861);
INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (Ajqm, lTQSQ) VALUES 
('ldrsHa',1693380861);
ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP Ajqm ; {code}

3. Execute a `SELECT` command at N0, get result

 
{code:java}
SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;

 ltqsq
------------
 1693380861
          2
(2 rows){code}
 

 

4. Perform a full-stop upgrade. (Set num_tokens to 256)
   1. Drain N0, stop N0.
   2. Drain N1, stop N1.
   3. Drain N2, stop N2.
   4. Start up N0.
   5. Start up N1.
   6. Start up N2.


5. Execute the same `SELECT` command at N0, get result

 
{code:java}
SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;


 ltqsq
-------
     2
(1 rows){code}
Or

 

 
{code:java}
SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;


 ltqsq
-------

(0 rows){code}
 

The new result is inconsistent with the old version.

 

Is this an expected behaviour during the full-stop upgrade?

 


> Column update lost after a full stop upgrade from 4.0.6 to trunk (4.2:fdc88a)
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18091
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18091
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ke Han
>            Priority: Normal
>
> When we performed a full-stop upgrade from release 4.0.6 to 4.2 trunk 
> (fdc88a), we executed the same read command before and after the upgrade and 
> found that some column updates were lost after the upgrade.
> This inconsistency is likely related to timing and cannot be reproduced 
> deterministically. Although it is non-deterministic, the failure *occurs 
> reliably*, *once in every ten runs* of the following command sequence.
> Steps to reproduce:
> 1. Set up a 4.0.6 3-node cluster. (N0: seed node, N1, N2). Default 
> configurations, set num_tokens to 256.
> 2. Execute the following cqlsh commands:
>  
> {code:java}
> CREATE KEYSPACE  uuid5f86250110a247d48c481c5579cb2ea1 WITH REPLICATION = { 
> 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
> CREATE TABLE  uuid5f86250110a247d48c481c5579cb2ea1.kattmG (kattmG TEXT,VldG 
> TEXT,Ajqm TEXT,lTQSQ INT,EVzlSKrbkUGyhJshH TEXT, PRIMARY KEY (lTQSQ ));
> DELETE FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG WHERE lTQSQ = 2;
> ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP kattmG ;
> INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (Ajqm, 
> EVzlSKrbkUGyhJshH, lTQSQ) VALUES 
> ('nZJzNjYnXOwPLpVoFSVwxcvznsDFBYqmlprrVXYJQLzYvYkrmfEsiuAcCtggypnxIkIevRHyPQGOWrIZNObJ','RaAhbVKUQzgJaupaupKPVnNLLYDaZEaMyFteVwhLePqZwikuBEsVDxTuTqBfkFYmeMMsOFXjVkObZduPfAFsLzuYlrgpYsPPxDNQCRzzPaEdWHARnnWbAFAUUnbYnvEESeHDRHSkEhSnoREprrHWasYLMSocIYiMGQXjzsaKptqbtPgrztIdpQLgDAZOPfhJIblmwTFAWiFbzrbTkFwJGP',1693380861);
> DELETE FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG WHERE lTQSQ = 
> 1693380861;
> INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (lTQSQ) VALUES (2);
> ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP VldG ;
> INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (lTQSQ) VALUES 
> (1693380861);
> INSERT INTO uuid5f86250110a247d48c481c5579cb2ea1.kattmG (Ajqm, lTQSQ) VALUES 
> ('ldrsHa',1693380861);
> ALTER TABLE uuid5f86250110a247d48c481c5579cb2ea1.kattmG DROP Ajqm ;{code}
> 3. Execute a `SELECT` command at N0 and get the result:
> {code:java}
> SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;
>  ltqsq
> ------------
>  1693380861
>           2
> (2 rows){code}
> 4. Perform a full-stop upgrade. (Set num_tokens to 256)
>    1. Drain N0, stop N0.
>    2. Drain N1, stop N1.
>    3. Drain N2, stop N2.
>    4. Start up N0.
>    5. Start up N1.
>    6. Start up N2.
> 5. Execute the same `SELECT` command at N0 and get one of the following results:
> {code:java}
> SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;
>  ltqsq
> -------
>      2
> (1 rows){code}
> Or
> {code:java}
> SELECT lTQSQ FROM uuid5f86250110a247d48c481c5579cb2ea1.kattmG;
>  ltqsq
> -------
> (0 rows){code}
>  
> The result after the upgrade is inconsistent with the result from the old version.
> Is this an expected behaviour during the full-stop upgrade?
>  


