[ https://issues.apache.org/jira/browse/CASSANDRA-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051822#comment-15051822 ]

Russ Hatch edited comment on CASSANDRA-10822 at 12/10/15 11:40 PM:
-------------------------------------------------------------------

[~blambov] proto dtest here: https://github.com/riptano/cassandra-dtest/compare/test_10822?expand=1

To run this locally, set CASSANDRA_DIR in your environment, build Cassandra in that location, and then run the dtest with something like:
nosetests -xvs upgrade_tests/other_test.py

> SSTable data loss when upgrading with row tombstone present
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-10822
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10822
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Andy Tolbert
>            Assignee: Branimir Lambov
>            Priority: Critical
>             Fix For: 3.0.x, 3.x
>
>
> I ran into an issue when upgrading from 2.1.11 to 3.0.0 (and also to the
> cassandra-3.0 branch) where, in a partition containing a row tombstone, all
> rows following the tombstone were lost.
> Here's a scenario that reproduces the issue.
>
> Using ccm, create a single-node cluster at 2.1.11:
> {{ccm create -n 1 -v 2.1.11 -s financial}}
> Run the following statements to create the schema, populate some data, and
> then delete the data for November:
> {noformat}
> drop keyspace if exists financial;
> create keyspace if not exists financial with replication = {'class': 'SimpleStrategy', 'replication_factor' : 1 };
> create table if not exists financial.symbol_history (
>   symbol text,
>   name text static,
>   year int,
>   month int,
>   day int,
>   volume bigint,
>   close double,
>   open double,
>   low double,
>   high double,
>   primary key((symbol, year), month, day)
> ) with CLUSTERING ORDER BY (month desc, day desc);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 1, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 2, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 3, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 4, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 5, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 6, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 7, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 8, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 9, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 10, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 11, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 12, 1, 100);
> delete from financial.symbol_history where symbol='CORP' and year = 2004 and month=11;
> {noformat}
> Flush and run sstable2json on the sole Data.db file:
> {noformat}
> ccm node1 flush
> sstable2json /path/to/file.db
> {noformat}
> The output should look like the following:
> {code}
> [
> {"key": "CORP:2004",
>  "cells": [["::name","MegaCorp",1449457517033030],
>            ["12:1:","",1449457517033030],
>            ["12:1:volume","100",1449457517033030],
>            ["11:_","11:!",1449457564983269,"t",1449457564],
>            ["10:1:","",1449457516313738],
>            ["10:1:volume","100",1449457516313738],
>            ["9:1:","",1449457516310205],
>            ["9:1:volume","100",1449457516310205],
>            ["8:1:","",1449457516235664],
>            ["8:1:volume","100",1449457516235664],
>            ["7:1:","",1449457516233535],
>            ["7:1:volume","100",1449457516233535],
>            ["6:1:","",1449457516231458],
>            ["6:1:volume","100",1449457516231458],
>            ["5:1:","",1449457516228307],
>            ["5:1:volume","100",1449457516228307],
>            ["4:1:","",1449457516225415],
>            ["4:1:volume","100",1449457516225415],
>            ["3:1:","",1449457516222811],
>            ["3:1:volume","100",1449457516222811],
>            ["2:1:","",1449457516220301],
>            ["2:1:volume","100",1449457516220301],
>            ["1:1:","",1449457516210758],
>            ["1:1:volume","100",1449457516210758]]}
> ]
> {code}
> Prepare for the upgrade:
> {noformat}
> ccm node1 nodetool snapshot financial
> ccm node1 nodetool drain
> ccm node1 stop
> {noformat}
> Upgrade to cassandra-3.0 and start the node:
> {noformat}
> ccm node1 setdir -v git:cassandra-3.0
> ccm node1 start
> {noformat}
> Run the following query in cqlsh and observe that only 1 row is returned! It
> appears that all data following November is gone.
> {noformat}
> cqlsh> select * from financial.symbol_history;
>  symbol | year | month | day | name     | close | high | low  | open | volume
> --------+------+-------+-----+----------+-------+------+------+------+--------
>    CORP | 2004 |    12 |   1 | MegaCorp |  null | null | null | null |    100
> {noformat}
> Upgrade the sstables and query again; you'll observe the same problem:
> {noformat}
> ccm node1 nodetool upgradesstables financial
> {noformat}
> I modified the 2.2 version of sstable2json so that it works with 3.0
> (couldn't help myself :)), and observed two RangeTombstoneBoundMarker
> entries for the single delete, with the rest of the data missing:
> {code}
> [
>   {
>     "key": "CORP:2004",
>     "static": {
>       "cells": {
>         ["name","MegaCorp",1449457517033030]
>       }
>     },
>     "rows": [
>       {
>         "clustering": {"month": "12", "day": "1"},
>         "cells": {
>           ["volume","100",1449457517033030]
>         }
>       },
>       {
>         "tombstone": ["11:*",1449457564983269,"t",1449457564]
>       },
>       {
>         "tombstone": ["11:*",1449457564983269,"t",1449457564]
>       }
>     ]
>   }
> ]
> {code}
> I'm not sure why this is happening, but I should point out that I'm using
> static columns and reversed clustering order here, so either may make a
> difference. I'll try without static columns and with regular ordering to see
> whether that changes anything, and will update the ticket.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
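The populate and check steps of the repro above can also be scripted, for example as a starting point for a dtest like the one linked in the comment. The following is a minimal sketch, assuming the DataStax Python driver (`cassandra-driver`) and the single ccm node listening on 127.0.0.1. The script name `repro_10822.py` and the helpers `repro_statements()` and `expected_rows()` are hypothetical. The script issues the same schema statements, the twelve inserts, and the November delete. It then compares the rows read back for partition ('CORP', 2004) against the 11 rows that should survive the delete.

```python
"""Hypothetical helper script for the CASSANDRA-10822 repro (a sketch, not the dtest).

Assumes the DataStax Python driver (pip install cassandra-driver) and a single
node listening on 127.0.0.1, such as the ccm cluster created in the repro steps.
"""
import sys

DELETED_MONTH = 11

SCHEMA = [
    "DROP KEYSPACE IF EXISTS financial",
    "CREATE KEYSPACE IF NOT EXISTS financial WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}",
    """CREATE TABLE IF NOT EXISTS financial.symbol_history (
        symbol text, name text static, year int, month int, day int,
        volume bigint, close double, open double, low double, high double,
        PRIMARY KEY ((symbol, year), month, day)
    ) WITH CLUSTERING ORDER BY (month DESC, day DESC)""",
]


def repro_statements():
    """Schema, one insert per month of 2004, then the delete of all November rows."""
    inserts = [
        "INSERT INTO financial.symbol_history (symbol, name, year, month, day, volume) "
        f"VALUES ('CORP', 'MegaCorp', 2004, {month}, 1, 100)"
        for month in range(1, 13)
    ]
    delete = (
        "DELETE FROM financial.symbol_history "
        f"WHERE symbol = 'CORP' AND year = 2004 AND month = {DELETED_MONTH}"
    )
    return SCHEMA + inserts + [delete]


def expected_rows():
    """(month, day) rows that should survive the delete, in DESC clustering order."""
    return [(month, 1) for month in range(12, 0, -1) if month != DELETED_MONTH]


def main(mode):
    if mode not in ("populate", "check"):
        raise SystemExit("usage: repro_10822.py populate|check")
    # Imported lazily so the pure helpers above work without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    try:
        if mode == "populate":
            for stmt in repro_statements():
                session.execute(stmt)
        else:
            rows = session.execute(
                "SELECT month, day FROM financial.symbol_history "
                "WHERE symbol = 'CORP' AND year = 2004"
            )
            actual = [(row.month, row.day) for row in rows]
            missing = [r for r in expected_rows() if r not in actual]
            print(f"expected {len(expected_rows())} rows, got {len(actual)}; missing: {missing}")
    finally:
        cluster.shutdown()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run `python repro_10822.py populate` against the 2.1.11 node before the flush and upgrade steps. Then run `python repro_10822.py check` after starting the node on cassandra-3.0. Going by the cqlsh output above, an affected build would be expected to report 1 row instead of 11.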