[ https://issues.apache.org/jira/browse/CASSANDRA-14672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617522#comment-16617522 ]

Jeff Jirsa commented on CASSANDRA-14672:
----------------------------------------

In chatting offline with the folks who wrote CASSANDRA-14515 (namely 
[~iamaleksey] and [~benedict]), it sounds like what you're seeing is likely 
the corruption that CASSANDRA-14515 was meant to protect against. That is: the 
bug in Cassandra 3.11.0 through 3.11.2 that causes data loss (14515) also left 
your data files in an invalid, corrupt state. The exception is letting you know 
they're broken, and that, in this case, you've probably lost data to that bug.

[~iamaleksey] / [~benedict] - any thoughts on how to prove this is really just 
14515 corruption? Any ideas on recovery (will scrub help here)?
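For reference, a scrub attempt on the affected table would look roughly like the following (a sketch only; the keyspace/table names are taken from the report below, and whether scrub can actually repair this kind of corruption is exactly the open question here):

{noformat}
# Online scrub of the affected table while the node is running
# (-s / --skip-corrupted drops unreadable rows rather than aborting):
nodetool scrub -s pvpms_mevents measurement_events_dbl

# Or offline, with the node stopped:
sstablescrub pvpms_mevents measurement_events_dbl
{noformat}

Note that if the rows themselves deserialize fine and only the range-tombstone bounds are inconsistent, scrub may not detect anything wrong.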

> After deleting data in 3.11.3, reads fail with "open marker and close marker 
> have different deletion times"
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14672
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14672
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: CentOS 7, GCE, 9 nodes, 4TB disk/~2TB full each, level 
> compaction, timeseries data
>            Reporter: Spiros Ioannou
>            Priority: Blocker
>
> We had 3.11.0, then upgraded to 3.11.3 last week. We routinely perform 
> deletions such as the one described below. After upgrading, we ran the 
> following deletion query:
>  
> {code:java}
> DELETE FROM measurement_events_dbl WHERE measurement_source_id IN ( 
> 9df798a2-6337-11e8-b52b-42010afa015a,  9df7717e-6337-11e8-b52b-42010afa015a, 
> a08b8042-6337-11e8-b52b-42010afa015a, a08e52cc-6337-11e8-b52b-42010afa015a, 
> a08e6654-6337-11e8-b52b-42010afa015a, a08e6104-6337-11e8-b52b-42010afa015a, 
> a08e6c76-6337-11e8-b52b-42010afa015a, a08e5a9c-6337-11e8-b52b-42010afa015a, 
> a08bcc50-6337-11e8-b52b-42010afa015a) AND year IN (2018) AND measurement_time 
> >= '2018-07-19 04:00:00'{code}
>  
> Immediately after that, trying to read the last value produces an error:
> {code:java}
> select * FROM measurement_events_dbl WHERE measurement_source_id = 
> a08b8042-6337-11e8-b52b-42010afa015a AND year IN (2018) order by 
> measurement_time desc limit 1;
> ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] 
> message="Operation failed - received 0 responses and 2 failures" 
> info={'failures': 2, 'received_responses': 0, 'required_responses': 1, 
> 'consistency': 'ONE'}{code}
>  
> And the following exception: 
> {noformat}
> WARN [ReadStage-4] 2018-08-29 06:59:53,505 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-4,5,main]: {}
> java.lang.RuntimeException: java.lang.IllegalStateException: 
> UnfilteredRowIterator for pvpms_mevents.measurement_events_dbl has an illegal 
> RT bounds sequence: open marker and close marker have different deletion times
>  at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2601)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_181]
>  at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>  [apache-cassandra-3.11.3.jar:3.11.3]
>  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
> [apache-cassandra-3.11.3.jar:3.11.3]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
> Caused by: java.lang.IllegalStateException: UnfilteredRowIterator for 
> pvpms_mevents.measurement_events_dbl has an illegal RT bounds sequence: open 
> marker and close marker have different deletion times
>  at 
> org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.ise(RTBoundValidator.java:103)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.applyToMarker(RTBoundValidator.java:81)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:148) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:136)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:92)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:308)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:352) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1889)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2597)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  ... 5 common frames omitted
>  Suppressed: java.lang.IllegalStateException: UnfilteredRowIterator for 
> pvpms_mevents.measurement_events_dbl has an illegal RT bounds sequence: 
> expected all RTs to be closed, but the last one is open
>  at 
> org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.ise(RTBoundValidator.java:103)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.onPartitionClose(RTBoundValidator.java:96)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at org.apache.cassandra.db.transform.BaseRows.runOnClose(BaseRows.java:91) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:86) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:309)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  ... 12 common frames omitted
>  
> {noformat}
>  
> We have 9 nodes ~2TB each, leveled compaction, repairs run daily in sequence.
> Table definition is:
> {noformat}
> CREATE TABLE pvpms_mevents.measurement_events_dbl (
>  measurement_source_id uuid,
>  year int,
>  measurement_time timestamp,
>  event_reception_time timestamp,
>  quality double,
>  value double,
>  PRIMARY KEY ((measurement_source_id, year), measurement_time)
> ) WITH CLUSTERING ORDER BY (measurement_time ASC)
>  AND bloom_filter_fp_chance = 0.1
>  AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>  AND comment = ''
>  AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>  AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>  AND crc_check_chance = 1.0
>  AND dclocal_read_repair_chance = 0.1
>  AND default_time_to_live = 0
>  AND gc_grace_seconds = 864000
>  AND max_index_interval = 2048
>  AND memtable_flush_period_in_ms = 0
>  AND min_index_interval = 128
>  AND read_repair_chance = 0.0
>  AND speculative_retry = '99PERCENTILE';{noformat}
>  
> We host these nodes on GCE and recreated all of them from disk snapshots, and 
> we reproduced the error: after re-running the DELETE with all nodes up and no 
> other queries running, the error appeared immediately.
>  
> So far we have tried re-running repairs on all nodes and running nodetool 
> garbagecollect, with no success.
> We have downgraded to 3.11.2 for now; no issues so far.
> This may be related to CASSANDRA-14515.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
