[ 
https://issues.apache.org/jira/browse/CASSANDRA-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-15778:
------------------------------------
    Test and Documentation Plan: Test added; manually tested additional 
scenarios of upgrade to 3.11, dropping compact storage, and reading in 4.0. 
However, this is going to be fully handled in scope of [CASSANDRA-15811].
                         Status: Patch Available  (was: Open)

Posting the patch:

|[circleci|https://circleci.com/gh/ifesdjeen/cassandra/tree/15778-3.0]|[3.0 
patch|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:15778-3.0?expand=1]|

In summary, problem was caused by the fact that CQL-created {{COMPACT STORAGE}} 
dense tables (i.e., ones that have a clustering key and have no regular 
columns), had a hidden {{EmptyType}} column (compact value column). This 
compact value column was accessible through thrift, and it was possible to 
write key-value pairs to it, which would look like:

{code}
[
{"key": "key22",
 "cells": [["key1","76616c756531",1589306163593], # <-- thrift-written cell
           ["key4","76616c756534",1589306163594]]}
], [
{"key": "key000",
 "cells": [["value000","",1589371872206298]]}, # <-- cql-written cell, for 
comparison
]
{code}

When upgrading to 3.0, we were running into one of the following situations. 
During sstable upgrade, value would get written to 3.0 sstable using 
{{AbstractType#write}}, which would write any fixed-size (0 size is fixed for 
empty type), taking {{ByteBuffer#remaining}} bytes from its value. Since the 
value was written through thrift, it is non-empty. Similar situation would 
happen when issuing a local read, which would go through local response. When 
trying to read upgraded sstable, we read all the way to the beginning of empty 
value, then try to read the value itself, but we know that its size is zero, so 
we read nothing. Whichever code path attempts to read next bytes, will try to 
interpret our thrift-written bytes as the rest of cells, which leads to 
corruption.

I was a bit reluctant to effectively turn {{EmptyType}} into {{BytesType}}, but 
using empty compact value from thrift in a way that makes it look like it's 
just a byte array was already possible (even though I believe only a few people 
have used it this way), and now we have to make this behaviour official.

3.11 patch will look more or less the same as 3.0 one. Patch is not applicable 
to 4.0, but we will have to make sure we make {{EmptyType}} is migrated to 
{{BytesType}} when dropping compact storage in scope of [CASSANDRA-15811].

> CorruptSSTableException after a 2.1 SSTable is upgraded to 3.0, failing reads
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15778
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15778
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction, Local/SSTable
>            Reporter: Sumanth Pasupuleti
>            Assignee: Alex Petrov
>            Priority: Normal
>             Fix For: 3.0.x
>
>
> Below is the exception with stack trace. This issue is consistently 
> reproduce-able.
> {code:java}
> ERROR [SharedPool-Worker-1] 2020-05-01 14:57:57,661 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]ERROR [SharedPool-Worker-1] 2020-05-01 
> 14:57:57,661 AbstractLocalAwareExecutorService.java:169 - Uncaught exception 
> on thread 
> Thread[SharedPool-Worker-1,5,main]org.apache.cassandra.io.sstable.CorruptSSTableException:
>  Corrupted: 
> /mnt/data/cassandra/data/<ks>/<cf-fda511301fb311e7bd79fd24f2fcfb0d/md-10151-big-Data.db
>  at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:349)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:220)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.columniterator.SSTableIterator.hasNext(SSTableIterator.java:33)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:131)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:294)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:341) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_231] at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
>  [nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
> [nf-cassandra-3.0.19.8.jar:3.0.19.8] at java.lang.Thread.run(Thread.java:748) 
> [na:1.8.0_231]Caused by: java.lang.ArrayIndexOutOfBoundsException: 121 at 
> org.apache.cassandra.db.ClusteringPrefix$Deserializer.prepare(ClusteringPrefix.java:425)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer.prepareNext(UnfilteredDeserializer.java:170)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer.hasNext(UnfilteredDeserializer.java:151)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.computeNext(SSTableIterator.java:140)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.hasNextInternal(SSTableIterator.java:172)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:336)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] ... 27 common frames omitted
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 121
>     at 
> org.apache.cassandra.db.ClusteringPrefix$Deserializer.prepare(ClusteringPrefix.java:425)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
>     at 
> org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer.prepareNext(UnfilteredDeserializer.java:170)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
>     at 
> org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer.hasNext(UnfilteredDeserializer.java:151)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
>     at 
> org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.computeNext(SSTableIterator.java:140)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
>     at 
> org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.hasNextInternal(SSTableIterator.java:172)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
>     at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:336)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] ... 27 common frames omitted
> {code}
> Column family definition
> {code:java}
> CREATE TABLE <keyspace>."<cf>" (
>  key text,
>  value text,
>  PRIMARY KEY (key, value)
>  ) WITH COMPACT STORAGE
>  AND CLUSTERING ORDER BY (value ASC)
>  AND bloom_filter_fp_chance = 0.01
>  AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>  AND comment = ''
>  AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
>  AND compression = {'enabled': 'false'}
>  AND crc_check_chance = 1.0
>  AND dclocal_read_repair_chance = 0.1
>  AND default_time_to_live = 0
>  AND gc_grace_seconds = 864000
>  AND max_index_interval = 2048
>  AND memtable_flush_period_in_ms = 0
>  AND min_index_interval = 128
>  AND read_repair_chance = 0.0
>  AND speculative_retry = '99PERCENTILE';{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to