[ 
https://issues.apache.org/jira/browse/CASSANDRA-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15521804#comment-15521804
 ] 

Stefania commented on CASSANDRA-12582:
--------------------------------------

Thanks for the explanation [~slebresne], I created CASSANDRA-12705 for adding a 
field to SystemKeyspace.DroppedColumns. I totally missed that the schema 
mutations are pushed to other nodes and would therefore cause exceptions during 
a rolling upgrade.

I've checked the CI results and they are all clean except for testall on trunk, 
but the failures there also occur on trunk. Do you think you could be the 
reviewer since you've already looked at the change anyway?

I also converted the reproduction script into a dtest, pull request 
[here|https://github.com/riptano/cassandra-dtest/pull/1342].

--

[~eprothro]: it's great to hear that the patch fixes your real-world use case 
as well, thank you so much for this update and, once again, for the 
reproduction script.

> Removing static column results in ReadFailure due to CorruptSSTableException
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12582
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12582
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: Cassandra 3.0.8
>            Reporter: Evan Prothro
>            Assignee: Stefania
>            Priority: Critical
>              Labels: compaction, corruption, drop, read, static
>             Fix For: 3.0.x, 3.x
>
>         Attachments: 12582.cdl, 12582_reproduce.sh
>
>
> We ran into an issue on production where reads began to fail for certain 
> queries, depending on the range within the relation for those queries. 
> Cassandra system log showed an unhandled {{CorruptSSTableException}} 
> exception.
> CQL read failure:
> {code}
> ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation 
> failed - received 0 responses and 1 failures" info={'failures': 1, 
> 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
> {code}
> Cassandra exception:
> {code}
> WARN  [SharedPool-Worker-2] 2016-08-31 12:49:27,979 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_72]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>  [apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.8.jar:3.0.8]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
> Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:343)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:66)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:62)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:24)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1796)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2449)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   ... 5 common frames omitted
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
> Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:130)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:46)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:69)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:338)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   ... 19 common frames omitted
> Caused by: java.io.IOException: Corrupt (negative) value length encountered
>   at 
> org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:399) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.BufferCell$Serializer.deserialize(BufferCell.java:302)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(UnfilteredSerializer.java:462)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:440)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeStaticRow(UnfilteredSerializer.java:381)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.readStaticRow(AbstractSSTableIterator.java:179)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:103)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   ... 22 common frames omitted
> {code}
> After debugging, it appears that a previously dropped static column (weeks 
> prior) was the instigator of the issue. As a workaround we added back the 
> column, restarted all cassandra processes within the cluster, and the read 
> error and corruption exception went away.
> Attached is a script to reproduce with a simple schema.
> Also noteworthy (and shown in the script) is that when in this state, 
> compaction silently failed (exit 0) to remove the dropped static columns from 
> the "corrupted" sstable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to