[ https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280084#comment-15280084 ]

Yuki Morishita commented on CASSANDRA-11750:
--------------------------------------------

I think on trunk this is no longer the case after CASSANDRA-11578, which 
decoupled disk failure policy handling so that it is only applied when the node 
is online. There may still be cases, though, where a CorruptSSTableException 
thrown during offline scrub prevents it from continuing.
Let me check.
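
To illustrate the kind of decoupling CASSANDRA-11578 describes (this is just a 
sketch, not the actual code from that ticket; all names below are illustrative): 
the handler that enforces the disk failure policy is only installed when the 
daemon starts, so offline tools never hit the forced exit and the exception can 
propagate to the scrubber instead.

{code:java}
// Sketch only: a pluggable error handler that the daemon installs at startup
// and offline tools leave unset. Class and method names are illustrative,
// not Cassandra's actual API.
import java.util.concurrent.atomic.AtomicReference;

interface CorruptionHandler
{
    void onCorruptSSTable(Throwable cause, String path);
}

final class FileErrorPolicy
{
    private static final AtomicReference<CorruptionHandler> handler = new AtomicReference<>();

    // Daemon startup installs a handler that applies disk_failure_policy
    // (e.g. "stop" shuts the node down). Offline tools never call this.
    static void install(CorruptionHandler h)
    {
        handler.set(h);
    }

    static void report(Throwable cause, String path)
    {
        CorruptionHandler h = handler.get();
        if (h != null)
            h.onCorruptSSTable(cause, path); // online: the policy applies
        // offline: no handler installed, so the caller (e.g. the scrubber)
        // just sees the exception and can decide how to proceed
    }
}
{code}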

> Offline scrub should not abort when it hits corruption
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11750
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Adam Hattrell
>            Priority: Minor
>              Labels: Tools
>
> Hit a failure on startup due to corruption of some sstables in the system 
> keyspace.  Deleted the listed file and restarted - the node came down again 
> with another file.
> Figured that I may as well run scrub to clean up all the files.  Got the 
> following error:
> {noformat}
> sstablescrub system compaction_history 
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, 
> disk failure policy "stop" 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>  
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> [na:1.7.0_79] 
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79] 
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79] 
> Caused by: java.io.EOFException: null 
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
> ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> ... 14 common frames omitted 
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the 
> option to continue and let it do its thing.  I'd prefer that sstablescrub 
> ignored the disk failure policy.  
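
A rough sketch of the behaviour being asked for - an offline scrub loop that 
skips an unreadable sstable and keeps going rather than aborting. The helper 
names below are made up for illustration; only CorruptSSTableException from the 
trace above is real.

{code:java}
// Sketch only: scrubOne() stands in for "open the sstable and rewrite the rows
// that can still be read"; it is not a real Cassandra method.
import java.util.Arrays;
import java.util.List;

class OfflineScrubSketch
{
    static void scrubAll(List<String> sstablePaths)
    {
        for (String path : sstablePaths)
        {
            try
            {
                scrubOne(path);
            }
            catch (RuntimeException e) // e.g. CorruptSSTableException
            {
                // Log and continue instead of letting the disk failure
                // policy force an exit.
                System.err.println("Skipping corrupt sstable " + path + ": " + e);
            }
        }
    }

    static void scrubOne(String path)
    {
        // placeholder: open the sstable, validate rows, write out a clean copy
    }

    public static void main(String[] args)
    {
        scrubAll(Arrays.asList(args));
    }
}
{code}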



