[jira] [Commented] (CASSANDRA-5930) Offline scrubs can choke on broken files
[ https://issues.apache.org/jira/browse/CASSANDRA-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862046#comment-13862046 ]

Jonathan Ellis commented on CASSANDRA-5930:
-------------------------------------------

It would be nice to be able to tell people how to fix it (realistically: what their options are) rather than just "sorry, scrub can't help you." But I'm not sure what those options are. :)

/cc [~slebresne] [~iamaleksey]

Offline scrubs can choke on broken files
----------------------------------------

Key: CASSANDRA-5930
URL: https://issues.apache.org/jira/browse/CASSANDRA-5930
Project: Cassandra
Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Tyler Hobbs
Priority: Minor
Attachments: 5930-v1.patch

There are cases where offline scrub can hit an exception and die, like:

{noformat}
WARNING: Non-fatal error reading row (stacktrace follows)
Exception in thread "main" java.io.IOError: java.io.IOError: java.io.EOFException
    at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:242)
    at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:121)
Caused by: java.io.IOError: java.io.EOFException
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:182)
    at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
    ... 1 more
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:399)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
    at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:120)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112)
    ... 5 more
{noformat}

Since the purpose of offline scrub is to fix broken stuff, it should be more resilient to broken stuff...

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
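The resilience asked for in the issue description above can be illustrated with a minimal sketch. It is not Cassandra's Scrubber: the class `ResilientScan`, its `Result` type, and the `[int length][payload]` record layout are all assumptions made up for this example. The point is only that an `EOFException` raised while reading one record is caught and reported, so records already recovered are kept instead of the whole run dying.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Cassandra's Scrubber. */
public class ResilientScan {

    /** Outcome of a scan: recovered payloads, and whether the tail was unreadable. */
    public static final class Result {
        public final List<byte[]> good = new ArrayList<>();
        public boolean truncated = false;
    }

    /**
     * Reads records laid out as [int length][length bytes] (an assumed format).
     * A record that runs past the end of the data is reported, not rethrown.
     */
    public static Result scan(byte[] data) {
        Result result = new Result();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                try {
                    int length = in.readInt(); // throws EOFException on a short header
                    if (length < 0 || length > in.available()) {
                        throw new EOFException("declared length " + length + " exceeds remaining data");
                    }
                    byte[] payload = new byte[length];
                    in.readFully(payload);
                    result.good.add(payload);
                } catch (EOFException e) {
                    // Without a trustworthy length the next record cannot be located,
                    // so keep what was recovered and stop instead of aborting.
                    result.truncated = true;
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // One intact record ("ab"), then a record claiming 10 bytes with only 1 present.
        byte[] data = {0, 0, 0, 2, 'a', 'b', 0, 0, 0, 10, 'z'};
        Result r = scan(data);
        System.out.println("recovered " + r.good.size() + " record(s), truncated=" + r.truncated);
    }
}
```

A real scrubber has more to work with than this sketch does, for example an index that may allow it to resume at the next row; here the scan simply stops at the first unreadable record.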
[jira] [Commented] (CASSANDRA-5930) Offline scrubs can choke on broken files
[ https://issues.apache.org/jira/browse/CASSANDRA-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862052#comment-13862052 ]

Aleksey Yeschenko commented on CASSANDRA-5930:
----------------------------------------------

[~jbellis] There are no options. That said, we should probably allow users to override this behavior, if they prefer losing some of the counter history to not scrubbing at all.

Also, with CASSANDRA-6504 in, this becomes a non-issue (for the newly written 'global' 2.1 shards, at least - we *can* repair those after the scrub).

Offline scrubs can choke on broken files
----------------------------------------

Key: CASSANDRA-5930
URL: https://issues.apache.org/jira/browse/CASSANDRA-5930
Project: Cassandra
Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Tyler Hobbs
Priority: Minor
Attachments: 5930-v1.patch
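The override suggested in the comment above could take roughly the following shape. This is a sketch under stated assumptions, not the attached patch: the class `CorruptRowPolicy`, the `skipCorrupted` switch, and the wording of the error message are hypothetical. By default an unreadable row in a counter table still aborts the scrub; only an explicit opt-in lets the scrub skip the row and accept the loss of counter history.

```java
import java.io.IOError;
import java.io.IOException;

/** Hypothetical illustration of an opt-in override; not Cassandra code. */
public class CorruptRowPolicy {
    private final boolean isCounterTable;
    private final boolean skipCorrupted; // hypothetical user-supplied override
    private int skippedRows = 0;

    public CorruptRowPolicy(boolean isCounterTable, boolean skipCorrupted) {
        this.isCounterTable = isCounterTable;
        this.skipCorrupted = skipCorrupted;
    }

    /**
     * Called when a row cannot be read. Returns normally if the row may be
     * skipped; throws if skipping it would silently drop counter history.
     */
    public void onCorruptRow(String key, Throwable cause) {
        if (isCounterTable && !skipCorrupted) {
            throw new IOError(new IOException(
                "Unreadable counter row " + key
                    + "; skipping it could lose counter history (see CASSANDRA-2759)",
                cause));
        }
        skippedRows++;
    }

    /** Number of rows skipped so far. */
    public int skippedRows() {
        return skippedRows;
    }

    public static void main(String[] args) {
        CorruptRowPolicy lenient = new CorruptRowPolicy(true, true);
        lenient.onCorruptRow("k1", new IOException("truncated"));
        System.out.println("skipped rows: " + lenient.skippedRows());
    }
}
```

Keeping the strict behavior as the default means nobody loses counter data without having asked for it, and the message points the user at the ticket that explains why.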
[jira] [Commented] (CASSANDRA-5930) Offline scrubs can choke on broken files
[ https://issues.apache.org/jira/browse/CASSANDRA-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860591#comment-13860591 ]

Tyler Hobbs commented on CASSANDRA-5930:
----------------------------------------

[~jeffpotter] what version of Cassandra were you running when you hit the above error?

As far as the original stacktrace for this ticket goes, that failure is unfortunately necessary for counter CFs. CASSANDRA-2759 explains the reasoning. I suppose I could make the error message mention that and point to the ticket.

The scrub code looks reasonably robust in general, so I think it's better to wait for individual bugs to get reported than to try to improve the code without any failure examples.

Offline scrubs can choke on broken files
----------------------------------------

Key: CASSANDRA-5930
URL: https://issues.apache.org/jira/browse/CASSANDRA-5930
Project: Cassandra
Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Tyler Hobbs
Priority: Minor
[jira] [Commented] (CASSANDRA-5930) Offline scrubs can choke on broken files
[ https://issues.apache.org/jira/browse/CASSANDRA-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860912#comment-13860912 ]

J Potter commented on CASSANDRA-5930:
-------------------------------------

Hi Tyler -- based on my notes, it should have been Cassandra 1.2.6.1 (DSE 3.1); at least, that's what other tickets we filed around the same time suggest.

Offline scrubs can choke on broken files
----------------------------------------

Key: CASSANDRA-5930
URL: https://issues.apache.org/jira/browse/CASSANDRA-5930
Project: Cassandra
Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Tyler Hobbs
Priority: Minor
[jira] [Commented] (CASSANDRA-5930) Offline scrubs can choke on broken files
[ https://issues.apache.org/jira/browse/CASSANDRA-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756170#comment-13756170 ]

Jeff Potter commented on CASSANDRA-5930:
----------------------------------------

We're seeing this too -- slightly different stack trace, which I'll include here in case it's of use.

WARNING: Non-fatal error reading row (stacktrace follows)
Exception in thread "main" java.io.IOError: java.lang.IllegalArgumentException
    at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:244)
    at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:125)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:247)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
    at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:128)
    at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:114)
    at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:109)
    at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:219)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumnsFromSSTable(ColumnFamilySerializer.java:149)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:114)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:98)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:160)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:166)
    at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:173)
    ... 1 more

Offline scrubs can choke on broken files
----------------------------------------

Key: CASSANDRA-5930
URL: https://issues.apache.org/jira/browse/CASSANDRA-5930
Project: Cassandra
Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Jason Brown
Priority: Minor