[jira] [Commented] (CASSANDRA-9140) Scrub should handle corrupted compressed chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521034#comment-14521034 ]

Stefania commented on CASSANDRA-9140:
--------------------------------------

Thanks for your feedback.

bq. Unless I'm missing something, it looks like the retry doesn't actually use the data size or position from the index. It seems like the intent is to try to read the data based on the Data component's position and size (if present) first, and if that fails, use the position and size from the index.

I think you are absolutely correct; I've added a seek to the position read from the index. I've addressed all the other comments as well, in all three patches, which have also been rebased. Finally, I added a couple more unit tests to increase code coverage, but only without compression, as I would not know how to achieve the same with compression.


Scrub should handle corrupted compressed chunks
-----------------------------------------------

                Key: CASSANDRA-9140
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9140
            Project: Cassandra
         Issue Type: Improvement
         Components: Tools
           Reporter: Tyler Hobbs
           Assignee: Stefania
            Fix For: 2.1.x, 2.0.x

Scrub can handle corruption within a row, but can't handle corruption of a compressed sstable that results in being unable to decompress a chunk. Since the majority of Cassandra users are probably running with compression enabled, it's important that scrub be able to handle this (likely more common) form of sstable corruption.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9140) Scrub should handle corrupted compressed chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520186#comment-14520186 ]

Tyler Hobbs commented on CASSANDRA-9140:
-----------------------------------------

The changes you made look pretty good to me, overall. However, some of the surrounding code in Scrubber seems odd or incorrect. I think we should try to improve it a bit while we're working around this code:

* We should use separate log messages for an unreadable key and differing index/data keys. If the keys differ, we should log both keys.
* The warning log for {{dataStart != dataStartFromIndex}} should be moved before the data read, and we should log both start positions.
** A similar check and log message for {{dataSize != dataSizeFromIndex}} would be good.
* Unless I'm missing something, it looks like the retry doesn't actually use the data size or position from the index. It seems like the intent is to try to read the data based on the Data component's position and size (if present) first, and if that fails, use the position and size from the index.
* If {{currentIndexKey}} is null (meaning there was an error reading from the index), we should just make {{dataStartFromIndex}} and {{dataSizeFromIndex}} -1 to avoid confusing numbers in the log messages.
* {{dataStartFromIndex}} should be using the previous value of {{nextRowPositionFromIndex}} (or 0 if it's the first row). Right now it's using a combination of index positions and data file positions.

Let me know what you think about the above suggestions.
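The retry flow suggested in the review above can be sketched in isolation. This is a hypothetical, simplified illustration only: the field and method names ({{scrubOneRow}}, {{RowRead}}, and so on) are invented for the sketch and are not the actual {{Scrubber}} API. It shows the intended control flow: log position/size mismatches before the read, attempt the read using the Data component's values first, and fall back to the index-derived values on failure (using -1 as the "unavailable" marker when the index entry could not be read).

```java
// Hypothetical sketch of the suggested scrub retry flow; not the real Scrubber code.
public final class ScrubRetrySketch
{
    static final long INVALID = -1; // used when the index entry was unreadable

    /** Outcome of one row-read attempt. */
    record RowRead(boolean ok, String detail) {}

    /** Reads a row at the given position/size; throws on corruption. */
    interface RowReader
    {
        RowRead read(long start, long size) throws Exception;
    }

    static RowRead scrubOneRow(RowReader reader,
                               long dataStart, long dataSize,
                               long dataStartFromIndex, long dataSizeFromIndex)
    {
        // Log mismatches *before* the data read, with both values, per the review.
        if (dataStartFromIndex != INVALID && dataStart != dataStartFromIndex)
            System.err.printf("WARN: data start %d differs from index start %d%n",
                              dataStart, dataStartFromIndex);
        if (dataSizeFromIndex != INVALID && dataSize != dataSizeFromIndex)
            System.err.printf("WARN: data size %d differs from index size %d%n",
                              dataSize, dataSizeFromIndex);
        try
        {
            // First attempt: the Data component's position and size.
            return reader.read(dataStart, dataSize);
        }
        catch (Exception e)
        {
            // Retry with the position and size taken from the index, if available.
            if (dataStartFromIndex == INVALID || dataSizeFromIndex == INVALID)
                return new RowRead(false, "no index data to retry with");
            try
            {
                return reader.read(dataStartFromIndex, dataSizeFromIndex);
            }
            catch (Exception e2)
            {
                return new RowRead(false, "retry failed: " + e2.getMessage());
            }
        }
    }
}
```

The point of the ordering is that the mismatch warnings are emitted even when the first read subsequently fails, so the log always records both candidate positions.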
[jira] [Commented] (CASSANDRA-9140) Scrub should handle corrupted compressed chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513670#comment-14513670 ]

Stefania commented on CASSANDRA-9140:
--------------------------------------

utest and dtest status available here: http://cassci.datastax.com/view/Dev/view/Stefania/
[jira] [Commented] (CASSANDRA-9140) Scrub should handle corrupted compressed chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505831#comment-14505831 ]

Tyler Hobbs commented on CASSANDRA-9140:
-----------------------------------------

I should be able to review this within the next few days.
[jira] [Commented] (CASSANDRA-9140) Scrub should handle corrupted compressed chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502604#comment-14502604 ]

Stefania commented on CASSANDRA-9140:
--------------------------------------

I prepared a patch for trunk and one for 2.0, since there is an issue in {{CompressedRandomAccessReader}} that exists only on trunk, and since the loading of test compression parameters is different on trunk. I then noticed that the merge from 2.0 to 2.1 was not straightforward either, so I saved a 2.1 patch as well; all three are attached as links.

The main change, other than {{testScrubCorruptedCounterRow()}} itself, is to make the scrub algorithm keep seeking to the partition positions read from the index rather than giving up on the second attempt, since when a compression chunk is corrupted many partitions are lost, not just one.

Also, I don't think it's right to assume that the key read from the data file is correct when it differs from the key read from the index, because with corrupted data we would most likely read junk. In fact, the existing test, {{testScrubCorruptedCounterRow()}}, was passing only because we would try to read beyond the end of the file and therefore end up with a null key due to the EOF exception. When I increased the number of partitions in the test (without compression), it started to report one empty row and one bad row, rather than just one bad row.

[~thobbs], let me know if you can take this review or if we need to find someone else.
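The behaviour change described above — keep seeking to index-derived partition positions instead of aborting after a second failure — can be sketched as a simple salvage loop. This is an illustrative simplification with invented names, not the actual Scrubber implementation: the idea is just that one corrupt compression chunk can make many consecutive partitions unreadable, so scrub should advance to the next index entry on each failure rather than stop.

```java
// Hypothetical sketch: salvage partitions by iterating index positions,
// skipping (not aborting on) each unreadable one. Not the real Scrubber API.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public final class IndexDrivenScrubSketch
{
    /** Reads the partition starting at {@code position}; throws on corruption. */
    interface PartitionReader
    {
        String readAt(long position) throws Exception;
    }

    /** Salvage as many partitions as possible, driven by positions from the index. */
    static List<String> scrub(PartitionReader reader, Iterator<Long> indexPositions)
    {
        List<String> salvaged = new ArrayList<>();
        while (indexPositions.hasNext())
        {
            long pos = indexPositions.next();
            try
            {
                salvaged.add(reader.readAt(pos));
            }
            catch (Exception e)
            {
                // Don't give up: the next index entry may point past the
                // corrupt compression chunk, so seek there and continue.
                System.err.println("WARN: skipping unreadable partition at " + pos);
            }
        }
        return salvaged;
    }
}
```

With this shape, a run of failures inside one corrupt chunk costs only those partitions; everything addressable after the chunk is still recovered.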
[jira] [Commented] (CASSANDRA-9140) Scrub should handle corrupted compressed chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486346#comment-14486346 ]

Tyler Hobbs commented on CASSANDRA-9140:
-----------------------------------------

Once CASSANDRA-9135 is committed, the {{ScrubTest.testScrubCorruptedCounterRow}} test will be skipped whenever compression is enabled. When this ticket is done (and the test is updated appropriately), we should be able to re-enable the test.