[ https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Jirsa updated CASSANDRA-13282:
-----------------------------------
    Description:
Following CASSANDRA-9749, stricter correctness checks on commitlog replay can incorrectly detect "corrupt segments" and stop commitlog replay (and potentially stop Cassandra, depending on the configured policy).

In {{CommitlogReplayer#replaySyncSection}} we try to read a 4-byte int {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the segment was created), we continue on to the next segment. However, it appears that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining in the segment, we'll pass the {{isEOF}} check on the while loop but fail to read the {{serializedSize}} int, and replay fails.

  (was: Following CASSANDRA-9749, stricter correctness checks on commitlog replay can incorrectly detect "corrupt segments" and stop commitlog replay (and potentially stop cassandra, depending on the configured policy).

In {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the segment was created), we continue on to the next segment. However, it appears that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining in the segment, we'll hit pass the {{isEOF}} on the while loop but fail to read the {{serializedSize}} int, and fail.
)

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jeff Jirsa
>            Assignee: Jeff Jirsa
>             Fix For: 3.0.x, 3.11.x, 4.x
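
The failure mode described in this issue can be illustrated with a small standalone sketch. This is not the Cassandra code: the class {{ReplayTailSketch}}, its method names, and the use of a plain {{java.nio.ByteBuffer}} as a stand-in for a commitlog segment are all assumptions made for illustration, and checksums and sync markers are ignored. The naive loop only checks that at least one byte remains before reading the 4-byte size, so a zeroed tail of 1, 2, or 3 bytes makes the read fail; the guarded loop treats a tail shorter than the size field as end of data. The guard is one possible approach and is not necessarily how the issue was fixed.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch: a ByteBuffer stands in for a zero-filled commitlog segment.
final class ReplayTailSketch {
    static final int SIZE_FIELD_BYTES = Integer.BYTES; // the 4-byte serializedSize

    // Builds a zero-filled segment holding one entry (size field + payload),
    // followed by tailBytes of untouched zero bytes. payloadSize must be > 0.
    static ByteBuffer segmentWithTail(int payloadSize, int tailBytes) {
        ByteBuffer segment = ByteBuffer.allocate(SIZE_FIELD_BYTES + payloadSize + tailBytes);
        segment.putInt(payloadSize);
        segment.put(new byte[payloadSize]);
        segment.rewind();
        return segment;
    }

    // Naive loop: "not at EOF" only means at least one byte is left, so getInt()
    // throws BufferUnderflowException when 1, 2, or 3 bytes remain.
    static int naiveCount(ByteBuffer segment) {
        int entries = 0;
        while (segment.hasRemaining()) {
            int serializedSize = segment.getInt();
            if (serializedSize == 0) {
                break; // zeroed region: nothing more was written to this segment
            }
            segment.position(segment.position() + serializedSize); // skip the payload
            entries++;
        }
        return entries;
    }

    // Guarded loop: a tail too short to hold the size field is treated as end of data.
    static int guardedCount(ByteBuffer segment) {
        int entries = 0;
        while (segment.remaining() >= SIZE_FIELD_BYTES) {
            int serializedSize = segment.getInt();
            if (serializedSize == 0) {
                break;
            }
            segment.position(segment.position() + serializedSize);
            entries++;
        }
        return entries;
    }

    public static void main(String[] args) {
        for (int tail = 0; tail <= 4; tail++) {
            String naive;
            try {
                naive = String.valueOf(naiveCount(segmentWithTail(8, tail)));
            } catch (BufferUnderflowException e) {
                naive = "BufferUnderflowException";
            }
            int guarded = guardedCount(segmentWithTail(8, tail));
            System.out.println("tail=" + tail + " naive=" + naive + " guarded=" + guarded);
        }
    }
}
```

Running {{main}} compares both loops for tails of 0 to 4 bytes; only the 1, 2, and 3 byte tails should make the naive loop throw.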