Sylvain Lebresne created CASSANDRA-13939:
--------------------------------------------

             Summary: Mishandling of cells for removed/dropped columns when reading legacy files
                 Key: CASSANDRA-13939
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13939
             Project: Cassandra
          Issue Type: Bug
          Components: Local Write-Read Paths
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
             Fix For: 3.0.x, 3.11.x


The tl;dr is that there is a bug in reading legacy files that can manifest itself with a trace looking like this:
{noformat}
Exception (java.lang.IllegalStateException) encountered during startup: One row required, 2 found
java.lang.IllegalStateException: One row required, 2 found
    at org.apache.cassandra.cql3.UntypedResultSet$FromResultSet.one(UntypedResultSet.java:84)
    at org.apache.cassandra.schema.LegacySchemaMigrator.readTableTimestamp(LegacySchemaMigrator.java:254)
    at org.apache.cassandra.schema.LegacySchemaMigrator.readTable(LegacySchemaMigrator.java:244)
    at org.apache.cassandra.schema.LegacySchemaMigrator.lambda$readTables$7(LegacySchemaMigrator.java:238)
    at org.apache.cassandra.schema.LegacySchemaMigrator$$Lambda$126/591203139.accept(Unknown Source)
    at java.util.ArrayList.forEach(ArrayList.java:1249)
    at org.apache.cassandra.schema.LegacySchemaMigrator.readTables(LegacySchemaMigrator.java:238)
    at org.apache.cassandra.schema.LegacySchemaMigrator.readKeyspace(LegacySchemaMigrator.java:187)
    at org.apache.cassandra.schema.LegacySchemaMigrator.lambda$readSchema$4(LegacySchemaMigrator.java:178)
    at org.apache.cassandra.schema.LegacySchemaMigrator$$Lambda$123/1612073393.accept(Unknown Source)
    at java.util.ArrayList.forEach(ArrayList.java:1249)
    at org.apache.cassandra.schema.LegacySchemaMigrator.readSchema(LegacySchemaMigrator.java:178)
{noformat}

The reason this can happen has to do with the handling of legacy files. Legacy files are cell based while the 3.0 storage engine is primarily row based, so we group those cells into rows early in the deserialization process (in {{UnfilteredDeserializer.OldFormatDeserializer}}). But in doing so, we can only consider a row finished when we've either reached the end of the partition/file, or read a cell that doesn't belong to that row. That second case means that when the deserializer returns a given row, the underlying file pointer may not actually be positioned at the end of that row: it may instead be past the first cell of the next row (which the deserializer remembers for future use). Long story short, when we try to detect if we're logically past our current index block in {{AbstractSSTableIterator.IndexState#isPastCurrentBlock()}}, we can't simply rely on the file pointer, which again may be a bit further ahead than we logically are, and that's the reason for the "correction" in that method. That correction is really just the amount of bytes remembered but not yet consumed by the deserializer.
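
To make that read-ahead bookkeeping concrete, here is a minimal toy sketch (plain Java; the class name echoes the real one but everything here is hypothetical, not the actual Cassandra code): the deserializer groups cells into rows, remembers the one cell it read too far, and its logical position is the raw file pointer minus the size of that remembered cell.

{code:java}
import java.util.ArrayList;
import java.util.List;

class ReadAheadSketch
{
    /** A toy "cell": which row it belongs to and how many bytes it takes on disk. */
    record Cell(int rowKey, int sizeOnDisk) {}

    static class OldFormatDeserializer
    {
        private final List<Cell> file;   // stands in for the legacy sstable file
        private int filePointer = 0;     // bytes read from the "file" so far
        private int index = 0;
        private Cell readAhead = null;   // first cell of the *next* row, already read

        OldFormatDeserializer(List<Cell> file)
        {
            this.file = file;
        }

        /** Bytes already read from the file but not yet logically consumed. */
        int bytesReadForUnconsumedData()
        {
            return readAhead == null ? 0 : readAhead.sizeOnDisk();
        }

        /** Logical position = raw file pointer minus that "correction". */
        int logicalPosition()
        {
            return filePointer - bytesReadForUnconsumedData();
        }

        /** Groups consecutive cells with the same rowKey into one row. */
        List<Cell> nextRow()
        {
            List<Cell> row = new ArrayList<>();
            if (readAhead != null)
            {
                row.add(readAhead);
                readAhead = null;
            }
            while (index < file.size())
            {
                Cell c = file.get(index++);
                filePointer += c.sizeOnDisk();
                if (!row.isEmpty() && c.rowKey() != row.get(0).rowKey())
                {
                    readAhead = c;  // belongs to the next row: remember it for later
                    break;          // ... and the current row is finished
                }
                row.add(c);
            }
            return row.isEmpty() ? null : row;
        }
    }

    public static void main(String[] args)
    {
        OldFormatDeserializer deser = new OldFormatDeserializer(List.of(
            new Cell(0, 10), new Cell(0, 10),  // row 0
            new Cell(1, 10)));                 // row 1
        deser.nextRow();  // returns row 0, but we only knew it was finished
                          // because we already read row 1's first cell
        System.out.println(deser.filePointer);       // 30: past the start of row 1
        System.out.println(deser.logicalPosition()); // 20: end of row 0, after correction
    }
}
{code}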

That "correction" is sometimes wrong however and that's due to the fact that in 
{{LegacyLayout#readLegacyAtom}}, if we get a cell for an dropped or removed 
cell, we ignore that cell (which, in itself, is fine). Problem is that this 
skipping is done within the {{LegacyLayout#readLegacyAtom}} method but without 
{{UnfilteredDeserializer.OldFormatDeserializer}} knowing about it. As such, the 
size of the skipped cell ends up being accounted in the "correction" bytes for 
the next cell we read. Lo and behold, if such cell for a removed/dropped column 
is both the last cell of a CQL row and just before an index boundary (pretty 
unlikely in general btw, but definitively possible), then the deserializer will 
count its size with the first cell of the next row, which happens to also be 
the first cell of the next index block.  And when the code then tries to figure 
out if it crossed an index boundary, it will over-correct. That is, the 
{{indexState.updateBlock()}} call at the start of 
{{SSTableIterator.ForwardIndexedReader#computeNext}} will not work correctly.  
This can then make the code return a row that is after the requested slice end 
(and should thus not be returned) because it doesn't compare that row to said 
requested end due to thinking it's not on the last index block to read, even 
though it genuinely is.
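
Under the same toy model, here is a sketch of how the miscount happens (again with hypothetical names, not the real code): the low-level reader hides the skip, and the caller measures an atom's size as the file-position delta around the call, so the skipped cell's bytes get charged to the next live cell.

{code:java}
import java.util.Iterator;
import java.util.List;

class MiscountSketch
{
    /** A toy "cell": its column, its on-disk size, and whether its column was dropped. */
    record Cell(String column, int sizeOnDisk, boolean dropped) {}

    static int position = 0;  // simulated file pointer

    /** Toy stand-in for LegacyLayout#readLegacyAtom: silently skips dropped cells. */
    static Cell readAtom(Iterator<Cell> file)
    {
        while (file.hasNext())
        {
            Cell c = file.next();
            position += c.sizeOnDisk();
            if (!c.dropped())
                return c;
            // dropped column: skipped right here, the caller never learns its size
        }
        return null;
    }

    public static void main(String[] args)
    {
        Iterator<Cell> file = List.of(
            new Cell("a", 10, false),
            new Cell("dropped", 25, true),  // last cell of its row, right before an index boundary
            new Cell("b", 10, false)).iterator();

        readAtom(file);              // "a": delta around the call is 10, correct
        int before = position;
        Cell next = readAtom(file);  // returns "b", but the delta includes the skipped cell
        System.out.println(next.column() + " charged " + (position - before)
                           + " bytes instead of 10");
        // The "correction" is now 25 bytes too large, so isPastCurrentBlock()
        // over-corrects and the index-block boundary is missed.
    }
}
{code}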

Anyway, the whole explanation is a tad complex, but the fix isn't: we need to move the skipping of cells for removed/dropped columns a level up, so that the deserializer knows about it and doesn't silently count their size in the next atom's size.
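
Still in the toy model, a sketch of what "moving the skipping a level up" could look like (hypothetical, a sketch of the idea rather than the actual patch): the low-level reader returns every cell, dropped or not, and the deserializer does the skipping itself, measuring one raw read at a time, so a dropped cell's bytes are consumed with that cell and never leak into the next atom's "correction".

{code:java}
import java.util.Iterator;
import java.util.List;

class FixSketch
{
    record Cell(String column, int sizeOnDisk, boolean dropped) {}

    static int position = 0;      // simulated file pointer
    static int lastAtomSize = 0;  // bytes attributed to the last returned cell

    /** Reads exactly one cell, dropped or not; no hidden skipping anymore. */
    static Cell readRawAtom(Iterator<Cell> file)
    {
        Cell c = file.next();
        position += c.sizeOnDisk();
        return c;
    }

    /** The deserializer skips dropped cells itself, measuring one raw read at a time. */
    static Cell nextLiveAtom(Iterator<Cell> file)
    {
        while (file.hasNext())
        {
            int before = position;
            Cell c = readRawAtom(file);
            lastAtomSize = position - before;  // delta around ONE raw read
            if (c.dropped())
                continue;  // consumed here; its size is never carried over to the next atom
            return c;
        }
        return null;
    }

    public static void main(String[] args)
    {
        Iterator<Cell> file = List.of(
            new Cell("a", 10, false),
            new Cell("dropped", 25, true),
            new Cell("b", 10, false)).iterator();

        nextLiveAtom(file);              // "a", charged 10 bytes
        Cell next = nextLiveAtom(file);  // "b"
        System.out.println(next.column() + " charged " + lastAtomSize + " bytes"); // 10, not 35
    }
}
{code}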


