[ 
https://issues.apache.org/jira/browse/CASSANDRA-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-8747:
--------------------------------
    Description: 
Currently openEarly does some fairly ugly looping back over the summary data 
we've collected looking for one we think should be fully covered in the Index 
and Data files, and that should have a safe boundary between it and the end of 
an IndexSummary entry so that when scanning across it we should not 
accidentally read an incomplete key. The approach taken is a little difficult 
to reason about though, and be confident of, and I now realise is also very 
subtly broken. Since we're cleaning up the behaviour around this code, it 
seemed worthwhile to improve its clarity and make its behaviour easier to 
reason about. The current behaviour can be characterised as:

# Take the current Index file length
# Find the IndexSummary boundary key (first key in an interval) that starts 
past this position
# Take the IndexSummary boundary key (first key) for the preceding interval as 
our initial boundary
# Construct a reader with this boundary
# Lookup our last key in the reader, and if its end position is past the end of 
the data file, take the prior summary boundary. Repeat until we find one 
starting before the end.

The bug may well be very hard to exhibit, or even impossible, but is that if we 
have a single very large partition followed by 127 very tiny partitions (or 
whatever the IndexSummary interval is configured as), our IndexSummary interval 
buffer may not guarantee the record we have selected as our end is fully 
readable.

The new approach is to track in the IndexSummary the safe and optimal boundary 
point (i.e. the last record in each summary interval) and its bounds in the 
index and data files. On flushing either file, we notify the summary builder to 
the new flush points, and it consults its map of these and selects the last 
such boundary that can safely be read in both. This is much easier to 
understand, and has no such subtle risk.

  was:
Currently openEarly does some fairly ugly looping back over the summary data 
we've collected looking for one we think should be fully covered in the Index 
and Data files, and that should have a safe boundary between it and the end of 
an IndexSummary entry so that when scanning across it we should not 
accidentally read an incomplete key. The approach taken is a little difficult 
to reason about though, and be confident of. Since we're cleaning up the 
behaviour around this code, it seemed worthwhile to improve its clarity and 
make its behaviour easier to reason about. The current behaviour can be 
characterised as:

Find the first summary record

       Priority: Major  (was: Minor)
     Issue Type: Bug  (was: Improvement)
        Summary: Make SSTableWriter.openEarly behaviour more robust  (was: Make 
SSTableWriter.openEarly behaviour more obvious)

> Make SSTableWriter.openEarly behaviour more robust
> --------------------------------------------------
>
>                 Key: CASSANDRA-8747
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8747
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1.4
>
>
> Currently openEarly does some fairly ugly looping back over the summary data 
> we've collected looking for one we think should be fully covered in the Index 
> and Data files, and that should have a safe boundary between it and the end 
> of an IndexSummary entry so that when scanning across it we should not 
> accidentally read an incomplete key. The approach taken is a little difficult 
> to reason about though, and be confident of, and I now realise is also very 
> subtly broken. Since we're cleaning up the behaviour around this code, it 
> seemed worthwhile to improve its clarity and make its behaviour easier to 
> reason about. The current behaviour can be characterised as:
> # Take the current Index file length
> # Find the IndexSummary boundary key (first key in an interval) that starts 
> past this position
> # Take the IndexSummary boundary key (first key) for the preceding interval 
> as our initial boundary
> # Construct a reader with this boundary
> # Lookup our last key in the reader, and if its end position is past the end 
> of the data file, take the prior summary boundary. Repeat until we find one 
> starting before the end.
> The bug may well be very hard to exhibit, or even impossible, but is that if 
> we have a single very large partition followed by 127 very tiny partitions 
> (or whatever the IndexSummary interval is configured as), our IndexSummary 
> interval buffer may not guarantee the record we have selected as our end is 
> fully readable.
> The new approach is to track in the IndexSummary the safe and optimal 
> boundary point (i.e. the last record in each summary interval) and its bounds 
> in the index and data files. On flushing either file, we notify the summary 
> builder to the new flush points, and it consults its map of these and selects 
> the last such boundary that can safely be read in both. This is much easier 
> to understand, and has no such subtle risk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to