[ 
https://issues.apache.org/jira/browse/CASSANDRA-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stu Hood updated CASSANDRA-2062:
--------------------------------

    Description: 
The core reason for this ticket is to gain control over the consumption of the 
lazy nested iterators in the read path.
{quote}We survive now because we write the size of the row at the front of the 
row (via some serious acrobatics at write time), which gives us hasNext() for 
rows for free. But it became apparent while working on the block-based format 
that hasNext() will not be cheap unless the current item has been consumed. 
"Consumption" of the row is easy, and blocks will be framed so that they can be 
very easily skipped, but you don't want to have to seek to the end of the row 
to answer hasNext, and then seek back to the beginning to consume the row, 
which is what CollatingIterator would have forced us to do.{quote}

While we're at it, we can also improve efficiency: for {{M}} iterators 
containing {{N}} total items, commons.collections.CollatingIterator performs an 
{{O(M*N)}} merge, and calls hasNext multiple times per returned value. We can 
do better.

  was:For {{M}} iterators containing {{N}} total items, 
commons.collections.CollatingIterator performs a {{O(M*N)}} merge, and calls 
hasNext multiple times per returned value. We can do better.

        Summary: Better control of iterator consumption  (was: Use more 
efficient merge algorithm)

Edited the description/title to indicate that there is a core issue at stake 
here, rather than just an algorithm change.

> Better control of iterator consumption
> --------------------------------------
>
>                 Key: CASSANDRA-2062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2062
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Minor
>             Fix For: 0.7.2
>
>         Attachments: 0001-Improved-iterator-for-merging-sorted-iterators.txt, 
> 0002-Quickie-instrumentation-for-comparisons.txt, 
> 0003-Replace-Collating-with-Merge-in-CompactionIterator.txt, 
> 0004-Port-LazilyCompactedRow-ReducingKeyIterator-RangeSlice.txt, 
> 0005-Remove-temporary-instrumentation-and-CollatingIterator.txt, 
> 0006-Port-RowIterator.txt
>
>
> The core reason for this ticket is to gain control over the consumption of 
> the lazy nested iterators in the read path.
> {quote}We survive now because we write the size of the row at the front of 
> the row (via some serious acrobatics at write time), which gives us hasNext() 
> for rows for free. But it became apparent while working on the block-based 
> format that hasNext() will not be cheap unless the current item has been 
> consumed. "Consumption" of the row is easy, and blocks will be framed so that 
> they can be very easily skipped, but you don't want to have to seek to the 
> end of the row to answer hasNext, and then seek back to the beginning to 
> consume the row, which is what CollatingIterator would have forced us to 
> do.{quote}
> While we're at it, we can also improve efficiency: for {{M}} iterators 
> containing {{N}} total items, commons.collections.CollatingIterator performs 
> an {{O(M*N)}} merge, and calls hasNext multiple times per returned value. We 
> can do better.
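
To make the two goals in the description concrete, here is a rough sketch of 
a heap-based merge over sorted iterators. It is not the code from the attached 
patches: the class name MergeSketch and the nested Candidate holder are 
hypothetical, chosen only for illustration. It assumes every source is already 
sorted by the supplied comparator. A heap brings the merge down to 
O(N log M) comparisons, and a source is only asked hasNext() once the caller 
has come back for another item, i.e. after the item it previously supplied has 
been consumed.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical sketch (not the attached patch): k-way merge of sorted iterators. */
final class MergeSketch<T> implements Iterator<T> {

    /** Pairs a source iterator with the single item currently pulled from it. */
    private static final class Candidate<E> {
        final Iterator<E> source;
        E head;

        Candidate(Iterator<E> source) { this.source = source; }

        /** Pulls the next item from the source; false once it is exhausted. */
        boolean advance() {
            if (!source.hasNext()) return false;
            head = source.next();
            return true;
        }
    }

    private final PriorityQueue<Candidate<T>> heap;
    // Source of the most recently returned item. It is advanced lazily, so its
    // hasNext() is not called while the caller may still be consuming that item.
    private Candidate<T> lastReturned;

    MergeSketch(List<? extends Iterator<T>> sources, final Comparator<? super T> cmp) {
        heap = new PriorityQueue<Candidate<T>>(
                Math.max(1, sources.size()),
                (a, b) -> cmp.compare(a.head, b.head));
        for (Iterator<T> it : sources) {
            Candidate<T> c = new Candidate<T>(it);
            if (c.advance()) heap.add(c); // nothing is pending on a fresh source
        }
    }

    /** Advances the source of the previously returned item, if there is one. */
    private void advanceLast() {
        if (lastReturned != null) {
            if (lastReturned.advance()) heap.add(lastReturned);
            lastReturned = null;
        }
    }

    @Override
    public boolean hasNext() {
        advanceLast();
        return !heap.isEmpty();
    }

    @Override
    public T next() {
        advanceLast();
        Candidate<T> c = heap.poll(); // smallest head across all sources
        if (c == null) throw new NoSuchElementException();
        lastReturned = c;
        return c.head;
    }
}
```

Each returned item costs one heap poll and at most one heap insert, and each 
source sees exactly one hasNext() call per item it supplies. The deferred 
advance in advanceLast() is the part that relates to the consumption problem 
above: the merge never probes a source while its current item is still 
outstanding.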

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
