Christopher Tubbs created ACCUMULO-775:
------------------------------------------

             Summary: Optimize iterator seek() method when seeking forward
                 Key: ACCUMULO-775
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-775
             Project: Accumulo
          Issue Type: Improvement
          Components: tserver
            Reporter: Christopher Tubbs
            Assignee: Keith Turner
             Fix For: 1.5.0


At present, seeking is a very expensive operation, yet it is very common, 
especially in filtering/consuming/skipping iterators that seek to the next 
possible match (perhaps in the next row, when matching a column family with a 
regular expression) rather than continuing to iterate. A common workaround is 
to keep scanning for some threshold (~10-20 entries), hoping to just "run 
into" the next possible match, rather than waste resources seeking directly 
to it.
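
As a rough illustration of that workaround, here is a minimal sketch in plain 
Java. ToySource, advanceTo, and the threshold parameter are hypothetical names 
made up for this example; this is not the Accumulo iterator API, just a toy 
sorted source that counts how often the "expensive" seek is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ScanThenSeek {

    /** Toy sorted source (hypothetical; NOT the Accumulo API). */
    static final class ToySource {
        private final List<String> keys; // must be sorted ascending
        private int pos = 0;
        int seeks = 0; // number of (expensive) seeks performed
        int nexts = 0; // number of (cheap) next() calls performed

        ToySource(List<String> sortedKeys) {
            this.keys = new ArrayList<>(sortedKeys);
        }

        boolean hasTop() { return pos < keys.size(); }

        String top() { return keys.get(pos); }

        void next() { pos++; nexts++; }

        /** Positions at the first key >= target; stands in for a costly seek. */
        void seek(String target) {
            seeks++;
            int idx = Collections.binarySearch(keys, target);
            pos = idx >= 0 ? idx : -idx - 1;
        }
    }

    /**
     * Moves src forward to the first key >= target. Scans up to
     * threshold entries first and only falls back to seek() if the
     * target was not reached by scanning.
     */
    static void advanceTo(ToySource src, String target, int threshold) {
        for (int i = 0; i < threshold && src.hasTop(); i++) {
            if (src.top().compareTo(target) >= 0) {
                return; // ran into the target while scanning
            }
            src.next();
        }
        if (src.hasTop() && src.top().compareTo(target) < 0) {
            src.seek(target); // scanning did not get there; pay for the seek
        }
    }

    public static void main(String[] args) {
        ToySource src = new ToySource(
            Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h"));
        advanceTo(src, "c", 3); // close target: reached by scanning
        advanceTo(src, "h", 2); // far target: falls back to a seek
        System.out.println("top=" + src.top()
            + " seeks=" + src.seeks + " nexts=" + src.nexts);
    }
}
```

Every iterator that wants this behaviour has to carry its own copy of this 
loop and pick its own threshold.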

This pattern can be rolled into the lower-level iterator, so that iterators on 
top don't have to implement it themselves. They can simply seek, and the 
underlying source iterator can consume the next X entries when it makes sense, 
rather than waste resources seeking.

I could be wrong (please comment and correct me below if I am), but I imagine 
this would make the most sense when the data being sought is in the current 
compressed block of the underlying file, especially if it is forward of the 
current pointer. A better seek method should be able to tell where it 
currently is and whether the requested data is within reach, without doing all 
the expensive work of re-seeking to the compressed block that is already 
loaded, reloading it, decompressing it, and scanning to the requested starting 
point.
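
To make the idea concrete, here is a minimal sketch in plain Java of a 
block-aware seek. BlockReader, fullSeek, and blockLoads are hypothetical names 
for this example and do not reflect how the actual file reader is written. The 
sketch assumes every block is non-empty and sorted, and uses blockLoads to 
stand in for the cost of loading and decompressing a block.

```java
import java.util.Arrays;
import java.util.List;

final class BlockAwareSeek {

    /** Toy block-structured reader (hypothetical; NOT the Accumulo API). */
    static final class BlockReader {
        private final List<List<String>> blocks; // sorted, non-empty blocks
        private int block = -1; // index of the loaded block, -1 if none
        private int pos = 0;    // position inside the loaded block
        int blockLoads = 0;     // stands in for load + decompress cost

        BlockReader(List<List<String>> blocks) {
            this.blocks = blocks;
        }

        boolean hasTop() {
            return block >= 0 && block < blocks.size()
                && pos < blocks.get(block).size();
        }

        String top() { return blocks.get(block).get(pos); }

        private static String lastKey(List<String> b) {
            return b.get(b.size() - 1);
        }

        /** Positions at the first key >= target. */
        void seek(String target) {
            // Fast path: target is forward of the current position and
            // still inside the block that is already loaded.
            if (hasTop()
                    && top().compareTo(target) <= 0
                    && target.compareTo(lastKey(blocks.get(block))) <= 0) {
                while (top().compareTo(target) < 0) {
                    pos++;
                }
                return;
            }
            fullSeek(target);
        }

        /** Slow path: locate the block, "load" it, then scan to target. */
        private void fullSeek(String target) {
            int b = 0;
            while (b < blocks.size()
                    && lastKey(blocks.get(b)).compareTo(target) < 0) {
                b++;
            }
            block = b;
            pos = 0;
            if (b < blocks.size()) {
                blockLoads++;
                while (top().compareTo(target) < 0) {
                    pos++;
                }
            }
        }
    }

    public static void main(String[] args) {
        BlockReader r = new BlockReader(Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("d", "e", "f")));
        r.seek("a"); // first seek has to load block 0
        r.seek("c"); // forward, same block: no reload
        r.seek("e"); // different block: full seek
        System.out.println("top=" + r.top() + " blockLoads=" + r.blockLoads);
    }
}
```

In this sketch a backward seek, or a seek past the end of the loaded block, 
still takes the full path; only a forward seek within the loaded block avoids 
the reload.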

Having such an optimization would eliminate the need for users to calibrate 
their own scan vs. seek threshold by guessing whether their data is in the 
current block or another one, while still getting the same performance 
benefit.
