[jira] [Updated] (ACCUMULO-403) Create general row selection iterator

Keith Turner (Updated) (JIRA) Wed, 15 Feb 2012 16:59:30 -0800

     [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Keith Turner updated ACCUMULO-403:
----------------------------------

    Description: 
The WholeRowIterator supports filtering rows that meet certain criteria.
However, it reads the entire row into memory.  It is possible to efficiently
select rows without reading them into memory by using two iterators: one
for selection and one for reading.  When the selection iterator determines
that a row is not needed, seek the read iterator past the row.
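
As a rough illustration of the two-iterator pattern, here is a minimal sketch.
It does not use any Accumulo classes.  A TreeMap keyed by row, a NUL separator,
and column stands in for the sorted key/value stream, and the names
TwoCursorSketch, RowSelector, key and select are all hypothetical.  The
selection step only sees a view of the row, and the read position jumps to the
next row whether or not the row was selected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in, not Accumulo API: a sorted map keyed by
// row + NUL + column plays the role of the sorted key/value stream.
class TwoCursorSketch {

    // Decides from a live view of one row whether the row is wanted.
    interface RowSelector {
        boolean selectRow(String row, NavigableMap<String, String> rowView);
    }

    static String key(String row, String column) {
        return row + '\u0000' + column;
    }

    // Assumes row ids never contain the characters 0x00 or 0x01.
    static List<String> select(NavigableMap<String, String> data, RowSelector selector) {
        List<String> out = new ArrayList<>();
        String readPos = data.isEmpty() ? null : data.firstKey();
        while (readPos != null) {
            String row = readPos.substring(0, readPos.indexOf('\u0000'));
            String rowEnd = row + '\u0001'; // sorts after every key of this row
            // Selection cursor: a view over the row, not a copy of it.
            NavigableMap<String, String> rowView =
                data.subMap(key(row, ""), true, rowEnd, false);
            if (selector.selectRow(row, rowView)) {
                // Read cursor: stream the row's entries one at a time.
                for (String k : rowView.keySet()) {
                    out.add(k.replace('\u0000', ':'));
                }
            }
            // Either way, move the read position to the start of the next row.
            readPos = data.ceilingKey(rowEnd);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> data = new TreeMap<>();
        data.put(key("r1", "bar"), "1");
        data.put(key("r1", "foo"), "2");
        data.put(key("r2", "bar"), "3");
        data.put(key("r3", "bar"), "4");
        data.put(key("r3", "baz"), "5");
        data.put(key("r3", "foo"), "6");
        // Keep only rows that have both a bar and a foo column.
        List<String> kept = select(data, (row, view) ->
            view.containsKey(key(row, "bar")) && view.containsKey(key(row, "foo")));
        System.out.println(kept); // [r1:bar, r1:foo, r3:bar, r3:baz, r3:foo]
    }
}
```

In a real iterator the two cursors would presumably be two copies of the
source iterator, which is an assumption about the eventual implementation.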

This pattern could be made into an easy-to-use iterator that users extend.  The
iterator could have an abstract method that users implement to decide whether
to select or filter out a row.  It could look something like the following.


{noformat}

abstract class RowSelectionIterator extends WrappingIterator {

   public abstract boolean selectRow(SortedKeyValueIterator row);

}

{noformat}


Below is a simple example of a row selection iterator that returns rows that 
have the columns foo and bar.


{noformat}

class FooBarRowSelector extends RowSelectionIterator {
   public boolean selectRow(SortedKeyValueIterator row) {

      // named rowId so it does not shadow the row parameter
      Text rowId = row.getTopKey().getRow();
      // Seek instead of scanning; this is more efficient for large rows with
      // lots of columns.  If the row only has a few columns, scanning is
      // probably faster.  Seeking the columns in sorted order is also more
      // efficient.
      row.seek(Range.exact(rowId, new Text("bar")));
      boolean sawBar = row.hasTop();

      if (!sawBar)
        return false;

      row.seek(Range.exact(rowId, new Text("foo")));
      boolean sawFoo = row.hasTop();

      return sawFoo;
   }
}

{noformat}
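
The abstract-method design can be tried out without a tablet server.  The
sketch below is only a stand-in under stated assumptions: SketchRow,
SketchRowSelector and FooBarSketchSelector are hypothetical names, and
seekExact models a seek to an exact column followed by hasTop().  It counts
seeks to show that returning as soon as bar is missing saves the second seek.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for a row-scoped iterator: the columns of one row,
// kept in sorted order.
class SketchRow {
    private final NavigableSet<String> columns;
    int seeks = 0; // number of seekExact calls, kept for illustration only

    SketchRow(String... cols) {
        columns = new TreeSet<>(Arrays.asList(cols));
    }

    // Models seeking to exactly one column and then asking hasTop().
    boolean seekExact(String column) {
        seeks++;
        return columns.contains(column);
    }
}

// The abstract hook that users would implement.
abstract class SketchRowSelector {
    abstract boolean selectRow(SketchRow row);
}

class FooBarSketchSelector extends SketchRowSelector {
    @Override
    boolean selectRow(SketchRow row) {
        // Seek in sorted column order: "bar" sorts before "foo".
        if (!row.seekExact("bar")) {
            return false; // no need to seek for foo
        }
        return row.seekExact("foo");
    }

    public static void main(String[] args) {
        SketchRowSelector selector = new FooBarSketchSelector();
        SketchRow both = new SketchRow("bar", "baz", "foo");
        SketchRow onlyFoo = new SketchRow("foo");
        System.out.println(selector.selectRow(both) + " after " + both.seeks + " seeks");
        System.out.println(selector.selectRow(onlyFoo) + " after " + onlyFoo.seeks + " seeks");
    }
}
```

A row with both columns costs two seeks, while a row without bar is rejected
after one.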

  was:
The WholeRowIterator supports filtering rows that meet certain criteria.
However, it reads the entire row into memory.  It is possible to efficiently
select rows without reading them into memory by using two iterators: one
for selection and one for reading.  When the selection iterator determines
that a row is not needed, seek the read iterator past the row.

This pattern could be made into an easy-to-use iterator that users extend.  The
iterator could have an abstract method that users implement to decide whether
to select or filter out a row.  It could look something like the following.


{noformat}

abstract class RowSelectionIterator extends WrappingIterator {

   public abstract boolean selectRow(SortedKeyValueIterator row);

}

{noformat}


Below is a simple example of a row selection iterator that returns rows that 
have the columns foo and bar.


{noformat}

class FooBarRowSelector extends RowSelectionIterator {
   public boolean selectRow(SortedKeyValueIterator row) {

      // named rowId so it does not shadow the row parameter
      Text rowId = row.getTopKey().getRow();
      // Seek instead of scanning; this is more efficient for large rows with
      // lots of columns.  If the row only has a few columns, scanning is
      // probably faster.  Seeking the columns in sorted order is also more
      // efficient.
      row.seek(Range.exact(rowId, new Text("bar")));
      boolean sawBar = row.hasTop();

      row.seek(Range.exact(rowId, new Text("foo")));
      boolean sawFoo = row.hasTop();

      return sawBar && sawFoo;
   }
}

{noformat}

    
> Create general row selection iterator
> -------------------------------------
>
>                 Key: ACCUMULO-403
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-403
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client, tserver
>            Reporter: Keith Turner
>            Assignee: Billie Rinaldi
>             Fix For: 1.5.0

