Re: Review Request: Reseeking directly to required columns

Ryan Rawson Wed, 29 Sep 2010 13:37:46 -0700


On 2010-09-28 18:52:19, Pranav Khaitan wrote:
> > Ryan: 
> > 
> > Additionally, as part of the commit, you added the optimization for 
> > SEEK_NEXT_ROW. Had a question on the getKeyForNextRow() function:
> > 
> > +
> > +  public KeyValue getKeyForNextRow(KeyValue kv) {
> > +    return KeyValue.createLastOnRow(
> > +        kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
> > +        null, 0, 0,
> > +        null, 0, 0);
> > +  }
> > 
> > Is a KeyValue constructed with a null column family & qualifier indeed 
> > larger than all KeyValues in that row? Just want to make sure it doesn't 
> > reseek back to the very top of the current row :). [Note: I haven't spent 
> > time trying to confirm this; but was concerned that the null column family 
> > & qualifier might end up causing this KV to be smaller than the other KVs 
> > for the row. Will try and test it out to confirm.]
> > 
> >


This code in the comparator implements the 'last key on row' ordering:

// compare row code here

      if (lcolumnlength == 0 && ltype == Type.Minimum.getCode()) {
        return 1; // left is bigger.
      }
      if (rcolumnlength == 0 && rtype == Type.Minimum.getCode()) {
        return -1;
      }

// rest of comparator here

If the right key has a column length of 0 (i.e. it was constructed with a null 
family & qualifier) _and_ has a type of Minimum, then we say that the left is 
smaller than the right, and vice versa. So in this code in HFile:

          int comp = this.reader.comparator.compare(key, offset, length,
            block.array(), block.arrayOffset() + block.position(), klen);

The 'key' is the target key (the 'last on row' key), and it is passed as the 
left argument.  So we'd hit the 'left is bigger' branch for every key in the 
row, and we would iterate past the entire row until we get to the next row.
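
To make the ordering concrete, here is a minimal standalone sketch of the same idea. It is not the HBase code: the Key class, the seek() helper and the string-based comparison are simplified stand-ins I made up for illustration, and the type codes are assumed values rather than read from KeyValue.Type.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration only; none of these names are HBase classes.
class LastOnRowSketch {
    // Assumed stand-ins for Type.Minimum.getCode() and a regular Put type code.
    static final int TYPE_MINIMUM = 0;
    static final int TYPE_PUT = 4;

    // Hypothetical key: a row, a column (family + qualifier) and a type code.
    static final class Key {
        final String row;
        final String column;
        final int type;

        Key(String row, String column, int type) {
            this.row = row;
            this.column = column;
            this.type = type;
        }
    }

    // Mirrors createLastOnRow: empty column plus the Minimum type.
    static Key lastOnRow(String row) {
        return new Key(row, "", TYPE_MINIMUM);
    }

    static int compare(Key left, Key right) {
        // Compare rows first.
        int c = left.row.compareTo(right.row);
        if (c != 0) {
            return c;
        }
        // Same row: an empty column with type Minimum sorts after everything.
        if (left.column.isEmpty() && left.type == TYPE_MINIMUM) {
            return 1; // left is bigger
        }
        if (right.column.isEmpty() && right.type == TYPE_MINIMUM) {
            return -1;
        }
        // Rest of the comparator, reduced to a column comparison here.
        return left.column.compareTo(right.column);
    }

    // Walks forward while the target is still bigger than the current key,
    // the same way the HFile loop uses the comparator result.
    static int seek(List<Key> sorted, Key target) {
        int i = 0;
        while (i < sorted.size() && compare(target, sorted.get(i)) > 0) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<Key> block = Arrays.asList(
            new Key("row1", "cf:a", TYPE_PUT),
            new Key("row1", "cf:b", TYPE_PUT),
            new Key("row2", "cf:a", TYPE_PUT));
        int idx = seek(block, lastOnRow("row1"));
        System.out.println("seek lands on index " + idx
            + " (row " + block.get(idx).row + ")");
    }
}
```

With the three keys above, seeking to the last-on-row key for row1 skips both row1 entries and stops at the first key of row2, rather than going back to the top of row1.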


- Ryan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/781/#review1350
-----------------------------------------------------------


On 2010-09-16 00:57:12, Pranav Khaitan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/781/
> -----------------------------------------------------------
> 
> (Updated 2010-09-16 00:57:12)
> 
> 
> Review request for hbase, stack, Jonathan Gray, Karthik Ranganathan, and 
> Kannan Muthukkaruppan.
> 
> 
> Summary
> -------
> 
> Optimize reads for specific columns by reseeking between scans. Use the 
> reseek logic to jump directly to next required column rather than reading 
> current column.
> 
> Big performance gain for queries with sparse columns. Not advantageous for 
> dense ones. Consider this before committing.
> 
> Further suggestions/questions are welcome!
> 
> 
> This addresses bugs HBASE-2450, HBASE-2916 and HBASE-2959.
>     http://issues.apache.org/jira/browse/HBASE-2450
>     http://issues.apache.org/jira/browse/HBASE-2916
>     http://issues.apache.org/jira/browse/HBASE-2959
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java 990674 
>   trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 990674 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 990674 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 990674 
> 
> Diff: http://review.cloudera.org/r/781/diff
> 
> 
> Testing
> -------
> 
> All existing tests pass and make significant use of this code. 
> 
> Added a new test file called TestColumnSeeking along with another patch at 
> https://review.cloudera.org/r/780/.
> 
> 
> Thanks,
> 
> Pranav
> 
>
