[ 
https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887558#action_12887558
 ] 

HBase Review Board commented on HBASE-2265:
-------------------------------------------

Message from: "Pranav Khaitan" <pranavkhai...@facebook.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/257/
-----------------------------------------------------------

(Updated 2010-07-12 14:48:59.626469)


Review request for hbase, Nicolas, Jonathan Gray, Ryan Rawson, Karthik 
Ranganathan, and Kannan Muthukkaruppan.


Summary
-------

Every memstore and store file will have a minimum and maximum timestamp 
associated with it. If the range of timestamps we are searching for doesn't 
overlap with the range for a particular file, we can skip searching it and save 
time.

This would significantly improve the performance of timestamp range queries. 
It is particularly useful when most reads are for recent entries, so that the 
older files can be safely skipped. 
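
As a rough illustration of the overlap check described above (not the code in 
this diff), the idea could be sketched as follows. The class and method names 
here are hypothetical, and the half-open query range [queryMin, queryMax) is an 
assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the TimeRangeTracker class added by this diff. */
final class TimeRangeSketch {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    /** Widen the tracked range so that it covers one written timestamp. */
    void include(long timestamp) {
        if (timestamp < min) min = timestamp;
        if (timestamp > max) max = timestamp;
    }

    /** True while no timestamp has been recorded. */
    boolean isEmpty() {
        return min > max;
    }

    /**
     * True if the tracked range [min, max] overlaps the query range
     * [queryMin, queryMax). The exclusive upper bound is an assumption.
     */
    boolean mayContain(long queryMin, long queryMax) {
        return !isEmpty() && min < queryMax && max >= queryMin;
    }
}

final class TimeRangeSketchDemo {
    /** Returns the indices of the files whose range overlaps the query. */
    static List<Integer> filesToSearch(List<TimeRangeSketch> files,
                                       long queryMin, long queryMax) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            if (files.get(i).mayContain(queryMin, queryMax)) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        TimeRangeSketch older = new TimeRangeSketch();
        older.include(100L);
        older.include(200L);

        TimeRangeSketch recent = new TimeRangeSketch();
        recent.include(900L);
        recent.include(1000L);

        List<TimeRangeSketch> files = new ArrayList<>();
        files.add(older);
        files.add(recent);

        // A query for recent entries only needs the second file.
        System.out.println(filesToSearch(files, 500L, 1100L));
    }
}
```

In this sketch the per-write cost is two comparisons, and a file whose range 
does not overlap the query is never opened.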

Addresses HBASE-2265 JIRA. 

This diff also fixes some minor bugs. For example, KeyValueHeap used to throw 
an uncaught exception when the size of the scanner set was zero. 

Internal review done by Jonathan and Kannan.


This addresses bug HBASE-2265.
    http://issues.apache.org/jira/browse/HBASE-2265


Diffs
-----

  trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 959782 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION 
  trunk/src/test/java/org/apache/hadoop/hbase/client/TestMultipleTimestamps.java PRE-CREATION 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782 

Diff: http://review.hbase.org/r/257/diff


Testing
-------

All existing JUnit tests run successfully. Additional JUnit tests for MemStore, 
StoreFile and Store were added to test correctness with multiple timestamps.

I ran a test to measure the extra time required to keep track of min and max 
timestamps while writing KeyValues. The comparison was done by entering 1 
million KeyValues into the memstore ten times with and without timestamp 
tracking and then averaging the times for each case. The WAL was disabled and 
no flushing was done during this test to minimize overhead. The average time 
for entering 1M KeyValues into the memstore was 13.44 seconds without 
timestamp tracking and 13.45 seconds with it, a difference of less than 0.1%. 
This shows that tracking timestamps adds no significant overhead.


Thanks,

Pranav




> HFile and Memstore should maintain minimum and maximum timestamps
> -----------------------------------------------------------------
>
>                 Key: HBASE-2265
>                 URL: https://issues.apache.org/jira/browse/HBASE-2265
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Todd Lipcon
>            Assignee: Pranav Khaitan
>
> In order to fix HBASE-1485 and HBASE-29, it would be very helpful to have 
> HFile and Memstore track their maximum and minimum timestamps. This has the 
> following nice properties:
> - for a straight Get, if an entry has already been found with timestamp 
> X, and X >= HFile.maxTimestamp, the HFile doesn't need to be checked. Thus, 
> the current fast behavior of get can be maintained for those who use strictly 
> increasing timestamps, but "correct" behavior for those who sometimes write 
> out-of-order.
> - for a scan, the "latest timestamp" of the storage can be used to decide 
> which cell wins, even if the timestamp of the cells is equal. In essence, 
> rather than comparing timestamps, instead you are able to compare tuples of 
> (row timestamp, storage.max_timestamp)
> - in general, min_timestamp(storage A) >= max_timestamp(storage B) if storage 
> A was flushed after storage B.
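
The Get shortcut in the first bullet of the quoted description can be sketched 
roughly as below. This is only an illustration under stated assumptions: each 
store file is modelled as a plain array of the timestamps it holds for one 
cell, the per-file maximum is recomputed here rather than read from stored 
metadata, and all names are hypothetical rather than taken from HBase.

```java
import java.util.List;

/** Hypothetical sketch of skipping files during a Get; not HBase code. */
final class GetShortcutSketch {

    /** Newest timestamp found (if any) and how many files were searched. */
    static final class Result {
        final boolean found;
        final long newestTimestamp;
        final int filesSearched;

        Result(boolean found, long newestTimestamp, int filesSearched) {
            this.found = found;
            this.newestTimestamp = newestTimestamp;
            this.filesSearched = filesSearched;
        }
    }

    /** Largest timestamp in one file; stands in for stored max-timestamp metadata. */
    static long maxTimestamp(long[] file) {
        long max = Long.MIN_VALUE;
        for (long ts : file) {
            if (ts > max) max = ts;
        }
        return max;
    }

    /**
     * Walks the files in the given order and skips any file whose maximum
     * timestamp cannot beat the entry already found (X >= file max).
     */
    static Result newestEntry(List<long[]> files) {
        boolean found = false;
        long best = Long.MIN_VALUE;
        int searched = 0;
        for (long[] file : files) {
            if (found && best >= maxTimestamp(file)) {
                continue; // nothing in this file can be newer than the entry found
            }
            searched++;
            for (long ts : file) {
                if (!found || ts > best) {
                    best = ts;
                    found = true;
                }
            }
        }
        return new Result(found, best, searched);
    }
}
```

With files holding {1000, 900}, {500, 400} and {1200}, the second file is 
skipped because 1000 >= 500, while the third is still searched because it holds 
an out-of-order write that is newer than 1000.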

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
