[ https://issues.apache.org/jira/browse/HBASE-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973831#comment-13973831 ]
ramkrishna.s.vasudevan commented on HBASE-10801:
------------------------------------------------

I tested this patch with a minor modification: the SeekerState is not passed to the KeyOnlyClonedSeekerState, so that it holds only the primitive member variables (passing the SeekerState was a bit more costly).

I combined this with HBASE-10929 and added a filter, FilterAllFilter, that filters out every row that would otherwise be returned to the client. This ensures that nowhere in the scan path is there a need to create a KV object (which also involves copying the value part), so the comparisons operate purely on Cells. Note that in this patch the key part is copied in shallowCopy().

A full table scan with 1 thread over 2000000 rows gave the following results.

With patch
========
{code}
hbase(main):002:0> scan 'TestTable',{FILTER=>org.apache.hadoop.hbase.filter.FilterAllFilter.new()}
ROW  COLUMN+CELL
0 row(s) in 9.6820 seconds

hbase(main):003:0> scan 'TestTable',{FILTER=>org.apache.hadoop.hbase.filter.FilterAllFilter.new()}
ROW  COLUMN+CELL
0 row(s) in 2.8490 seconds

hbase(main):004:0> scan 'TestTable',{FILTER=>org.apache.hadoop.hbase.filter.FilterAllFilter.new()}
ROW  COLUMN+CELL
0 row(s) in 2.7680 seconds

hbase(main):005:0> scan 'TestTable',{FILTER=>org.apache.hadoop.hbase.filter.FilterAllFilter.new()}
ROW  COLUMN+CELL
0 row(s) in 2.5470 seconds
{code}

Without patch
=========
{code}
hbase(main):002:0> scan 'TestTable',{FILTER=>org.apache.hadoop.hbase.filter.FilterAllFilter.new()}
ROW  COLUMN+CELL
0 row(s) in 19.4020 seconds

hbase(main):003:0> scan 'TestTable',{FILTER=>org.apache.hadoop.hbase.filter.FilterAllFilter.new()}
ROW  COLUMN+CELL
0 row(s) in 6.1450 seconds

hbase(main):004:0> scan 'TestTable',{FILTER=>org.apache.hadoop.hbase.filter.FilterAllFilter.new()}
ROW  COLUMN+CELL
0 row(s) in 2.8520 seconds

hbase(main):005:0> scan 'TestTable',{FILTER=>org.apache.hadoop.hbase.filter.FilterAllFilter.new()}
ROW  COLUMN+CELL
0 row(s) in 2.6900 seconds
{code}

I used the Performance Evaluation tool for this test.
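As a rough way to read the timings above, the relative saving per run can be computed as (without - with) / without. The following is only a minimal sketch of that arithmetic; the class and method names are illustrative and are not part of the patch or of HBase.

```java
public class ScanTimingComparison {
    // Scan times in seconds, copied from the shell output above (runs 002..005).
    static final double[] WITH_PATCH = {9.6820, 2.8490, 2.7680, 2.5470};
    static final double[] WITHOUT_PATCH = {19.4020, 6.1450, 2.8520, 2.6900};

    /** Fraction of the unpatched scan time saved by the patch for one run. */
    static double fractionSaved(double withPatch, double withoutPatch) {
        return (withoutPatch - withPatch) / withoutPatch;
    }

    public static void main(String[] args) {
        for (int i = 0; i < WITH_PATCH.length; i++) {
            double saved = fractionSaved(WITH_PATCH[i], WITHOUT_PATCH[i]);
            // Print both raw timings next to the derived saving for each run.
            System.out.printf("run %d: with=%.4fs without=%.4fs saved=%.1f%%%n",
                    i + 1, WITH_PATCH[i], WITHOUT_PATCH[i], saved * 100.0);
        }
    }
}
```

Each figure comes from a single run, so these ratios are indicative only; the later runs mostly reflect a warm cache rather than the patch itself.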
With the Performance Evaluation tool, the value length is 1000 bytes per row. When the experiment starts, the first scan takes about half the time with the patch (9.68 seconds versus 19.40 seconds without it). Once the cache is fully loaded, the scans are no longer that costly and the timings even out with a small deviation. Changing the value size may have a much bigger impact than this; I can also test with a much larger value size. The difference in performance during the first scan remains consistent.

> Ensure DBE interfaces can work with Cell
> ----------------------------------------
>
> Key: HBASE-10801
> URL: https://issues.apache.org/jira/browse/HBASE-10801
> Project: HBase
> Issue Type: Sub-task
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.99.0
>
> Attachments: HBASE-10801.patch, HBASE-10801_1.patch, HBASE-10801_2.patch, HBASE-10801_3.patch
>
>
> Some changes to the interfaces may be needed for DBEs, or the way they currently work may need to be modified, in order to make DBEs work with Cells. Suggestions and ideas are welcome.

--
This message was sent by Atlassian JIRA
(v6.2#6252)