[ https://issues.apache.org/jira/browse/HBASE-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557153#comment-14557153 ]
Lars Hofhansl commented on HBASE-13448:
---------------------------------------

My test is quite specific in that the entire scan happens on the region server, because all Cells are filtered there. I do this in order to find out how much overhead the server has. It's possible that if the Cells were not filtered, more calls to getRowLength would happen.

I have not specifically tracked GC activity. I ran the test many times in a loop, first warming up the region server a few times, then running it a few times in order to capture some GC activity in the run times.

My main comment stands: just because we call getRowLength a bunch, or a profiler says it's inefficient, doesn't mean it's bad. Only a real test can bear that out. For this case it's best (I think) to test with just a single region server to keep network variance out of the picture (and this is a region-server-local optimization anyway).

I don't know how to explain the numbers yet. It is possible that reading the length from a member leads to less efficient cache line utilization compared to decoding it from the byte[] each time... That would heavily depend on the specific call sequence. Lemme try with only caching the row key.

> New Cell implementation with cached component offsets/lengths
> -------------------------------------------------------------
>
>                 Key: HBASE-13448
>                 URL: https://issues.apache.org/jira/browse/HBASE-13448
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: 13448-0.98.txt, HBASE-13448.patch, HBASE-13448_V2.patch, HBASE-13448_V3.patch, gc.png, hits.png
>
>
> This can be an extension to KeyValue and can be instantiated and used in the read path.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
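For readers following the thread, the two options being compared (decoding the row length from the backing byte[] on every getRowLength call vs. reading it from a member field filled once at construction) can be sketched roughly as below. The buffer layout follows KeyValue's serialized form up to the row (a 4-byte key length, a 4-byte value length, then a 2-byte big-endian row length and the row bytes), but the classes DecodingCell, CachedRowLengthCell, RowLengthDemo and the helper buildBuffer are hypothetical illustrations, not HBase's KeyValue or anything from the attached patches. The sketch only shows the shape of the trade-off; it says nothing about which variant is faster.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, simplified cell over a KeyValue-style byte[]:
 * <keylength:int><valuelength:int><rowlength:short><row>...
 * Only the row-length lookup is modeled; this is NOT HBase's KeyValue.
 */
class DecodingCell {
    // The row length sits right after the two 4-byte length fields.
    static final int ROW_LENGTH_OFFSET = 2 * Integer.BYTES;

    protected final byte[] bytes;
    protected final int offset;

    DecodingCell(byte[] bytes, int offset) {
        this.bytes = bytes;
        this.offset = offset;
    }

    /** Decodes the big-endian 2-byte row length from the array on every call. */
    short getRowLength() {
        int pos = offset + ROW_LENGTH_OFFSET;
        return (short) (((bytes[pos] & 0xFF) << 8) | (bytes[pos + 1] & 0xFF));
    }
}

/** Same cell, but the row length is decoded once and kept in a member field. */
class CachedRowLengthCell extends DecodingCell {
    private final short rowLength;

    CachedRowLengthCell(byte[] bytes, int offset) {
        super(bytes, offset);
        // Decode once using the parent's byte[]-based logic.
        this.rowLength = super.getRowLength();
    }

    @Override
    short getRowLength() {
        return rowLength; // a field read instead of two byte loads and a shift
    }
}

public class RowLengthDemo {
    /**
     * Hypothetical helper: builds key length, value length, row length and row bytes.
     * The rest of the key (family, qualifier, timestamp, type) is omitted for brevity.
     */
    static byte[] buildBuffer(String row) {
        byte[] rowBytes = row.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 * Integer.BYTES + Short.BYTES + rowBytes.length);
        buf.putInt(Short.BYTES + rowBytes.length); // key length (truncated key, illustration only)
        buf.putInt(0);                             // value length
        buf.putShort((short) rowBytes.length);     // row length; ByteBuffer defaults to big-endian
        buf.put(rowBytes);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] kv = buildBuffer("row-00042");
        DecodingCell decoding = new DecodingCell(kv, 0);
        DecodingCell cached = new CachedRowLengthCell(kv, 0);
        System.out.println(decoding.getRowLength() + " " + cached.getRowLength()); // prints "9 9"
    }
}
```

The cached variant trades an extra field per Cell object (more heap per instance and a different memory footprint) for skipping the decode on each call; whether that pays off depends on how often getRowLength is actually called per Cell, which is what the warmed-up, single-region-server runs described above are meant to measure.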