[
https://issues.apache.org/jira/browse/HBASE-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566404#comment-14566404
]
Anoop Sam John commented on HBASE-13448:
----------------------------------------
@larsh Thanks for the comments.
I was trying to explain why we won't see any improvement as such in the test,
especially in 0.98. Sorry if I was not clear.
The test has 1 CF with a single file in it. Under the StoreScanner KVHeap there
is always only a single file, so no comparison happens and there are no calls
to getXXXOffset/Length there. There are get calls in StoreScanner (at most 2
times), and SQM also needs the component offsets/lengths. But SQM does not make
get calls on KeyValue to obtain them. Instead it calculates them itself by
parsing the KV buffer (see code below). SQM then skips these cells, so there
are no further get calls on them. In effect there are 2 get calls on rowLength
and just one on each of the others. This makes it clear why there is no
advantage.
In a real case where Cells are not skipped (and in trunk especially), many
calls happen, mainly on rowLength. When ExplicitColTracker is in use, there are
also many calls to the qualifier offset/length. For the other component
lengths/offsets, the keyLength is parsed frequently. The table in the comments
above shows how many times each call happens on a single Cell. Those numbers
are for cells that are written back to the client side, so they cover all
layers. But in that test too I had only 1 CF and one HFile. With more CFs and
HFiles, comparison ops will happen in 2 KVHeaps, so there will be even more
calls. (We no longer pass the byte[], offset, length into Comparators but
instead pass the Cell alone.)
So on trunk we should see an advantage. If you can give us your test, I will
run it on trunk.
{code}
byte [] bytes = kv.getBuffer();
int offset = kv.getOffset();
int keyLength = Bytes.toInt(bytes, offset, Bytes.SIZEOF_INT);
offset += KeyValue.ROW_OFFSET;
int initialOffset = offset;
short rowLength = Bytes.toShort(bytes, offset, Bytes.SIZEOF_SHORT);
offset += Bytes.SIZEOF_SHORT;
int ret = this.rowComparator.compareRows(row, this.rowOffset,
    this.rowLength,
    bytes, offset, rowLength);
...
...
//Passing rowLength
offset += rowLength;
//Skipping family
byte familyLength = bytes [offset];
offset += familyLength + 1;
int qualLength = keyLength -
    (offset - initialOffset) - KeyValue.TIMESTAMP_TYPE_SIZE;
long timestamp = Bytes.toLong(bytes, initialOffset + keyLength -
    KeyValue.TIMESTAMP_TYPE_SIZE);
...
...
byte type = bytes[initialOffset + keyLength - 1];
...
MatchCode colChecker = columns.checkColumn(bytes, offset, qualLength, type);
if (colChecker == MatchCode.INCLUDE) {
  ReturnCode filterResponse = ReturnCode.SKIP;
  // STEP 2: Yes, the column is part of the requested columns. Check if filter is present
  if (filter != null) {
    // STEP 3: Filter the key value and return if it filters out
    filterResponse = filter.filterKeyValue(kv);
{code}
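To make the idea concrete, here is a minimal standalone sketch (plain Java, no HBase dependency) of parsing the component offsets/lengths out of the serialized KV buffer once and keeping them in fields, so later lookups do not re-parse. The class name CachedOffsets and the serialize helper are hypothetical and exist only for this illustration; this is not the code from the attached patches. It assumes the usual KeyValue layout: 4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not an HBase class. */
final class CachedOffsets {
    static final int ROW_OFFSET = 8;          // key length int + value length int
    static final int TIMESTAMP_TYPE_SIZE = 9; // 8-byte timestamp + 1-byte type

    // Parsed once in the constructor, then read as plain fields.
    final int rowOffset;
    final short rowLength;
    final int familyOffset;
    final byte familyLength;
    final int qualifierOffset;
    final int qualifierLength;
    final long timestamp;
    final byte type;

    CachedOffsets(byte[] bytes, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        int keyLength = buf.getInt(offset);
        int keyStart = offset + ROW_OFFSET;
        rowLength = buf.getShort(keyStart);
        rowOffset = keyStart + Short.BYTES;
        familyLength = bytes[rowOffset + rowLength];
        familyOffset = rowOffset + rowLength + 1;
        qualifierOffset = familyOffset + familyLength;
        // The qualifier fills whatever is left of the key before timestamp + type.
        qualifierLength = keyLength - (qualifierOffset - keyStart) - TIMESTAMP_TYPE_SIZE;
        timestamp = buf.getLong(keyStart + keyLength - TIMESTAMP_TYPE_SIZE);
        type = bytes[keyStart + keyLength - 1];
    }

    /** Hypothetical helper that builds a buffer in the layout assumed above. */
    static byte[] serialize(String row, String family, String qualifier,
                            long ts, byte type, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int keyLength = Short.BYTES + r.length + 1 + f.length + q.length + TIMESTAMP_TYPE_SIZE;
        ByteBuffer buf = ByteBuffer.allocate(ROW_OFFSET + keyLength + v.length);
        buf.putInt(keyLength).putInt(v.length);
        buf.putShort((short) r.length).put(r);
        buf.put((byte) f.length).put(f);
        buf.put(q);
        buf.putLong(ts).put(type);
        buf.put(v);
        return buf.array();
    }
}
```

The trade-off is a few extra fields per cell in exchange for not re-reading rowLength and keyLength on every getter call, which only pays off when each cell is touched many times, as in the trunk read path described above.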
> New Cell implementation with cached component offsets/lengths
> -------------------------------------------------------------
>
> Key: HBASE-13448
> URL: https://issues.apache.org/jira/browse/HBASE-13448
> Project: HBase
> Issue Type: Sub-task
> Components: Scanners
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: 13448-0.98.txt, HBASE-13448.patch, HBASE-13448_V2.patch,
> HBASE-13448_V3.patch, gc.png, hits.png
>
>
> This can be extension to KeyValue and can be instantiated and used in read
> path.