[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818068#comment-13818068
 ] 

Lars Hofhansl commented on HBASE-9935:
--

There are some more cases like this in lazy seeking. Also some of the methods 
in KeyValue do similar things.


> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-09 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818117#comment-13818117
 ] 

Jean-Marc Spaggiari commented on HBASE-9935:


Nice one again. We should take more attention when we call external methodes on 
the same object multiple time and see if we are not better to do it once and 
store the result.

One improvement we can do here, is to store the row lenght and invalidate it 
only if it changes? Like

{code}
@Override
public short getRowLength() {
  if (rowLength == null)
rowLength = Bytes.toShort(this.bytes, getKeyOffset());
  return rowLength;
}
{code}

And then each time we modify this.offset or this.byte we invalidate 
rowLenght... So that will optimize everywhere even if we missed it? Looking 
very quickly at the KV code, I have not seen anywhere except in the constructor 
where this is modified. So might be a quick and small optimization again...

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-09 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818222#comment-13818222
 ] 

Andrew Purtell commented on HBASE-9935:
---

Nice, I've been perpetuating this antipattern in some recent changes. Let's go 
look...

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818256#comment-13818256
 ] 

Lars Hofhansl commented on HBASE-9935:
--

[~jmspaggi], caching stuff because of a profiler session is itself an anti 
pattern. See HBASE-7279 where I removed a bunch of stuff that we cached. Now, 
we never cached the rowLength before and it is only short, so it might be OK.
I also want rationalize all these static methods we have on KeyValue. The 
particular one I quote in the description should be an object method - and 
interestingly there *is* a method on KeyValue that does exactly that call 
createLastOnRowCol. If we use these or similar methods everywhere we can hide 
the KeyValue specific optimizations inside KeyValue while still not requiring a 
cache of the rowLength.


> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818272#comment-13818272
 ] 

Lars Hofhansl commented on HBASE-9935:
--

Meant to put a :) after "anti pattern" above.

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818282#comment-13818282
 ] 

Lars Hofhansl commented on HBASE-9935:
--

Looking at KeyValue. The overall static size is still 68 bytes. 4 of which are 
actually to cache the keyLength already. Adding 2 bytes to that might well 
worth the effort. (Right now I am trying what happens when I *remove* 4 byte 
cache for the keyLength).

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818287#comment-13818287
 ] 

Lars Hofhansl commented on HBASE-9935:
--

Interestingly, when I remove the keyLength caching I see 
KeyValue.getKeyLength() prominently in the profiler, but I can't measure any 
performance detriment.
When I switch to a sampling profiler, I see the getRowLength() or 
getKeyLength() almost never sees a hit. So this caching (and even the reason 
why I filed this jira) might just be a profiler anomaly.

Maybe somebody from Intel ([~apurtell], hint hint) could have a look? If the 
keyLength caching does not buy us much, we should remove it and save the memory.

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818605#comment-13818605
 ] 

Lars Hofhansl commented on HBASE-9935:
--

Did more some tests.
The steady state of this is even improved when I remove the keyLength caching 
from KeyValue. I my case I do full scans through 25m KVs, so in each run the 
extra keyLength cache produces 100mb of extra garbage to be collected.
I also observed a slight slowdown (mostly within the noise, though) in this 
scenario when I do the 2 byte rowLength caching.

So my proposal is this:
# remove the keyLength caching
# look through the callers of getFamilyOffset/getFamilyLength, etc, and see 
where we can optimize this while hiding all of this in the KeyValue class
# no cache for the rowLength


> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818625#comment-13818625
 ] 

Lars Hofhansl commented on HBASE-9935:
--

I tried a big'ish patch. Didn't see any improvements. The JVM is probably smart 
enough to do the right thing anyway. Unless somebody has some new ideas, I'll 
close this as invalid (or maybe just remove the keyLenght caching).


> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-11 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818957#comment-13818957
 ] 

Jean-Marc Spaggiari commented on HBASE-9935:


Might be good for the GC time. Also, nice small improvement for 
KeyValue.ROW_OFFSET move. But as you said, performance change is most probably 
in the noise.

Since you are doing modifications to KeyValue, can you add the implement 
Cloneable since we have a clone() method? And the @Override keyword for equals 
and hashCode? ;)

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 9935-0.94.txt
>
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819172#comment-13819172
 ] 

Lars Hofhansl commented on HBASE-9935:
--

He yes can do :)
I'll probably file another issue for the proposed KV changes and close this one.
(will also do a test on a machine with more cores, where CPU cycles are cheaper 
and memory bandwidth is more of a limited commodity)

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 9935-0.94.txt
>
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819408#comment-13819408
 ] 

stack commented on HBASE-9935:
--

+1 on removing caching of a keylength that provides no discernible benefit (I 
probably added it after a 'profiling' session -- or at least added an ancestor 
of current KV length caching).

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 9935-0.94.txt
>
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819527#comment-13819527
 ] 

Lars Hofhansl commented on HBASE-9935:
--

Lemme do a bit more testing. I'll report back as soon as I have done that. In 
either case it won't be a big perf improvement.

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 9935-0.94.txt
>
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819682#comment-13819682
 ] 

Lars Hofhansl commented on HBASE-9935:
--

Tried on a 6 core (12 hw threads) machine. Inconclusive as before. Variance of 
these test is quite hight (+- 5% runtime).
I'd say we remove the caching and reclaim those 4 bytes.

This issue itself was dud (sorry about that, it's easy to get fooled by the 
profiler :) ).

I'll close this and open a new one for the keyLength removal.


> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 9935-0.94.txt
>
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9935) Slight perf improvement: Avoid KeyValue.getRowLength() at some places

2013-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819689#comment-13819689
 ] 

Lars Hofhansl commented on HBASE-9935:
--

Filed HBASE-9956.

> Slight perf improvement: Avoid KeyValue.getRowLength() at some places
> -
>
> Key: HBASE-9935
> URL: https://issues.apache.org/jira/browse/HBASE-9935
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 9935-0.94.txt
>
>
> Here's an example:
> {code}
> KeyValue.createLastOnRow(
>   kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
>   kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength(),
>kv.getBuffer(), kv.getQualifierOffset(), kv.getQualifierLength());
> {code}
> Looks harmless enough, but that actually recalculates the rowlength 5 times. 
> And each time it needs to decode the rowlength again from the bytes of the KV.



--
This message was sent by Atlassian JIRA
(v6.1#6144)