[
https://issues.apache.org/jira/browse/HBASE-13496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506088#comment-14506088
]
stack commented on HBASE-13496:
-------------------------------
You make a good point, [~apurtell]. Let's just use this going forward for all
compares.
Let me attach [~anoop.hbase]'s code redone for JMH.
First, the results show that the unsafe compare against an offheap buffer is about
21% faster (40.2M vs 33.2M ops/s) than the unsafe compare against an onheap BB array.
This first run does not have the attached patch applied.
{code}
kalashnikov:onheapoffheapcompare stack$ java -jar target/benchmarks.jar
net.duboce.OnheapOffheapCompare.offheap -wi 3 -wmb 3 -wf 3 -i 3 -f 3
# JMH 1.9 (released 4 days ago)
# VM invoker:
/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 3 iterations, 1 s each
# Measurement: 3 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: net.duboce.OnheapOffheapCompare.offheap
# Run progress: 0.00% complete, ETA 00:00:36
# Warmup Fork: 1 of 3
# Warmup Iteration 1: 40405814.674 ops/s
# Warmup Iteration 2: 39543540.514 ops/s
# Warmup Iteration 3: 41606902.264 ops/s
Iteration 1: 38586392.256 ops/s
Iteration 2: 40288847.812 ops/s
Iteration 3: 39909092.296 ops/s
# Run progress: 16.67% complete, ETA 00:00:31
# Warmup Fork: 2 of 3
# Warmup Iteration 1: 38922871.181 ops/s
# Warmup Iteration 2: 41366404.618 ops/s
# Warmup Iteration 3: 40964570.763 ops/s
Iteration 1: 39633298.954 ops/s
Iteration 2: 40079375.963 ops/s
Iteration 3: 39220530.393 ops/s
# Run progress: 33.33% complete, ETA 00:00:25
# Warmup Fork: 3 of 3
# Warmup Iteration 1: 40851101.637 ops/s
# Warmup Iteration 2: 40026443.875 ops/s
# Warmup Iteration 3: 40233304.903 ops/s
Iteration 1: 38803789.321 ops/s
Iteration 2: 38164635.144 ops/s
Iteration 3: 38628486.154 ops/s
# Run progress: 50.00% complete, ETA 00:00:18
# Fork: 1 of 3
# Warmup Iteration 1: 38389355.584 ops/s
# Warmup Iteration 2: 40670449.399 ops/s
# Warmup Iteration 3: 39692011.027 ops/s
Iteration 1: 39972794.821 ops/s
Iteration 2: 39871969.084 ops/s
Iteration 3: 39274714.589 ops/s
# Run progress: 66.67% complete, ETA 00:00:12
# Fork: 2 of 3
# Warmup Iteration 1: 39579020.449 ops/s
# Warmup Iteration 2: 40040917.951 ops/s
# Warmup Iteration 3: 41771914.576 ops/s
Iteration 1: 39036428.655 ops/s
Iteration 2: 40680623.481 ops/s
Iteration 3: 41386782.492 ops/s
# Run progress: 83.33% complete, ETA 00:00:06
# Fork: 3 of 3
# Warmup Iteration 1: 39278019.575 ops/s
# Warmup Iteration 2: 39617906.299 ops/s
# Warmup Iteration 3: 40959272.421 ops/s
Iteration 1: 39808506.268 ops/s
Iteration 2: 41423402.879 ops/s
Iteration 3: 40484950.136 ops/s
Result "offheap":
40215574.712 ±(99.9%) 1423136.708 ops/s [Average]
(min, avg, max) = (39036428.655, 40215574.712, 41423402.879), stdev =
846885.827
CI (99.9%): [38792438.004, 41638711.419] (assumes normal distribution)
# Run complete. Total time: 00:00:37
Benchmark Mode Cnt Score Error Units
OnheapOffheapCompare.offheap thrpt 9 40215574.712 ± 1423136.708 ops/s
kalashnikov:onheapoffheapcompare stack$ java -jar target/benchmarks.jar
net.duboce.OnheapOffheapCompare.onheap -wi 3 -wmb 3 -wf 3 -i 3 -f 3
# JMH 1.9 (released 4 days ago)
# VM invoker:
/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 3 iterations, 1 s each
# Measurement: 3 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: net.duboce.OnheapOffheapCompare.onheap
# Run progress: 0.00% complete, ETA 00:00:36
# Warmup Fork: 1 of 3
# Warmup Iteration 1: 31867510.027 ops/s
# Warmup Iteration 2: 30028630.589 ops/s
# Warmup Iteration 3: 33687633.869 ops/s
Iteration 1: 32222065.627 ops/s
Iteration 2: 33131719.286 ops/s
Iteration 3: 31564166.944 ops/s
# Run progress: 16.67% complete, ETA 00:00:31
# Warmup Fork: 2 of 3
# Warmup Iteration 1: 31422485.005 ops/s
# Warmup Iteration 2: 31002717.261 ops/s
# Warmup Iteration 3: 33718561.788 ops/s
Iteration 1: 33404312.160 ops/s
Iteration 2: 33564603.329 ops/s
Iteration 3: 30730618.344 ops/s
# Run progress: 33.33% complete, ETA 00:00:25
# Warmup Fork: 3 of 3
# Warmup Iteration 1: 31976230.984 ops/s
# Warmup Iteration 2: 31469206.976 ops/s
# Warmup Iteration 3: 33966111.804 ops/s
Iteration 1: 34501343.952 ops/s
Iteration 2: 33884137.262 ops/s
Iteration 3: 31021199.684 ops/s
# Run progress: 50.00% complete, ETA 00:00:18
# Fork: 1 of 3
# Warmup Iteration 1: 31339790.845 ops/s
# Warmup Iteration 2: 31098331.023 ops/s
# Warmup Iteration 3: 33886103.423 ops/s
Iteration 1: 34070557.543 ops/s
Iteration 2: 33340208.663 ops/s
Iteration 3: 30380920.160 ops/s
# Run progress: 66.67% complete, ETA 00:00:12
# Fork: 2 of 3
# Warmup Iteration 1: 30590920.526 ops/s
# Warmup Iteration 2: 32820049.296 ops/s
# Warmup Iteration 3: 33060247.452 ops/s
Iteration 1: 33865093.227 ops/s
Iteration 2: 33263291.673 ops/s
Iteration 3: 33925533.481 ops/s
# Run progress: 83.33% complete, ETA 00:00:06
# Fork: 3 of 3
# Warmup Iteration 1: 29096911.911 ops/s
# Warmup Iteration 2: 30985824.533 ops/s
# Warmup Iteration 3: 34356926.601 ops/s
Iteration 1: 33809683.353 ops/s
Iteration 2: 33301106.606 ops/s
Iteration 3: 33173433.914 ops/s
Result "onheap":
33236647.624 ±(99.9%) 1885165.178 ops/s [Average]
(min, avg, max) = (30380920.160, 33236647.624, 34070557.543), stdev =
1121831.559
CI (99.9%): [31351482.446, 35121812.803] (assumes normal distribution)
# Run complete. Total time: 00:00:37
Benchmark Mode Cnt Score Error Units
OnheapOffheapCompare.onheap thrpt 9 33236647.624 ± 1885165.178 ops/s
{code}
Here is the patched version.
{code}
kalashnikov:onheapoffheapcompare stack$ java -jar target/benchmarks.jar
net.duboce.OnheapOffheapCompare.offheap -wi 3 -wmb 3 -wf 3 -i 3 -f 3
# JMH 1.9 (released 4 days ago)
# VM invoker:
/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 3 iterations, 1 s each
# Measurement: 3 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: net.duboce.OnheapOffheapCompare.offheap
# Run progress: 0.00% complete, ETA 00:00:36
# Warmup Fork: 1 of 3
# Warmup Iteration 1: 38705824.349 ops/s
# Warmup Iteration 2: 40968480.128 ops/s
# Warmup Iteration 3: 40282889.540 ops/s
Iteration 1: 40702237.123 ops/s
Iteration 2: 42617568.994 ops/s
Iteration 3: 41598589.465 ops/s
# Run progress: 16.67% complete, ETA 00:00:31
# Warmup Fork: 2 of 3
# Warmup Iteration 1: 40784095.045 ops/s
# Warmup Iteration 2: 40483383.471 ops/s
# Warmup Iteration 3: 37572518.311 ops/s
Iteration 1: 39560674.538 ops/s
Iteration 2: 40555672.357 ops/s
Iteration 3: 40880540.757 ops/s
# Run progress: 33.33% complete, ETA 00:00:25
# Warmup Fork: 3 of 3
# Warmup Iteration 1: 39862891.631 ops/s
# Warmup Iteration 2: 40261276.539 ops/s
# Warmup Iteration 3: 39848202.244 ops/s
Iteration 1: 38678248.714 ops/s
Iteration 2: 39254253.795 ops/s
Iteration 3: 39351040.172 ops/s
# Run progress: 50.00% complete, ETA 00:00:18
# Fork: 1 of 3
# Warmup Iteration 1: 39136106.519 ops/s
# Warmup Iteration 2: 39707432.443 ops/s
# Warmup Iteration 3: 40208894.012 ops/s
Iteration 1: 39335541.854 ops/s
Iteration 2: 40391095.986 ops/s
Iteration 3: 41930971.302 ops/s
# Run progress: 66.67% complete, ETA 00:00:12
# Fork: 2 of 3
# Warmup Iteration 1: 40747764.072 ops/s
# Warmup Iteration 2: 38806087.810 ops/s
# Warmup Iteration 3: 39532335.320 ops/s
Iteration 1: 39994729.179 ops/s
Iteration 2: 39809095.219 ops/s
Iteration 3: 40985424.095 ops/s
# Run progress: 83.33% complete, ETA 00:00:06
# Fork: 3 of 3
# Warmup Iteration 1: 39917919.237 ops/s
# Warmup Iteration 2: 40428549.733 ops/s
# Warmup Iteration 3: 40628153.354 ops/s
Iteration 1: 38728312.063 ops/s
Iteration 2: 40151049.176 ops/s
Iteration 3: 40688979.137 ops/s
Result "offheap":
40223910.890 ±(99.9%) 1571242.520 ops/s [Average]
(min, avg, max) = (38728312.063, 40223910.890, 41930971.302), stdev =
935021.221
CI (99.9%): [38652668.370, 41795153.410] (assumes normal distribution)
kalashnikov:onheapoffheapcompare stack$ java -jar target/benchmarks.jar
net.duboce.OnheapOffheapCompare.onheap -wi 3 -wmb 3 -wf 3 -i 3 -f 3
# JMH 1.9 (released 4 days ago)
# VM invoker:
/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 3 iterations, 1 s each
# Measurement: 3 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: net.duboce.OnheapOffheapCompare.onheap
# Run progress: 0.00% complete, ETA 00:00:36
# Warmup Fork: 1 of 3
# Warmup Iteration 1: 36335116.201 ops/s
# Warmup Iteration 2: 37110644.116 ops/s
# Warmup Iteration 3: 33671084.279 ops/s
Iteration 1: 34677255.413 ops/s
Iteration 2: 32881045.678 ops/s
Iteration 3: 35990860.149 ops/s
# Run progress: 16.67% complete, ETA 00:00:31
# Warmup Fork: 2 of 3
# Warmup Iteration 1: 35573712.462 ops/s
# Warmup Iteration 2: 35603905.398 ops/s
# Warmup Iteration 3: 34639558.917 ops/s
Iteration 1: 34095635.010 ops/s
Iteration 2: 34425913.095 ops/s
Iteration 3: 35715426.857 ops/s
# Run progress: 33.33% complete, ETA 00:00:25
# Warmup Fork: 3 of 3
# Warmup Iteration 1: 34966707.523 ops/s
# Warmup Iteration 2: 35676214.007 ops/s
# Warmup Iteration 3: 33693682.775 ops/s
Iteration 1: 32957484.472 ops/s
Iteration 2: 33809096.037 ops/s
Iteration 3: 35925135.387 ops/s
# Run progress: 50.00% complete, ETA 00:00:18
# Fork: 1 of 3
# Warmup Iteration 1: 35175304.900 ops/s
# Warmup Iteration 2: 34994155.480 ops/s
# Warmup Iteration 3: 34688793.076 ops/s
Iteration 1: 33979408.010 ops/s
Iteration 2: 35760999.472 ops/s
Iteration 3: 35674549.834 ops/s
# Run progress: 66.67% complete, ETA 00:00:12
# Fork: 2 of 3
# Warmup Iteration 1: 34835124.860 ops/s
# Warmup Iteration 2: 36552517.432 ops/s
# Warmup Iteration 3: 34668159.076 ops/s
Iteration 1: 35550437.413 ops/s
Iteration 2: 34841030.522 ops/s
Iteration 3: 34978397.580 ops/s
# Run progress: 83.33% complete, ETA 00:00:06
# Fork: 3 of 3
# Warmup Iteration 1: 33884306.332 ops/s
# Warmup Iteration 2: 35024086.104 ops/s
# Warmup Iteration 3: 34183938.730 ops/s
Iteration 1: 34133219.169 ops/s
Iteration 2: 32109732.886 ops/s
Iteration 3: 31063256.007 ops/s
Result "onheap":
34232336.766 ±(99.9%) 2767851.257 ops/s [Average]
(min, avg, max) = (31063256.007, 34232336.766, 35760999.472), stdev =
1647103.886
CI (99.9%): [31464485.509, 37000188.022] (assumes normal distribution)
# Run complete. Total time: 00:00:37
Benchmark Mode Cnt Score Error Units
OnheapOffheapCompare.onheap thrpt 9 34232336.766 ± 2767851.257 ops/s
{code}
So, the patch only affects the onheap compare (if I read it right):
OnheapOffheapCompare.onheap thrpt 9 33236647.624 ± 1885165.178 ops/s
vs
OnheapOffheapCompare.onheap thrpt 9 34232336.766 ± 2767851.257 ops/s
which is an improvement of about 3%. (There is more variability in the patched
run, and the error intervals overlap, but maybe this would settle if I ran more
iterations.)
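For anyone who wants to check the arithmetic, here is a minimal sketch in plain Java (no JMH needed). The class and method names are made up for illustration; the numbers are the Score and Error columns from the two onheap summary lines above.

```java
public final class ThroughputDelta {

    /** Relative change of {@code after} over {@code before}, e.g. 0.03 for +3%. */
    public static double relativeGain(double before, double after) {
        return (after - before) / before;
    }

    /** True when [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB] intersect. */
    public static boolean intervalsOverlap(double scoreA, double errA,
                                           double scoreB, double errB) {
        return scoreA - errA <= scoreB + errB && scoreB - errB <= scoreA + errA;
    }

    public static void main(String[] args) {
        // onheap, without the patch (Score, Error)
        double before = 33236647.624, beforeErr = 1885165.178;
        // onheap, with the patch (Score, Error)
        double after = 34232336.766, afterErr = 2767851.257;

        System.out.printf("onheap gain: %.2f%%%n", 100 * relativeGain(before, after));
        System.out.println("error intervals overlap: "
                + intervalsOverlap(before, beforeErr, after, afterErr));
    }
}
```

If the intervals overlap, the gain should be read as suggestive rather than established; more forks and iterations would tighten the error.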
> Make Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo inlineable
> -----------------------------------------------------------------------------
>
> Key: HBASE-13496
> URL: https://issues.apache.org/jira/browse/HBASE-13496
> Project: HBase
> Issue Type: Sub-task
> Components: Scanners
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Fix For: 2.0.0, 1.2.0
>
> Attachments: ByteBufferUtils.java, HBASE-13496.patch,
> OffheapVsOnHeapCompareTest.java
>
>
> While testing with some other perf comparisons I have noticed that the above
> method (which is very hot in the read path) is not getting inlined:
> bq.@ 16
> org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo
> (364 bytes) hot method too big
> We can do minor refactoring to make it inlineable.
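
For illustration only (this is not the attached HBASE-13496.patch): as far as I know, HotSpot reports "hot method too big" when a frequently called method's bytecode exceeds -XX:FreqInlineSize, which defaults to 325 bytes on typical x86-64 builds, so a 364-byte method misses the cut. One generic way to get under the limit is to keep the entry point small and move the bulk of the work into a helper. The class and method names below are hypothetical, and the loop is a plain byte-by-byte unsigned compare, not the Unsafe-based one in Bytes.

```java
public final class InlineFriendlyCompare {

    /**
     * Small entry point: only the trivial identity check lives here, so its
     * bytecode stays compact and is a good inlining candidate.
     */
    public static int compareTo(byte[] a, int aOff, int aLen,
                                byte[] b, int bOff, int bLen) {
        if (a == b && aOff == bOff && aLen == bLen) {
            return 0; // same array, same range
        }
        return compareRange(a, aOff, aLen, b, bOff, bLen);
    }

    /** The actual work: unsigned lexicographic compare, shorter range sorts first on a tie. */
    private static int compareRange(byte[] a, int aOff, int aLen,
                                    byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff; // treat bytes as unsigned
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }
}
```

Whether a given split actually gets inlined can be checked by running with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining and looking for the method in the output.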