[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-12-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749571#comment-15749571
 ] 

stack commented on HBASE-16146:
---

OK if I backport this one, [~busbey]?

> Counters are expensive...
> -
>
> Key: HBASE-16146
> URL: https://issues.apache.org/jira/browse/HBASE-16146
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: Gary Helmling
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-16146.001.patch, HBASE-16146.branch-1.001.patch, 
> HBASE-16146.branch-1.3.001.patch, counters.patch, less_and_less_counters.png
>
>
> Doing workloadc, perf shows 10%+ of CPU being spent on counter#add. If I 
> disable some of the hot ones -- see patch -- I can get 10% more throughput 
> (390k to 440k). Figure something better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-09-22 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514741#comment-15514741
 ] 

Gary Helmling commented on HBASE-16146:
---

We've seen Counter come up as a source of high CPU utilization in 1.3, 
especially since the switch of metrics to use FastLongHistogram (each instance 
of which uses 260 Counter instances internally) from HBASE-15222.  I think this 
is due to the use of the instance-level ThreadLocal in Counter to track the 
per-thread cell index, as perf output on hot nodes shows a huge amount of time 
in ThreadLocalMap.getEntryAfterMiss().  As the number of Counter instances (and 
ThreadLocal instances) we're retaining in memory grows, performance seems to 
degrade.

This is all moot for master, since we've already deprecated Counter and 
replaced its usage with LongAdder.  But we still need a solution for Counter in 
branch-1.  I'm testing a patch which removes the ThreadLocal usage, which I'll 
attach here.  Benchmarking shows a small reduction in Counter performance, but 
a big improvement in FastLongHistogram performance when many histograms are 
retained in memory.
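
To make the idea concrete, here is a minimal sketch of a striped counter that picks its cell from the current thread's id instead of consulting a per-instance ThreadLocal. It is an illustration only, not the attached patch: the class name, the cell count, the stride, and the hash are all assumptions.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch; this is NOT org.apache.hadoop.hbase.util.Counter.
final class StripedCounterSketch {
    // Assumed values: 16 cells (a power of two), spaced 8 longs apart to limit false sharing.
    private static final int CELLS = 16;
    private static final int STRIDE = 8;

    private final AtomicLongArray cells = new AtomicLongArray(CELLS * STRIDE);

    // Derive the cell from the thread id on every call; no per-instance ThreadLocal lookup.
    private static int cellOffset() {
        long id = Thread.currentThread().getId();
        int h = (int) (id ^ (id >>> 32));
        h ^= h >>> 16;
        return (h & (CELLS - 1)) * STRIDE;
    }

    void add(long delta) {
        cells.addAndGet(cellOffset(), delta);
    }

    // Sum of all cells; only a snapshot if updates are still in flight.
    long get() {
        long sum = 0;
        for (int i = 0; i < CELLS; i++) {
            sum += cells.get(i * STRIDE);
        }
        return sum;
    }

    // Helper for trying it out: `threads` threads each add 1 `perThread` times.
    static long runDemo(int threads, final int perThread) {
        final StripedCounterSketch counter = new StripedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                @Override public void run() {
                    for (int n = 0; n < perThread; n++) {
                        counter.add(1);
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(32, 100_000));
    }
}
```

The trade-off in a scheme like this is that two threads can hash to the same cell and contend on it, whereas nothing per-instance has to be stored in each thread's ThreadLocalMap.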



[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-09-22 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514767#comment-15514767
 ] 

Gary Helmling commented on HBASE-16146:
---

[~stack], I'm curious if this patch provides any improvement for your YCSB 
workload.



[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-09-22 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514911#comment-15514911
 ] 

Enis Soztutar commented on HBASE-16146:
---

Maybe we can make it so that Counter just delegates to LongAdder if it is 
available.  



[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-09-23 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15518004#comment-15518004
 ] 

Gary Helmling commented on HBASE-16146:
---

In master, where we can rely on LongAdder, we've already replaced Counter with 
that.

In branch-1, I'm not sure that using reflection to call through to LongAdder 
when running on Java 8 is going to give us a more performant solution.  And it 
still won't help the situation when running on Java < 8.  Besides, Counter 
generally performs well, it just seems to degrade as more Counters are kept in 
memory due to the ThreadLocal usage.
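
As a rough illustration of what reflective delegation could look like (a hypothetical sketch, not proposed code; the class name and the AtomicLong fallback are assumptions):

```java
import java.lang.reflect.Method;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: delegate to LongAdder when the class exists, else fall back to AtomicLong.
final class ReflectiveCounterSketch {
    private static final Class<?> ADDER = findAdder();

    private final Object adder;   // a LongAdder instance, or null when unavailable
    private final Method add;     // LongAdder.add(long)
    private final Method sum;     // LongAdder.sum()
    private final AtomicLong fallback = new AtomicLong();

    private static Class<?> findAdder() {
        try {
            return Class.forName("java.util.concurrent.atomic.LongAdder");
        } catch (ClassNotFoundException e) {
            return null; // running on Java < 8
        }
    }

    ReflectiveCounterSketch() {
        Object a = null;
        Method addM = null;
        Method sumM = null;
        if (ADDER != null) {
            try {
                a = ADDER.getDeclaredConstructor().newInstance();
                addM = ADDER.getMethod("add", long.class);
                sumM = ADDER.getMethod("sum");
            } catch (ReflectiveOperationException e) {
                a = null; // treat any reflection failure as "not available"
            }
        }
        adder = a;
        add = addM;
        sum = sumM;
    }

    void add(long delta) {
        if (adder == null) {
            fallback.addAndGet(delta);
            return;
        }
        try {
            add.invoke(adder, delta); // boxes delta and goes through Method.invoke on every call
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    long get() {
        if (adder == null) {
            return fallback.get();
        }
        try {
            return (Long) sum.invoke(adder);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ReflectiveCounterSketch c = new ReflectiveCounterSketch();
        c.add(5);
        c.add(-2);
        System.out.println(c.get());
    }
}
```

Every add() here pays for a Method.invoke call and boxes its argument; whether that overhead cancels out LongAdder's advantage would need measuring, and the fallback path on Java < 8 is just a contended AtomicLong.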



[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-10-04 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547198#comment-15547198
 ] 

Mikhail Antonov commented on HBASE-16146:
-

From my perspective (I have not been running YCSB with this patch, though), I'm 
+1 on it for branch-1 and branch-1.3. In some tests/workloads we did see 
scenarios where excessive thread-local allocations for counters in metrics 
caused load / latency on the hot machines to go up, impairing stability.



[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-10-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547267#comment-15547267
 ] 

Enis Soztutar commented on HBASE-16146:
---

bq. And it still won't help the situation when running on Java < 8
I've seen somebody fork the LongAdder / Striped64 code internally. Not sure 
whether we can do that from a licensing perspective. 
bq. In some tests/workloads we did see scenarios when excessive thread locals 
allocations for counters in metrics cause load / latency on the hot machines to 
go up, impairing stability.
I've also seen these TLs come up in profiling, but did not spend too much time. 

+1 on the patch if we are not forking the code. 



[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-10-11 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1559#comment-1559
 ] 

Gary Helmling commented on HBASE-16146:
---

Here are some microbenchmark results for posterity.

Benchmarking 32 threads updating a single Counter instance:

Counter with patch (removing ThreadLocal):
{noformat}
Result "testCounter":
  N = 121837900
  mean =    456.286 ±(99.9%) 3.249 ns/op

  Percentiles, ns/op:
      p(0.0000) =       45.000 ns/op
     p(50.0000) =      232.000 ns/op
     p(90.0000) =     1138.000 ns/op
     p(95.0000) =     1600.000 ns/op
     p(99.0000) =     2648.000 ns/op
     p(99.9000) =     4456.000 ns/op
     p(99.9900) =    11824.000 ns/op
     p(99.9990) =    45903.487 ns/op
     p(99.9999) =  1528139.979 ns/op
    p(100.0000) = 31424512.000 ns/op
{noformat}

Counter with ThreadLocal:
{noformat}
Result "testCounterThreadLocal":
  N = 104204449
  mean =    412.524 ±(99.9%) 5.910 ns/op

  Percentiles, ns/op:
      p(0.0000) =        45.000 ns/op
     p(50.0000) =       194.000 ns/op
     p(90.0000) =       976.000 ns/op
     p(95.0000) =      1404.000 ns/op
     p(99.0000) =      2532.000 ns/op
     p(99.9000) =      4448.000 ns/op
     p(99.9900) =     11792.000 ns/op
     p(99.9990) =     41655.456 ns/op
     p(99.9999) =   4312849.000 ns/op
    p(100.0000) = 105906176.000 ns/op
{noformat}

Comparison of implementations:
{noformat}
Benchmark                                    Mode        Cnt     Score    Error  Units
IncrementBenchmark.testAtomicLong          sample   81080122  1880.701 ± 14.435  ns/op
IncrementBenchmark.testCounter             sample  121837900   456.286 ±  3.249  ns/op
IncrementBenchmark.testCounterThreadLocal  sample  104204449   412.524 ±  5.910  ns/op
IncrementBenchmark.testLongAdder           sample  108712812    77.910 ±  1.070  ns/op
{noformat}

So, when operating on a single instance, the ThreadLocal version is a bit 
faster.
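
For anyone who wants to poke at the single-instance case without the benchmark sources, below is a crude stand-in harness. It is not the JMH benchmark that produced the numbers above: the class and method names are made up, it only covers AtomicLong and LongAdder, and wall-clock timing like this is far less trustworthy than JMH sampling.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical harness: N threads hammering one shared counter instance.
final class IncrementHarnessSketch {
    interface Incrementer {
        void increment();
        long total();
    }

    static Incrementer atomicLong() {
        final AtomicLong value = new AtomicLong();
        return new Incrementer() {
            @Override public void increment() { value.incrementAndGet(); }
            @Override public long total() { return value.get(); }
        };
    }

    static Incrementer longAdder() {
        final LongAdder value = new LongAdder();
        return new Incrementer() {
            @Override public void increment() { value.increment(); }
            @Override public long total() { return value.sum(); }
        };
    }

    // Starts `threads` workers together; each calls increment() `perThread` times.
    // Returns the elapsed wall-clock time in nanoseconds.
    static long run(final Incrementer inc, int threads, final int perThread) {
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int n = 0; n < perThread; n++) {
                    inc.increment();
                }
            });
            workers[i].start();
        }
        long begin = System.nanoTime();
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        int threads = 32;            // same thread count as the benchmark above
        int perThread = 1_000_000;   // arbitrary
        System.out.println("AtomicLong ns: " + run(atomicLong(), threads, perThread));
        System.out.println("LongAdder  ns: " + run(longAdder(), threads, perThread));
    }
}
```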

However, when microbenchmarking FastLongHistogram using the two different 
implementations, in a semi-realistic scenario which retains 500 histograms in 
memory, randomly selecting 10 to update each call, with 200 threads, the cost 
of the ThreadLocal becomes more clear:

FastLongHistogram with Counter with patch:
{noformat}
Result "fastLong":
  N = 1373429925
  mean =  48721.146 ±(99.9%) 196.908 ns/op

  Percentiles, ns/op:
      p(0.0000) =       2336.000 ns/op
     p(50.0000) =       6664.000 ns/op
     p(90.0000) =       7520.000 ns/op
     p(95.0000) =       7784.000 ns/op
     p(99.0000) =       8560.000 ns/op
     p(99.9000) =      24288.000 ns/op
     p(99.9900) =   94896128.000 ns/op
     p(99.9990) =  153878528.000 ns/op
     p(99.9999) =  654311424.000 ns/op
    p(100.0000) = 2092957696.000 ns/op
{noformat}

FastLongHistogram with Counter with ThreadLocal:
{noformat}
Result "fastLongThreadLocal":
  N = 1251201915
  mean =  84227.741 ±(99.9%) 1114.037 ns/op

  Percentiles, ns/op:
      p(0.0000) =        4056.000 ns/op
     p(50.0000) =        9760.000 ns/op
     p(90.0000) =       12336.000 ns/op
     p(95.0000) =       13648.000 ns/op
     p(99.0000) =       16544.000 ns/op
     p(99.9000) =      285696.000 ns/op
     p(99.9900) =   111017984.000 ns/op
     p(99.9990) =   172228608.000 ns/op
     p(99.9999) =  4445962240.000 ns/op
    p(100.0000) = 31742492672.000 ns/op
{noformat}

Result summary:
{noformat}
Benchmark                                           Mode         Cnt      Score      Error  Units
MultiHistogramBenchmark.fastLong                  sample  1373429925  48721.146 ±  196.908  ns/op
MultiHistogramBenchmark.fastLongThreadLocal       sample  1251201915  84227.741 ± 1114.037  ns/op
MultiHistogramBenchmark.testHDRAtomic             sample  1320949956  27066.038 ±  177.677  ns/op
MultiHistogramBenchmark.testHDRConcurrent         sample  1330869473  26586.309 ±  170.456  ns/op
MultiHistogramBenchmark.testMutableTimeHistogram  sample  1322279057  53766.021 ±  238.439  ns/op
{noformat}

So with more Counters in memory and more threads, removing the ThreadLocal 
usage results in a ~40% improvement, with up to an order of magnitude 
improvement at upper percentiles.
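
For reference, the arithmetic behind those two figures, using the mean and p(99.9) values from the results above. The helper names are made up for the example. The mean drops by about 42%, and the p(99.9) latency is roughly 12 times lower.

```java
// Hypothetical helper; the inputs are the ns/op figures reported above.
final class ImprovementSketch {
    // Percentage by which `after` is lower than `before`.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // How many times larger `before` is than `after`.
    static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Mean: fastLongThreadLocal vs fastLong.
        System.out.println(reductionPercent(84227.741, 48721.146));
        // p(99.9): fastLongThreadLocal vs fastLong.
        System.out.println(ratio(285696.0, 24288.0));
    }
}
```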

We may still want to investigate using HDRHistogram, since its implementations 
outperform both versions.  But in the short term this should still be an 
improvement.



[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-10-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566686#comment-15566686
 ] 

stack commented on HBASE-16146:
---

+1



[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567218#comment-15567218
 ] 

Hudson commented on HBASE-16146:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #42 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/42/])
HBASE-16146 Remove thread local usage in Counter (garyh: rev 
dcb47c9b715c2331abe7e8ccbc1b69f24168dd97)
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Counter.java




[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567279#comment-15567279
 ] 

Hudson commented on HBASE-16146:


SUCCESS: Integrated in Jenkins build HBase-1.4 #461 (See 
[https://builds.apache.org/job/HBase-1.4/461/])
HBASE-16146 Remove thread local usage in Counter (garyh: rev 
4f29c230384b82b64ef4ad9ba61497747436799f)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Counter.java




[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567444#comment-15567444
 ] 

Hudson commented on HBASE-16146:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #37 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/37/])
HBASE-16146 Remove thread local usage in Counter (garyh: rev 
dcb47c9b715c2331abe7e8ccbc1b69f24168dd97)
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Counter.java




[jira] [Commented] (HBASE-16146) Counters are expensive...

2016-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567624#comment-15567624
 ] 

Hudson commented on HBASE-16146:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1770 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1770/])
HBASE-16146 Remove thread local usage in Counter (garyh: rev 
7b0acc292e1854b09c6cedc4dae1f6dae07779bf)
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Counter.java

