[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-19 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366078#comment-17366078
 ] 

Hudson commented on HBASE-25998:


Results for branch branch-2.4
[build #145 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/145/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/145/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/145/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/145/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/145/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Revisit synchronization in SyncFuture
> -------------------------------------
>
> Key: HBASE-25998
> URL: https://issues.apache.org/jira/browse/HBASE-25998
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, regionserver, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0
>Reporter: Bharath Vissapragada
>Assignee: Bharath Vissapragada
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.5
>
> Attachments: monitor-overhead-1.png, monitor-overhead-2.png
>
>
> While working on HBASE-25984, I noticed some weird frames in the flame graphs 
> around monitor entry/exit consuming a lot of CPU cycles (see the attached 
> images). The synchronization there is too coarse-grained and sometimes 
> unnecessary. I wrote a simple patch that switched to reentrant-lock-based 
> synchronization with a condition variable rather than a busy wait, and it 
> showed 70-80% higher throughput in WAL PE. Seems too good to be true 
> (more details in the comments).
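The switch described above, from coarse synchronized monitor waits to a ReentrantLock with a condition variable, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual HBase SyncFuture API; the class and member names here are hypothetical:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the lock/condition pattern: a handler thread
// blocks until the WAL sync has reached its transaction id, parking on
// a condition variable instead of spinning in a synchronized timed-wait
// loop on the object monitor.
class SimpleSyncFuture {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition doneCondition = lock.newCondition();
  private long doneTxid = -1;   // highest txid the WAL has synced so far
  private final long txid;      // txid this future is waiting for

  SimpleSyncFuture(long txid) {
    this.txid = txid;
  }

  // Called by the sync thread once the WAL is durable up to syncedTxid.
  void done(long syncedTxid) {
    lock.lock();
    try {
      doneTxid = syncedTxid;
      doneCondition.signalAll(); // wake waiters; each re-checks its predicate
    } finally {
      lock.unlock();
    }
  }

  // Called by handler threads: waits until the sync covers this txid.
  long get() throws InterruptedException {
    lock.lock();
    try {
      while (doneTxid < txid) {
        doneCondition.await(); // parks the thread; no busy-wait
      }
      return doneTxid;
    } finally {
      lock.unlock();
    }
  }
}
```

The point of the pattern is that a waiter is woken only when the sync thread signals, rather than repeatedly re-acquiring the object monitor in a timed wait loop, which is where the monitor enter/exit overhead in the flame graphs would come from under heavy handler-thread contention.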



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-19 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366000#comment-17366000
 ] 

Hudson commented on HBASE-25998:


Results for branch branch-2
[build #280 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/280/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/280/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/280/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/280/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/280/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console 
output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/280//console].




[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-19 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365977#comment-17365977
 ] 

Hudson commented on HBASE-25998:


Results for branch branch-2.3
[build #240 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/240/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/240/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/240/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/240/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/240/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365692#comment-17365692
 ] 

Hudson commented on HBASE-25998:


Results for branch master
[build #326 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/326/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/326/General_20Nightly_20Build_20Report/]






(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/326/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/326/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-17 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365057#comment-17365057
 ] 

Bharath Vissapragada commented on HBASE-25998:
--

FSHLog doesn't show much improvement in WALPE with the patch, and I believe that 
is reflected in the YCSB runs too. Unfortunately, I'm not able to deploy a 
branch-2 cluster right now (without much effort) to get the async WAL numbers. 
I will update here once I have a cluster and some data.



[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-17 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365017#comment-17365017
 ] 

Andrew Kyle Purtell commented on HBASE-25998:
-

Unlike with WALPE, there's a lot going on in a real cluster test. The WAL is 
critical to performance but is only one of many factors. We would expect an 
improvement in WAL latency to be reflected in improved per-mutation operational 
latency. Your YCSB results are in line with that, even if the improvement is not 
as impressive as the microbenchmark.



[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-16 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364520#comment-17364520
 ] 

Bharath Vissapragada commented on HBASE-25998:
--

Thanks [~apurtell] for trying out the patch (and for the review).

One interesting behavior here is that the big throughput difference is only 
obvious for the async WAL implementation. It's not clear to me why; perhaps 
there is a lot more contention in that implementation for some reason. I 
repeated the same set of tests against the branch-1/master-based FSHLog, and 
the patch only performs slightly better (a few single-digit percentage points). 
This behavior was also confirmed in the YCSB runs on branch-1 (on a 3-node 
containerized EC2 cluster).

Without patch: branch-1/FSHLog (10M ingest only)
{noformat}
[OVERALL], RunTime(ms), 199938
[OVERALL], Throughput(ops/sec), 50015.50480649001
[TOTAL_GCS_PS_Scavenge], Count, 293
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 1222
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.611189468735308
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 34
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.017005271634206603
[TOTAL_GCs], Count, 294
[TOTAL_GC_TIME], Time(ms), 1256
[TOTAL_GC_TIME_%], Time(%), 0.6281947403695145
[CLEANUP], Operations, 512
[CLEANUP], AverageLatency(us), 41.0234375
[CLEANUP], MinLatency(us), 0
[CLEANUP], MaxLatency(us), 18527
[CLEANUP], 95thPercentileLatency(us), 13
[CLEANUP], 99thPercentileLatency(us), 37
[INSERT], Operations, 1000
[INSERT], AverageLatency(us), 5085.9494093
[INSERT], MinLatency(us), 1499
[INSERT], MaxLatency(us), 220927
[INSERT], 95thPercentileLatency(us), 6511
[INSERT], 99thPercentileLatency(us), 16655
[INSERT], Return=OK, 1000
{noformat}
With patch: branch-1/FSHLog (10M ingest only)
{noformat}
[OVERALL], RunTime(ms), 195064
[OVERALL], Throughput(ops/sec), 51265.2257720543
[TOTAL_GCS_PS_Scavenge], Count, 284
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 1184
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.6069802731411229
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 33
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.01691752450477792
[TOTAL_GCs], Count, 285
[TOTAL_GC_TIME], Time(ms), 1217
[TOTAL_GC_TIME_%], Time(%), 0.6238977976459008
[CLEANUP], Operations, 512
[CLEANUP], AverageLatency(us), 45.783203125
[CLEANUP], MinLatency(us), 1
[CLEANUP], MaxLatency(us), 20591
[CLEANUP], 95thPercentileLatency(us), 14
[CLEANUP], 99thPercentileLatency(us), 37
[INSERT], Operations, 1000
[INSERT], AverageLatency(us), 4958.6662675
[INSERT], MinLatency(us), 1380
[INSERT], MaxLatency(us), 295935
[INSERT], 95thPercentileLatency(us), 6335
[INSERT], 99thPercentileLatency(us), 19071
[INSERT], Return=OK, 1000
{noformat}
Unfortunately, the tooling I have does not support branch-2/master yet, so I 
cannot repeat this YCSB run for the async WAL implementation, but if the WALPE 
runs are any indication, we should see a good throughput improvement there too.



[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-15 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363977#comment-17363977
 ] 

Andrew Kyle Purtell commented on HBASE-25998:
-

My results, on a 2019 MacBook Pro (2.3 GHz 8-core Intel Core i9):

Java:
{noformat}
openjdk version "11.0.8" 2020-07-14 LTS
OpenJDK Runtime Environment Zulu11.41+23-CA (build 11.0.8+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.41+23-CA (build 11.0.8+10-LTS, mixed mode)
{noformat}

Current master at 555f8b46 (./bin/hbase 
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation -threads 256 -iterations 
10, first stats dump)
{noformat}
-- Histograms --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
 count = 2879583
   min = 1875557
   max = 246347480
  mean = 2938092.83
stddev = 10908886.22
median = 2190795.00
  75% <= 2373648.00
  95% <= 2833351.00
  98% <= 4978663.00
  99% <= 6457163.00
99.9% <= 213634065.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
 count = 28275
   min = 52
   max = 103
  mean = 101.79
stddev = 3.18
median = 102.00
  75% <= 102.00
  95% <= 102.00
  98% <= 102.00
  99% <= 103.00
99.9% <= 103.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
 count = 28276
   min = 118014
   max = 242926458
  mean = 1179929.00
stddev = 7471679.91
median = 867201.00
  75% <= 934459.00
  95% <= 1181470.00
  98% <= 1909398.00
  99% <= 3711500.00
99.9% <= 6662930.00

-- Meters --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
 count = 1604304263
 mean rate = 51688418.00 events/second
 1-minute rate = 43829916.60 events/second
 5-minute rate = 39579618.94 events/second
15-minute rate = 38725509.54 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
 count = 28278
 mean rate = 911.04 events/second
 1-minute rate = 772.23 events/second
 5-minute rate = 697.15 events/second
15-minute rate = 682.06 events/second
{noformat}

With patch (./bin/hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation 
-threads 256 -iterations 10, first stats dump)
{noformat}
-- Histograms --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
 count = 5113265
   min = 879033
   max = 202881049
  mean = 1421741.40
stddev = 6905506.90
median = 1063825.00
  75% <= 1215826.00
  95% <= 1843140.00
  98% <= 3479868.00
  99% <= 4076417.00
99.9% <= 202881049.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
 count = 50232
   min = 52
   max = 106
  mean = 101.84
stddev = 2.92
median = 102.00
  75% <= 102.00
  95% <= 102.00
  98% <= 103.00
  99% <= 103.00
99.9% <= 103.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
 count = 50233
   min = 98682
   max = 73959735
  mean = 542249.37
stddev = 2083003.22
median = 418651.00
  75% <= 487203.00
  95% <= 742249.00
  98% <= 1040476.00
  99% <= 1693894.00
99.9% <= 3739216.00

-- Meters --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
 count = 2848677354
 mean rate = 91435148.51 events/second
 1-minute rate = 79981952.53 events/second
 5-minute rate = 74153640.38 events/second
15-minute rate = 72986075.52 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
 count = 50237
 mean rate = 1612.44 events/second
 1-minute rate = 1410.28 events/second
 5-minute rate = 1307.50 events/second
15-minute rate = 1286.92 events/second
{noformat}


[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-13 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362670#comment-17362670
 ] 

Bharath Vissapragada commented on HBASE-25998:
--

[~stack] Thanks for taking a look. The test runs seem fine; I'm trying to do an 
end-to-end throughput test on a cluster and will report the results here.



[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362233#comment-17362233
 ] 

Michael Stack commented on HBASE-25998:
---

The numbers look nice [~bharathv] (not in a place to try locally – OOO). Patch 
looks good. Your safety check passes?



[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-11 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361973#comment-17361973
 ] 

Bharath Vissapragada commented on HBASE-25998:
--

Redid the experiments with JDK 11 (to account for any recent monitor 
performance enhancements) and I see similar numbers. Also, the numbers above 
are for {{-t 256}}, which implies heavy contention. It seems like the patch 
performs well under heavy load and the gap narrows with fewer threads (which I 
guess is expected), but even with very low concurrency the patch seems to 
outperform the current state.



[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-11 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361867#comment-17361867
 ] 

Bharath Vissapragada commented on HBASE-25998:
--

[~zhangduo] [~apurtell] [~stack] This might be of interest to you (draft patch 
up for review); the results seem too good to be true. If you don't mind, please 
try the patch locally in your environment (I just want to eliminate any noise 
from my end). PTAL.



[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

2021-06-11 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361864#comment-17361864
 ] 

Bharath Vissapragada commented on HBASE-25998:
--

{noformat}
java -version
java version "1.8.0_221"
Java(TM) SE Runtime Environment (build 1.8.0_221-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.221-b11, mixed mode)
{noformat}


For the default WAL provider (async WAL):

Without Patch

{noformat}
-- Histograms --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
 count = 10271257
   min = 2672827
   max = 67700701
  mean = 4084532.41
stddev = 6244597.80
median = 3403047.00
  75% <= 3525394.00
  95% <= 3849268.00
  98% <= 4319378.00
  99% <= 61134500.00
99.9% <= 67195663.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
 count = 100888
   min = 52
   max = 103
  mean = 101.91
stddev = 2.09
median = 102.00
  75% <= 102.00
  95% <= 102.00
  98% <= 102.00
  99% <= 103.00
99.9% <= 103.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
 count = 100889
   min = 119051
   max = 62778058
  mean = 1601305.10
stddev = 3626948.72
median = 1361530.00
  75% <= 1407052.00
  95% <= 1523418.00
  98% <= 1765310.00
  99% <= 2839178.00
99.9% <= 62778058.00

-- Meters --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
 count = 5721241096
 mean rate = 37890589.06 events/second
 1-minute rate = 36390169.75 events/second
 5-minute rate = 33524039.88 events/second
15-minute rate = 31915066.49 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
 count = 100889
 mean rate = 668.16 events/second
 1-minute rate = 641.77 events/second
 5-minute rate = 590.37 events/second
15-minute rate = 561.67 events/second
{noformat}

With patch:

{noformat}
-- Histograms --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
 count = 12927042
   min = 943723
   max = 60827209
  mean = 1865217.32
stddev = 5384907.53
median = 1323691.00
  75% <= 1443195.00
  95% <= 1765866.00
  98% <= 1921920.00
  99% <= 3144643.00
99.9% <= 60827209.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
 count = 126797
   min = 52
   max = 104
  mean = 101.87
stddev = 2.54
median = 102.00
  75% <= 102.00
  95% <= 102.00
  98% <= 103.00
  99% <= 103.00
99.9% <= 103.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
 count = 126798
   min = 122666
   max = 60703608
  mean = 711847.31
stddev = 3174375.63
median = 519092.00
  75% <= 570240.00
  95% <= 695175.00
  98% <= 754972.00
  99% <= 791139.00
99.9% <= 59975393.00

-- Meters --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
 count = 7200681555
 mean rate = 79170095.16 events/second
 1-minute rate = 75109969.27 events/second
 5-minute rate = 66505621.40 events/second
15-minute rate = 63719949.74 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
 count = 126800
 mean rate = 1394.11 events/second
 1-minute rate = 1322.31 events/second
 5-minute rate = 1169.99 events/second
15-minute rate = 1120.69 events/second
{noformat}


