[ https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361864#comment-17361864 ]
Bharath Vissapragada commented on HBASE-25998:
----------------------------------------------

{noformat}
java -version
java version "1.8.0_221"
Java(TM) SE Runtime Environment (build 1.8.0_221-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.221-b11, mixed mode)
{noformat}

For default WAL provider (async WAL)

Without Patch

{noformat}
-- Histograms ------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
             count = 10271257
               min = 2672827
               max = 67700701
              mean = 4084532.41
            stddev = 6244597.80
            median = 3403047.00
              75% <= 3525394.00
              95% <= 3849268.00
              98% <= 4319378.00
              99% <= 61134500.00
            99.9% <= 67195663.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
             count = 100888
               min = 52
               max = 103
              mean = 101.91
            stddev = 2.09
            median = 102.00
              75% <= 102.00
              95% <= 102.00
              98% <= 102.00
              99% <= 103.00
            99.9% <= 103.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
             count = 100889
               min = 119051
               max = 62778058
              mean = 1601305.10
            stddev = 3626948.72
            median = 1361530.00
              75% <= 1407052.00
              95% <= 1523418.00
              98% <= 1765310.00
              99% <= 2839178.00
            99.9% <= 62778058.00

-- Meters ----------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
             count = 5721241096
         mean rate = 37890589.06 events/second
     1-minute rate = 36390169.75 events/second
     5-minute rate = 33524039.88 events/second
    15-minute rate = 31915066.49 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
             count = 100889
         mean rate = 668.16 events/second
     1-minute rate = 641.77 events/second
     5-minute rate = 590.37 events/second
    15-minute rate = 561.67 events/second
{noformat}

With patch:

{noformat}
-- Histograms ------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
             count = 12927042
               min = 943723
               max = 60827209
              mean = 1865217.32
            stddev = 5384907.53
            median = 1323691.00
              75% <= 1443195.00
              95% <= 1765866.00
              98% <= 1921920.00
              99% <= 3144643.00
            99.9% <= 60827209.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
             count = 126797
               min = 52
               max = 104
              mean = 101.87
            stddev = 2.54
            median = 102.00
              75% <= 102.00
              95% <= 102.00
              98% <= 103.00
              99% <= 103.00
            99.9% <= 103.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
             count = 126798
               min = 122666
               max = 60703608
              mean = 711847.31
            stddev = 3174375.63
            median = 519092.00
              75% <= 570240.00
              95% <= 695175.00
              98% <= 754972.00
              99% <= 791139.00
            99.9% <= 59975393.00

-- Meters ----------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
             count = 7200681555
         mean rate = 79170095.16 events/second
     1-minute rate = 75109969.27 events/second
     5-minute rate = 66505621.40 events/second
    15-minute rate = 63719949.74 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
             count = 126800
         mean rate = 1394.11 events/second
     1-minute rate = 1322.31 events/second
     5-minute rate = 1169.99 events/second
    15-minute rate = 1120.69 events/second
{noformat}

> Revisit synchronization in SyncFuture
> -------------------------------------
>
>                 Key: HBASE-25998
>                 URL: https://issues.apache.org/jira/browse/HBASE-25998
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver, wal
>    Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0
>            Reporter: Bharath Vissapragada
>            Assignee: Bharath Vissapragada
>            Priority: Major
>         Attachments: monitor-overhead-1.png, monitor-overhead-2.png
>
>
> While working on HBASE-25984, I noticed some weird frames in the flame graphs
> around monitor entry exit consuming a lot of CPU cycles (see attached
> images). Noticed that the synchronization there is too coarse grained and
> sometimes unnecessary. I did a simple patch that switched to a reentrant lock
> based synchronization with condition variable rather than a busy wait and
> that showed 70-80% increased throughput in WAL PE. Seems too good to be
> true.. (more details in the comments).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
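
For readers unfamiliar with the technique named in the quoted description (a reentrant lock plus a condition variable instead of a monitor-based busy wait), here is a minimal sketch. It is not the actual HBase patch: the class name `SyncFutureSketch`, its methods, the `NOT_DONE` sentinel, and the assumption that transaction ids are non-negative are all hypothetical and chosen only for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a sync future; NOT the real HBase SyncFuture.
class SyncFutureSketch {
  // Sentinel for "not completed yet"; assumes real txids are non-negative.
  private static final long NOT_DONE = -1L;

  private final ReentrantLock lock = new ReentrantLock();
  private final Condition doneCondition = lock.newCondition();
  // Guarded by lock. Holds the completed txid, or NOT_DONE.
  private long doneTxid = NOT_DONE;

  /** Called by the syncing thread; waiters are woken only when state changes. */
  void done(long txid) {
    lock.lock();
    try {
      doneTxid = txid;
      doneCondition.signalAll();
    } finally {
      lock.unlock();
    }
  }

  boolean isDone() {
    lock.lock();
    try {
      return doneTxid != NOT_DONE;
    } finally {
      lock.unlock();
    }
  }

  /** Blocks until done() has been called or the timeout elapses. */
  long get(long timeoutMs) throws InterruptedException, TimeoutException {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    lock.lock();
    try {
      // The loop guards against spurious wakeups; awaitNanos returns an
      // estimate of the time still left to wait.
      while (doneTxid == NOT_DONE) {
        if (remainingNanos <= 0) {
          throw new TimeoutException("sync not completed within " + timeoutMs + " ms");
        }
        remainingNanos = doneCondition.awaitNanos(remainingNanos);
      }
      return doneTxid;
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) throws Exception {
    SyncFutureSketch future = new SyncFutureSketch();
    Thread syncer = new Thread(() -> future.done(42L));
    syncer.start();
    System.out.println("synced txid = " + future.get(5000));
    syncer.join();
  }
}
```

The waiter parks inside `awaitNanos` until `signalAll` is called, rather than repeatedly re-entering a monitor to poll. Whether this accounts for the numbers reported above depends on the real patch, which this sketch does not reproduce.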