[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647590#comment-15647590 ]
Duo Zhang commented on HBASE-16890: ----------------------------------- AsyncFSWAL syncs more even after I introduced a 'maxConcurrentSyncRequest' limit. I set the limit to 5. https://github.com/Apache9/hbase/commit/4366600107646abeb4336191f0385da2c26423e4 The WALPE command is {noformat} ./bin/hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation -threads 100 -iterations 250000 -qualifiers 25 -keySize 50 -valueSize 200 &>walpelog & {noformat} The result of AsyncFSWAL {noformat} Summary: threads=100, iterations=250000, syncInterval=0 took 843.537s 29637.111ops/s -- Histograms ------------------------------------------------------------------ org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos count = 24939130 min = 557972 max = 192175884 mean = 3986548.37 stddev = 13706319.65 median = 2440842.00 75% <= 3157719.00 95% <= 9339416.00 98% <= 11016092.00 99% <= 12722808.00 99.9% <= 192175884.00 org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync count = 1688089 min = 0 max = 73 mean = 13.39 stddev = 11.97 median = 10.00 75% <= 11.00 95% <= 49.00 98% <= 53.00 99% <= 56.00 99.9% <= 68.00 org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs count = 1688090 min = 278482 max = 14913837 mean = 1557884.88 stddev = 1613655.10 median = 1195771.00 75% <= 1598231.00 95% <= 2080325.00 98% <= 8236173.00 99% <= 10110921.00 99.9% <= 11701740.00 -- Meters ---------------------------------------------------------------------- org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes count = 174324882180 mean rate = 206827108.20 events/second 1-minute rate = 207573011.08 events/second 5-minute rate = 199333528.15 events/second 15-minute rate = 153738236.11 events/second org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs count = 1688090 mean rate = 2002.81 events/second 1-minute rate = 2019.97 events/second 5-minute rate = 1927.71 events/second 15-minute rate = 1489.09 events/second {noformat} The %CPU of WALPE is about 400% and DN is about 60% The result of FSHLog {noformat} Summary: threads=100, iterations=250000, syncInterval=0 took 931.874s 26827.660ops/s -- Histograms ------------------------------------------------------------------ org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos count = 24954191 min = 800101 max = 95039071 mean = 3747676.48 stddev = 6153524.13 median = 3299664.00 75% <= 3374795.00 95% <= 3506255.00 98% <= 3560668.00 99% <= 3598597.00 99.9% <= 95039071.00 org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync count = 509596 min = 0 max = 100 mean = 49.58 stddev = 19.05 median = 50.00 75% <= 66.00 95% <= 76.00 98% <= 78.00 99% <= 79.00 99.9% <= 100.00 org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs count = 509596 min = 88343 max = 216813006 mean = 585987.91 stddev = 5510685.58 median = 371500.00 75% <= 431256.00 95% <= 549931.00 98% <= 628331.00 99% <= 809535.00 99.9% <= 84234716.00 -- Meters ---------------------------------------------------------------------- org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes count = 174430494090 mean rate = 187070604.35 events/second 1-minute rate = 186593392.32 events/second 5-minute rate = 182509762.22 events/second 15-minute rate = 152883048.81 events/second org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs count = 509596 mean rate = 546.52 events/second 1-minute rate = 546.05 events/second 5-minute rate = 533.45 events/second 15-minute rate = 445.05 events/second {noformat} The %CPU of WALPE is about 450% and DN is about 45%. The sync request count is 1688090 : 509596 = 3.3 : 1. Need to find out way. > Analyze the performance of AsyncWAL and fix the same > ---------------------------------------------------- > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal > Affects Versions: 2.0.0 > Reporter: ramkrishna.s.vasudevan > Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.3.4#6332)