[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations
[ https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14460: -- Attachment: 14460.v0.branch-1.0.patch Patch for branch-1.0. This is what I'm going to fix first. This patch tries to minimize change. The master branch will be very different, with a radical redo (warranted given the code duplication and the duplicated record keeping; i.e. we keep all incremented Cells twice, once as a standalone list and then again inside FSWALEntry). Here is the commit log message: {code} Patch for branch-1.0 first. Will address later branches with a different approach (a more radical fixup). Here we are trying to be safe by making minimal changes. This patch adds a fast increment. To enable it, set the below configuration to true in your hbase-site.xml: hbase.increment.fast.but.narrow.consistency This sets the region to take the fast increment path. The constraint is that callers can only access the Cell via Increment; intermixing Increment with other Mutations will give indeterminate results. A Get will work, and an Increment of zero will return the current value. So, to add the above, we effectively copy/paste the current Increment after doing a bunch of work to try and move common code out into methods that can be shared. The current increment becomes a switch: depending on config, we take either the slow but consistent code path or the fast but narrowly consistent one. The Increment code path has too much state to keep up, so it is hard to shrink it down further than what we have here without a radical refactor (TODO in master patch; the refactor is needed because even cursory exploration has us DUPLICATING lists of Cells ... some of which is addressed on the fast path here, but there is more to do; the fast path also simplifies the write to hbase, so I am able to drop some of the state keeping). Adds a carryForward set of methods for Tags handling, which allows us to clean up some duplicated code.
So, the difference between fastAndNarrowConsistencyIncrement and slowButConsistentIncrement is that the former holds the row lock until the sync completes; this lets us reason that there are no other writers afoot when we read the current increment value. This means we do not wait for mvcc reads to catch up to writes before we proceed with the read, which is the root of the slowdown seen in HBASE-14460. The fast path also does not wait on mvcc to complete before returning to the client, and we reorder the write so that the memstore update happens AFTER sync returns; i.e. the write pipeline zigzags less now. Added some simple concurrency testing and then a performance testing tool for Increments. Added a test that an Increment of zero amount returns the current Increment value. {code} > [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed > Increments, CheckAndPuts, batch operations > --- > > Key: HBASE-14460 > URL: https://issues.apache.org/jira/browse/HBASE-14460 > Project: HBase > Issue Type: Bug > Components: Performance >Reporter: stack >Assignee: stack >Priority: Critical > Attachments: 0.94.test.patch, 0.98.test.patch, > 1.0.80.flamegraph-7932.svg, 14460.txt, 14460.v0.branch-1.0.patch, > 98.80.flamegraph-11428.svg, HBASE-14460-discussion.patch, client.test.patch, > flamegraph-13120.svg.master.singlecell.svg, flamegraph-26636.094.100.svg, > flamegraph-28066.098.singlecell.svg, flamegraph-28767.098.100.svg, > flamegraph-31647.master.100.svg, flamegraph-9466.094.singlecell.svg, > hack.flamegraph-16593.svg, hack.uncommitted.patch, m.test.patch, > region_lock.png, testincrement.094.patch, testincrement.098.patch, > testincrement.master.patch > > > As reported by 鈴木俊裕 up on the mailing list -- see "Performance degradation > between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)" -- our unification of > sequenceid and MVCC slows Increments (and other ops) as the mvcc needs to > 'catch up' to our current point before we can read the last Increment value
> that we need to update. > We can say that our Increment is just done wrong, we should just be writing > Increments and summing on read, but checkAndPut as well as batching > operations have the same issue. Fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
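The fast-path ordering described in the 14460.v0.branch-1.0.patch commit message can be modelled in a few lines. What follows is a toy sketch, not the patch. `FastIncrementSketch`, its `wal` list and its `memstore` map are hypothetical stand-ins, and a per-row `ReentrantLock` is assumed to play the role of the row lock. The sketch shows only the order of operations: take the row lock, read the current value with no MVCC wait, append and sync, update memstore only after the sync, then release the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: "wal" and "memstore" are hypothetical stand-ins, not HBase classes.
public class FastIncrementSketch {
    private final List<String> wal = new ArrayList<>();
    private final ConcurrentHashMap<String, Long> memstore = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();

    /** Fast-path ordering: lock row, read, append + sync, update memstore, unlock. */
    public long increment(String row, long amount) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();
        try {
            // The row lock excludes every other writer of this row, so the
            // latest value can be read directly, without waiting on MVCC.
            long next = memstore.getOrDefault(row, 0L) + amount;
            appendAndSync(row + "=" + next);
            // Memstore is updated only after the (simulated) sync has returned.
            memstore.put(row, next);
            return next;
        } finally {
            lock.unlock();
        }
    }

    /** Stand-in for WAL append + sync; synchronized because all rows share one log. */
    private synchronized void appendAndSync(String edit) {
        wal.add(edit);
    }

    public long get(String row) {
        return memstore.getOrDefault(row, 0L);
    }

    public synchronized int walSize() {
        return wal.size();
    }

    public static void main(String[] args) throws InterruptedException {
        FastIncrementSketch region = new FastIncrementSketch();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    region.increment("row1", 1);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println(region.get("row1")); // 8000
        // An Increment of zero returns the current value.
        System.out.println(region.increment("row1", 0)); // 8000
    }
}
```

The lock is held across the sync, so a second writer of the same row cannot read until the first writer is done. That longer row-lock hold is the price this path pays for not waiting on the region-wide MVCC read point.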
[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations
[ https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14460: -- Attachment: hack.uncommitted.patch hack.flamegraph-16593.svg Patch for discussion. Gets us back to 0.98 speeds; i.e. about 1/3rd slower than 0.94. The idea is to unshackle Increments from MVCC, other than to keep MVCC abreast of sequenceId changes. I can do this if I reorder Increment so the Increment get and write (as well as the sync) all happen under the row lock. My read then always gets the latest value, since there is no concurrent writer on this row (because I have undone the mvcc connection, I need to read with UNCOMMITTED isolation). If I reorder Increment so it does read, append, sync, then update memstore, I can undo the crazy +1B and the need for post-modification of Cells in MemStore. The gambit is a slower Increment, because everything happens under the row lock, including the sync of the write, which used to be done outside it. This means we don't need MVCC for correctness and so can bypass the MVCC-is-a-region-wide-lock phenomenon. See attached flamegraph. It looks like 0.98 now. Some basic tests using the IncrementTest attached above (80 concurrent threads doing increments over 50k rows) show us doing:
{code}
75th: 3.92218
95th: 5.64862779997
98th: 8.07254229984
99th: 23.11843173
{code}
The same test against 0.98, as quoted above, shows:
{code}
75th: 4.400081
95th: 6.0390387
98th: 6.7202052
99th: 7.26432036001
Time: 191.393
{code}
Posting the patch for discussion. Need to figure out the downsides. Will study the patch more. Our Increment in memstore should work as expected when Scanning, since we are using the actual assigned sequenceid. On crash, an edit could be in the WAL without the client knowing it made it, but this has always been an issue.
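The figures above are latency percentiles. The hack wins at the 75th and 95th but loses at the 98th and 99th. The comment does not say how IncrementTest computes its percentiles, so the helper below assumes the common nearest-rank definition. `LatencyPercentiles` is a hypothetical name.

```java
import java.util.Arrays;

public class LatencyPercentiles {
    /** Nearest-rank percentile: smallest sample with at least p% of all samples at or below it. */
    public static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based rank; multiplying before dividing reduces floating-point drift.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] latencies = new double[100];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = 100 - i; // 100, 99, ..., 1 (deliberately unsorted)
        }
        System.out.println(percentile(latencies, 75)); // 75.0
        System.out.println(percentile(latencies, 99)); // 99.0
    }
}
```

With only a handful of samples past the 98th rank, one slow sync is enough to move the 99th figure a long way. That is one reason to read the tail values alongside the total time.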
[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations
[ https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14460: -- Attachment: client.test.patch 98.80.flamegraph-11428.svg 1.0.80.flamegraph-7932.svg Attempts at reproducing the slowdown in the small have failed to pay off. I see the roughly 2x difference, but not the 7x claimed by the original poster. With some help from Preston Koprivica, I found that you need some friction in place to see the issue; the friction gets amplified by the mvcc wait. Here is a test that clearly shows the problem. 0.98 is about 33% slower than 0.94 (0.98 added mvcc), and 1.0+ has about 10x the latency WHEN you have 80 clients running external to the regionserver process banging on it. The flame graphs show us spending loads of time waiting on mvcc. The stack trace is the SAME as for the tests in the small, but overall we just seem to wait longer. There is an amplification going on. Looking at options: [~jingcheng...@intel.com]'s suggestion is a nice one; it will narrow what we have to wait on. I tried completely disabling our wait-on-mvcc before we read, and this helps; we are then only 3x slower than 0.98 (and 4x slower than 0.94). Need some other bit of trickery to take us closer to what was there before.
[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations
[ https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-14460: - Attachment: HBASE-14460-discussion.patch I am thinking about an alternative way to improve the implementation of increment, checkAndPut, etc. In each operation, we could attach a write number per row; then, in increment, could mvcc.await() wait only for the previous operations on this row to finish? I have drafted an ugly patch (master only) to do this, for discussion, and ran TestIncrement. The results are listed below.
{noformat}
1. testContendedSingleCellIncrementer:
   With the patch:    1st run 228.185s, 2nd run 232.453s, 3rd run 235.457s, 4th run 229.003s.
   Without the patch: 1st run 230.299s, 2nd run 234.997s, 3rd run 219.224s, 4th run 225.731s.
2. testUnContendedSingleCellIncrementer:
   With the patch:    59.244s.
   Without the patch: 81.667s.
{noformat}
The patch is attached to this JIRA for discussion. Thanks!
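Here is a minimal sketch of the per-row idea. It assumes each row keeps its own set of in-flight write numbers and that `await` blocks only while an earlier write to the same row is still pending. `PerRowMvccSketch` is hypothetical and is not the attached HBASE-14460-discussion.patch. It models only the waiting and leaves out how the region-wide read point used by scanners would still advance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical per-row waiting scheme; not HBase's MultiVersionConcurrencyControl.
public class PerRowMvccSketch {
    private long nextWriteNumber = 1;
    private final Map<String, TreeSet<Long>> pendingByRow = new HashMap<>();

    /** Hands out a region-wide write number and records it as in flight for this row. */
    public synchronized long begin(String row) {
        long wn = nextWriteNumber++;
        pendingByRow.computeIfAbsent(row, r -> new TreeSet<>()).add(wn);
        return wn;
    }

    /** Marks a write as done and wakes any waiters. */
    public synchronized void complete(String row, long wn) {
        TreeSet<Long> pending = pendingByRow.get(row);
        if (pending != null) {
            pending.remove(wn);
            if (pending.isEmpty()) {
                pendingByRow.remove(row);
            }
        }
        notifyAll();
    }

    /** True if some write to this row numbered below wn is still in flight. */
    public synchronized boolean hasEarlierPending(String row, long wn) {
        TreeSet<Long> pending = pendingByRow.get(row);
        return pending != null && !pending.isEmpty() && pending.first() < wn;
    }

    /** Waits only for earlier writes to the same row, not for the whole region. */
    public synchronized void await(String row, long wn) throws InterruptedException {
        while (hasEarlierPending(row, wn)) {
            wait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PerRowMvccSketch mvcc = new PerRowMvccSketch();
        long slowA = mvcc.begin("rowA"); // a slow write, still in flight
        long fastB = mvcc.begin("rowB");
        mvcc.await("rowB", fastB);       // returns at once: nothing earlier on rowB
        System.out.println("rowB did not wait on rowA");
        long nextA = mvcc.begin("rowA");
        System.out.println(mvcc.hasEarlierPending("rowA", nextA)); // true
        mvcc.complete("rowA", slowA);
        System.out.println(mvcc.hasEarlierPending("rowA", nextA)); // false
    }
}
```

In this model, contention only costs waiting time when writers share a row. That is consistent with the uncontended test gaining and the contended one not, although the sketch itself proves nothing about the real patch.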
[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations
[ https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14460: -- Attachment: 0.98.test.patch m.test.patch 0.94.test.patch flamegraph-26636.094.100.svg flamegraph-28767.098.100.svg flamegraph-31647.master.100.svg If I run a test that has 100 threads each updating their own rows -- i.e. no contention on a row -- then I see the master branch completing before 0.94 does; i.e. master is faster. This is in spite of the thread dump resembling the one reported as problematic at the top of this issue. In 0.94, all are stuck waiting on the WAL syncer to come in:
{code}
"50" #74 daemon prio=5 os_prio=0 tid=0x7f7a78661000 nid=0x3364 waiting for monitor entry [0x7f7a30ecd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1334)
    - waiting to lock <0x0004cde22390> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1476)
    at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:6160)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:5571)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:5454)
    at org.apache.hadoop.hbase.regionserver.TestIncrement$SingleCellIncrementer.run(TestIncrement.java:84)
{code}
In master they are stuck here:
{code}
"17" #55 daemon prio=5 os_prio=0 tid=0x7f0374c6d000 nid=0x3a0b in Object.wait() [0x7f030c346000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:218)
    - locked <0x0004d2e26208> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:149)
    at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.await(MultiVersionConcurrencyControl.java:137)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:7360)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:7315)
    at org.apache.hadoop.hbase.regionserver.TestIncrement$SingleCellIncrementer.run(TestIncrement.java:86)
{code}
The flame graphs show basically the same profile across all versions (master spends a bit less time appending, which I suppose is expected).
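The master thread dump ends in `MultiVersionConcurrencyControl.waitForRead`, reached from `completeAndWait`. The toy below, a hypothetical `RegionMvccSketch` that is not HBase's class, assumes the simplest region-wide scheme. Write numbers are handed out in order, and the read point advances past a number only once that number and every earlier one have completed. This is enough to show how an increment on one row ends up waiting on an unrelated, slower write to another row.

```java
import java.util.TreeSet;

// Toy region-wide MVCC; a simplified model, not org.apache.hadoop.hbase's implementation.
public class RegionMvccSketch {
    private long nextWriteNumber = 1;
    private long readPoint = 0;
    private final TreeSet<Long> pending = new TreeSet<>();

    /** Hands out the next write number and marks it in flight. */
    public synchronized long begin() {
        long wn = nextWriteNumber++;
        pending.add(wn);
        return wn;
    }

    /** Completes a write; the read point stops just below the oldest write still in flight. */
    public synchronized void complete(long wn) {
        pending.remove(wn);
        readPoint = pending.isEmpty() ? nextWriteNumber - 1 : pending.first() - 1;
        notifyAll();
    }

    public synchronized long readPoint() {
        return readPoint;
    }

    /** Completes our write, then blocks until readers can see it. */
    public synchronized void completeAndWait(long wn) throws InterruptedException {
        complete(wn);
        while (readPoint < wn) {
            wait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RegionMvccSketch mvcc = new RegionMvccSketch();
        long slowWrite = mvcc.begin(); // e.g. a write to row A still waiting on its WAL sync
        long fastWrite = mvcc.begin(); // an increment to an unrelated row B
        Thread incrementer = new Thread(() -> {
            try {
                mvcc.completeAndWait(fastWrite);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        incrementer.start();
        Thread.sleep(100);
        // Row B's write cannot become visible while row A's earlier write is pending.
        System.out.println("incrementer finished: " + !incrementer.isAlive()); // false
        mvcc.complete(slowWrite);
        incrementer.join();
        System.out.println("read point: " + mvcc.readPoint()); // 2
    }
}
```

Because every writer waits on the oldest in-flight write in the region, any slowness upstream, such as a slow sync, is paid by all handlers at once. In this model, that is the "region-wide lock" effect.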
[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations
[ https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14460: -- Attachment: testincrement.094.patch testincrement.098.patch testincrement.master.patch flamegraph-9466.094.singlecell.svg flamegraph-13120.svg.master.singlecell.svg flamegraph-28066.098.singlecell.svg There are two ways in which master increments are slower than 0.94's. There is the case where threads are contending to update a single Cell, and then there is the case described at the head of this issue, where the mvcc coordination acts like a region-wide lock even though the incrementing threads may not be contending on the same Cell. Here are some rough measurements of the first case. See attached test. It has 100 threads doing 10k increments of a single Cell against a Region instance.
{code}
0.94     ~84 seconds
0.98     ~140 seconds
master   ~180 seconds
{code}
0.98 is almost 2x slower than 0.94 (though the code path profile is pretty close if you look at the accompanying flame graphs), and master is slower again, more than 2x slower than 0.94. Reports from the field already have it that even 0.98 increments are too slow (at 2x slower, many increments can back up all handlers so no other work can get in). Hence the above exercise. It seems that increments have indeed gotten slower, even without the mvcc unification. Next, let me go measure the case where mvcc gets in the way.
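The contended single-Cell test can be sketched as a fixed pool of threads, each doing a fixed number of increments on one cell, timed end to end. In `ContendedIncrementBench` the "cell" is a hypothetical lock-protected `long`, not an HRegion, so its timings say nothing about HBase itself. The attached test drives a Region instance instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Benchmark shape only; the lock-protected counter is a stand-in for a Region's single Cell.
public class ContendedIncrementBench {
    private final ReentrantLock rowLock = new ReentrantLock();
    private long cell = 0;

    public void increment(long amount) {
        rowLock.lock();
        try {
            cell += amount;
        } finally {
            rowLock.unlock();
        }
    }

    public long get() {
        rowLock.lock();
        try {
            return cell;
        } finally {
            rowLock.unlock();
        }
    }

    /** Runs threads x perThread increments on the one cell and returns elapsed milliseconds. */
    public static long run(ContendedIncrementBench target, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.increment(1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                throw new IllegalStateException("increments did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        ContendedIncrementBench cell = new ContendedIncrementBench();
        long ms = run(cell, 100, 10_000); // 100 threads x 10k increments
        System.out.println("final value " + cell.get() + " in " + ms + " ms");
    }
}
```

Checking the final value (here 1,000,000) alongside the elapsed time guards against a "faster" run that is actually losing increments.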
[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations
[ https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14460: -- Attachment: region_lock.png Here is a link to the mailing-list thread where 鈴木俊裕 describes the issue: http://mail-archives.apache.org/mod_mbox/hbase-dev/201509.mbox/%3ccangerjyo+k+cpskvoqxf7qvk9wzvsnm9jwdnd4q8d11y3mf...@mail.gmail.com%3E I've attached the nice diagram he made to illustrate the problem.
[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations
[ https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14460: -- Attachment: 14460.txt Silly test to demonstrate the problem.