[ https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15598266#comment-15598266 ]
stack commented on HBASE-16698:
-------------------------------

bq. On master, when I do a jstack with some load, almost all the handlers are waiting for sync(), and since the memstore insert does not happen until sync() completes, we do not have to wait for the latch. Actually, the latch can be removed as well (for Durability.SYNC_WAL). For async, we still have to have the latch I think.

Yeah, the reordering of the write pipeline in master changes the equation. It is currently at a 'safe' place. Review and discussion could buy us some more improvement here, especially as it is now much easier to reason about what is happening given the reordering. Consider too, though, that the master branch write path will change again if/when asyncWAL becomes the default (there is no ringbuffer with asyncWAL).

I am of the opinion that we need to get a handle on the dfsclient's packet-sending rhythm if we are to make any progress on WAL writing. In studies over the last few days, it is beyond our influence and shows the same old behavior whatever we do on our side. Ringbuffer aggregations and appends certainly help, but having to rely on five syncer threads each interrupting packet formation, hopefully with enough jitter that there are not too many null sends, is voodoo engineering; it says to me that we need to own the client -- e.g. asyncWAL -- or have dfsclient#dfsoutputstream expose more means of controlling the flow to us, the client.

Thanks for the paper reference [~enis]. Looks great. Handlers having to wait on syncs messes us up (or, not being async in our core messes us up -- take your pick). We should be able to make do with one sync'ing thread, but with only one syncer we are aggregating 70 handlers waiting on sync in primitive tests, which means 70 handlers stuck NOT adding more load on the server; hence 5 syncers each aggregating 10 or 12 syncs works a bit better. What are you thinking regards where the handler goes after it starts the sync? Does it go back to the client?
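The syncer-aggregation idea above can be sketched roughly as follows. This is a hypothetical illustration (class and method names are mine, not HBase's): handlers park a future on a queue, and a syncer thread answers the whole queued batch with a single sync call, so N blocked handlers cost one flush rather than N.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not HBase code: one batching syncer serving many handlers.
public class BatchingSyncer {
    private final LinkedBlockingQueue<CompletableFuture<Long>> pending =
        new LinkedBlockingQueue<>();
    private long syncCount = 0;  // stand-in for actual WAL sync()/hflush calls

    /** Handler side: request a sync and get a future to wait on. */
    public CompletableFuture<Long> requestSync() {
        CompletableFuture<Long> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    /**
     * One pass of a syncer thread: drain everything queued, "sync" once,
     * and complete the whole batch. Returns the batch size (0 if idle).
     */
    public int syncOnce() {
        List<CompletableFuture<Long>> batch = new ArrayList<>();
        pending.drainTo(batch);          // aggregate whatever piled up
        if (batch.isEmpty()) {
            return 0;
        }
        syncCount++;                     // a real WAL would hflush/hsync here
        for (CompletableFuture<Long> f : batch) {
            f.complete(syncCount);
        }
        return batch.size();
    }

    public long getSyncCount() { return syncCount; }
}
```

With several handlers queued between syncer passes, many waiters resolve against a single sync, which is the aggregation effect the comment describes (and why a handful of syncers beats one when 70 handlers pile up).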
(FB had a Delay thing hacked in once that seems similar.) How is the 'pickup' done? It sounds great.

On the rewrite of batchMutate: our HRegion has loads of duplication by method. Do you see the batchMutate refactor working elsewhere for other methods? Thanks.

> Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16698
>                 URL: https://issues.apache.org/jira/browse/HBASE-16698
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 1.2.3
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16698.branch-1.patch, HBASE-16698.branch-1.v2.patch, HBASE-16698.branch-1.v2.patch, HBASE-16698.patch, HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, in our production environment we observed 98 out of 128 handlers stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that it is mainly caused by advancing mvcc in the append logic. Below is some detailed analysis:
> Under the current branch-1 code logic, all batch puts call {{WALKey#getWriteEntry}} after appending the edit to the WAL, and {{seqNumAssignedLatch}} is only released when the corresponding append call is handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). Because we currently use a single event handler for the ringbuffer, the append calls are handled one by one (actually a lot of our current logic depends on this sequential handling), and this becomes a bottleneck under a high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on all regions are handled sequentially, which causes contention among different regions...
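The latch mechanism described in the issue can be modeled with a small toy class (behavior is simplified; only the names {{seqNumAssignedLatch}}, {{getWriteEntry}}, and the stamping step follow the issue text). Each appended entry carries a latch, and the handler blocks until the single ring-buffer consumer stamps a sequence id and opens the latch, which is why all appends on the RS serialize through one thread:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified model, not HBase code: handler parks on a per-entry latch until
// the single consumer thread stamps the entry's sequence id.
public class LatchStampDemo {
    public static class Entry {
        final CountDownLatch seqNumAssignedLatch = new CountDownLatch(1);
        volatile long seqId = -1;

        /** Handler side: blocks until the consumer has stamped this entry. */
        public long getWriteEntry() {
            try {
                seqNumAssignedLatch.await();   // handlers pile up here under load
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(ie);
            }
            return seqId;
        }
    }

    private final BlockingQueue<Entry> ringBuffer = new LinkedBlockingQueue<>();
    private long nextSeqId = 0;

    /** Handler side: publish the entry, then the caller waits via getWriteEntry(). */
    public Entry append() {
        Entry e = new Entry();
        ringBuffer.add(e);
        return e;
    }

    /** Single-consumer side: stamp one queued entry and release its waiter
     *  (the role FSWALEntry#stampRegionSequenceId plays in the real code). */
    public boolean stampOne() {
        Entry e = ringBuffer.poll();
        if (e == null) {
            return false;
        }
        e.seqId = ++nextSeqId;
        e.seqNumAssignedLatch.countDown();
        return true;
    }
}
```

Because one consumer stamps entries for every region sharing the WAL, the id-assignment rate of that single thread bounds write throughput for the whole region server.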
> To fix this, we could make use of the "sequential appends" mechanism: grab the WriteEntry before publishing the append onto the ringbuffer and use it as the sequence id; we only need to add a lock to make "grab WriteEntry" and "append edit" a transaction. This will still cause contention inside a region but avoids contention between different regions. This solution has already been verified in our online environment and proved to be effective.
> Notice that for the master (2.0) branch, since we already changed the write pipeline to sync before writing the memstore (HBASE-15158), this issue only exists for the ASYNC_WAL write scenario.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
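The fix proposed in the description above can be sketched as follows. This is a minimal illustration under assumed names (the class, lock, and counter are mine, not HBase's): the sequence id is grabbed up front, and a per-region lock makes "assign id" plus "publish append" one atomic step so WAL order still matches id order, removing the latch wait entirely.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not HBase code: assign the sequence id before
// publishing, under a per-region lock, instead of waiting on a latch.
public class EarlySeqIdAssign {
    private final AtomicLong mvccSequence = new AtomicLong(0);
    private final ReentrantLock appendLock = new ReentrantLock(); // per region

    /** Returns the sequence id stamped on the edit; no latch wait needed. */
    public long append(String edit) {
        appendLock.lock();
        try {
            long seqId = mvccSequence.incrementAndGet(); // "grab WriteEntry"
            publishToRingBuffer(edit, seqId);            // id already known
            return seqId;
        } finally {
            appendLock.unlock();
        }
    }

    private void publishToRingBuffer(String edit, long seqId) {
        // In the real WAL this hands the entry to the disruptor; a no-op here.
    }
}
```

Since each region holds its own lock, two regions sharing the one-per-RS WAL no longer contend on a single stamping thread; contention is confined to writers within the same region, which is the trade-off the description calls out.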