[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15598266#comment-15598266
 ] 

stack commented on HBASE-16698:
-------------------------------

bq. On master, when I do a jstack with some load, almost all the handlers are 
waiting for sync(), and since the memstore insert does not happen until sync() 
completes, we do not have to wait for the latch. Actually, the latch can be 
removed as well (for Durability.SYNC_WAL). For async, we still have to have the 
latch I think.

Yeah, the reordering of the write pipeline in master changes the equation. It 
is currently at a 'safe' place. Review and discussion could buy us some more 
improvement here, especially since the reordering makes it much easier to 
reason about what is happening. Consider too, though, that the master branch 
write path will change again if/when asyncWAL becomes the default (there is no 
ringbuffer with asyncWAL). I am of the opinion that we need to get a handle on 
the dfsclient's packet-sending rhythm if we are to make any progress on WAL 
writing. In studies over the last few days, it is beyond our influence and 
exhibits the same old behavior whatever we do on our side (ringbuffer 
aggregations and appends certainly help, but having to rely on five syncer 
threads each interrupting packet formation, hopefully with enough jitter that 
we don't issue too many null sends, is voodoo engineering; it says to me that 
we need to own the client -- e.g. asyncwal -- or have 
dfsclient#dfsoutputstream expose more means of controlling the flow to us, the 
client).
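
For reference, a toy sketch of that reordered write path (made-up interfaces, 
not the actual HRegion/FSHLog code): append first, then sync, then memstore, 
so for SYNC_WAL there is nothing left to wait on by the time we touch the 
memstore.

{code:java}
// Hedged sketch only: the master (2.0) ordering, with invented types.
interface ToyWal {
  long append(Object walEdit);   // publish the edit (ringbuffer append in the FSHLog case)
  void sync(long txid);          // block until the edit is durable on HDFS
}

class ToyWritePath {
  private final ToyWal wal;

  ToyWritePath(ToyWal wal) { this.wal = wal; }

  void write(Object walEdit) {
    long txid = wal.append(walEdit);  // 1. append to WAL
    wal.sync(txid);                   // 2. wait for durability
    applyToMemstore(walEdit);         // 3. only now make the edit visible to readers
  }

  private void applyToMemstore(Object walEdit) { /* stand-in for the memstore insert */ }
}
{code}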

Thanks for the paper reference, [~enis]. Looks great. Handlers having to wait 
on syncs messes us up (or, not being async in our core messes us up -- take 
your pick). We should be able to make do with one sync'ing thread, but with 
only one syncer we end up aggregating 70 handlers waiting on sync in primitive 
tests, which means 70 handlers stuck NOT adding more load on the server; hence 
5 syncers, each aggregating 10 or 12 syncs, works a bit better. What are you 
thinking regarding where the handler goes after it starts the sync? Does it go 
back to the client? (FB had a Delay thing hacked in once that seems similar.) 
How is the 'pickup' done? It sounds great.
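
For the record, a toy sketch of the "N syncers, each aggregating many pending 
syncs into one flush" shape (class and method names are made up; this is not 
the FSHLog SyncRunner):

{code:java}
// Hedged sketch only: each runner drains whatever sync requests have queued up
// behind the first one and satisfies them all with a single flush.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

class ToySyncRunner implements Runnable {
  private final BlockingQueue<CompletableFuture<Void>> pending = new LinkedBlockingQueue<>();

  /** Handler side: enqueue a sync request and get a future to wait on (or hand back). */
  CompletableFuture<Void> requestSync() {
    CompletableFuture<Void> f = new CompletableFuture<>();
    pending.add(f);
    return f;
  }

  @Override
  public void run() {
    List<CompletableFuture<Void>> batch = new ArrayList<>();
    while (!Thread.currentThread().isInterrupted()) {
      try {
        batch.add(pending.take());   // block for at least one waiter
        pending.drainTo(batch);      // aggregate everything queued behind it
        hflush();                    // one expensive flush covers the whole batch
        batch.forEach(f -> f.complete(null));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } catch (Exception e) {
        batch.forEach(f -> f.completeExceptionally(e));
      } finally {
        batch.clear();
      }
    }
  }

  private void hflush() { /* stand-in for the real DFSOutputStream#hflush */ }
}
{code}

Running five of these gives the "5 syncers each aggregating 10 or 12 syncs" 
shape; presumably a handler could hand the returned future to the RPC layer 
right after requestSync() and complete the call when it fires, which is what I 
am asking about with the 'pickup' above.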

On the rewrite of batchMutate: our HRegion has loads of duplication across 
methods. Do you see the batchMutate refactor working elsewhere, for the other 
methods?

Thanks.

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16698
>                 URL: https://issues.apache.org/jira/browse/HBASE-16698
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 1.2.3
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16698.branch-1.patch, 
> HBASE-16698.branch-1.v2.patch, HBASE-16698.branch-1.v2.patch, 
> HBASE-16698.patch, HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, in our production environment we observed 98 out of 128 handlers 
> getting stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging in, we found the problem is mainly caused by advancing mvcc in 
> the append logic. Below is a more detailed analysis:
> Under the current branch-1 code logic, all batch puts call 
> {{WALKey#getWriteEntry}} after appending the edit to the WAL, and 
> {{seqNumAssignedLatch}} is only released when the corresponding append call 
> is handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because we currently use a single event handler for the ringbuffer, the 
> append calls are handled one by one (in fact, a lot of our current logic 
> depends on this sequential handling), and this becomes a bottleneck under a 
> high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends for 
> all regions are handled sequentially, which causes contention among 
> different regions...
> To fix this, we could take advantage of the same "sequential appends" 
> mechanism: grab the WriteEntry before publishing the append onto the 
> ringbuffer and use it as the sequence id; we only need to add a lock to make 
> "grab WriteEntry" and "append edit" a single atomic step. This still causes 
> contention inside a region but avoids contention between different regions. 
> This solution has already been verified in our online environment and proved 
> effective.
> Note that for the master (2.0) branch, since we already changed the write 
> pipeline to sync before writing to the memstore (HBASE-15158), this issue 
> only exists for the ASYNC_WAL write scenario.
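
A minimal sketch of the locking approach described in the issue above, with 
invented names (not the actual patch): assign the sequence id and publish the 
append under one region-level lock, so the per-handler latch wait goes away 
and only appends to the same region contend.

{code:java}
// Hedged sketch only: "grab WriteEntry" + "append edit" made one atomic step.
import java.util.concurrent.locks.ReentrantLock;

class ToyRegionWriter {
  private final ReentrantLock appendLock = new ReentrantLock();
  private long nextSeqId = 1;

  /** Stand-in for mvcc begin(): hands out the write entry / sequence id. */
  private long beginWrite() {
    return nextSeqId++;
  }

  /** Stand-in for publishing the append onto the WAL ringbuffer. */
  private void publishAppend(long seqId, Object walEdit) {
    // ...
  }

  long append(Object walEdit) {
    appendLock.lock();
    try {
      long seqId = beginWrite();      // grab the WriteEntry (sequence id) up front
      publishAppend(seqId, walEdit);  // publish with the seqid already stamped
      return seqId;                   // no CountDownLatch wait on the ringbuffer handler
    } finally {
      appendLock.unlock();
    }
  }
}
{code}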



