[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833378#comment-15833378
 ] 

Yu Li commented on HBASE-16698:
-------------------------------

After some review of the mechanism, let me further explain, in theory, where the 
performance improvement for SYNC_WAL comes from on branch-1:

First, let me recap the current doMiniBatchMutation flow in branch-1:
1. acquire locks
2. update timestamps
3. build WAL edit
4. append edit to WAL
5. write to memstore
6. sync WAL or defer
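
For orientation, here is a rough, compilable skeleton of that flow; the method 
names are just illustrative stubs, not the actual HRegion code. The real snippet 
for the spot between step #4 and step #5 follows right below.
{code}
// Rough skeleton of the branch-1 flow above; the method bodies are empty stubs
// and the names are illustrative, NOT the actual HRegion code.
public class MiniBatchSketch {
  void doMiniBatchMutation() {
    acquireRowLocks();                  // step 1: acquire locks
    updateCellTimestamps();             // step 2: update timestamps
    buildWalEdit();                     // step 3: build WAL edit
    long txid = appendToRingBuffer();   // step 4: publish the append onto the ring buffer
    long mvccNum = waitForWriteEntry(); // <-- the wait discussed below, before step #5
    writeToMemstore(mvccNum);           // step 5: write to memstore
    syncWalOrDefer(txid);               // step 6: sync WAL or defer
  }

  // Empty stubs so the sketch compiles; they stand in for the real branch-1 logic.
  void acquireRowLocks() {}
  void updateCellTimestamps() {}
  void buildWalEdit() {}
  long appendToRingBuffer() { return 1L; }
  long waitForWriteEntry() { return 1L; }
  void writeToMemstore(long mvccNum) {}
  void syncWalOrDefer(long txid) {}
}
{code}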

Just before step #5, we had the following code before the patch:
{code}
      // ------------------------------------
      // Acquire the latest mvcc number
      // ----------------------------------
      if (!isInReplay) {
        writeEntry = walKey.getWriteEntry();
        mvccNum = writeEntry.getWriteNumber();
      } else {
        mvccNum = batchOp.getReplaySequenceId();
      }

      // ------------------------------------
      // STEP 5. Write back to memstore
      // Write to memstore. It is ok to write to memstore
      // first without syncing the WAL because we do not roll
      // forward the memstore MVCC. The MVCC will be moved up when
      // the complete operation is done. These changes are not yet
      // visible to scanners till we update the MVCC. The MVCC is
      // moved only when the sync is complete.
      // ----------------------------------
{code}

In the {{walKey.getWriteEntry()}} call above we wait on a CountDownLatch, which is 
only released when {{RingBufferEventHandler#append}} handles the entry, or more 
precisely in {{FSWALEntry#stampRegionSequenceId}}. Since the ring buffer is consumed 
sequentially by a single handler, this creates contention among puts on all regions 
of the same regionserver.
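
To make that mechanism concrete, below is a minimal, self-contained sketch of the 
pattern (class names are made up, this is not HBase code): every handler thread 
parks on its own latch, but all latches are released in publish order by a single 
consumer, so the more concurrent appends there are on the regionserver, the longer 
each handler waits.
{code}
// Minimal illustration of the wait described above: every handler parks on its
// own CountDownLatch, but all latches are released by ONE consumer thread that
// stamps sequence ids strictly in publish order, like the single
// RingBufferEventHandler on a WAL. Class names are made up; not HBase code.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class SequenceStampDemo {

  // Stands in for the WriteEntry returned by WALKey#getWriteEntry.
  static class Entry {
    final CountDownLatch stamped = new CountDownLatch(1);
    volatile long seqNum;

    long getWriteNumber() throws InterruptedException {
      stamped.await();                      // handlers block here before step #5
      return seqNum;
    }
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<Entry> ringBuffer = new LinkedBlockingQueue<>();
    AtomicLong nextSeq = new AtomicLong(1);

    // Single consumer, analogous to the lone ring buffer event handler per WAL:
    // entries from ALL regions are stamped one by one, in publish order.
    Thread consumer = new Thread(() -> {
      try {
        while (true) {
          Entry e = ringBuffer.take();
          Thread.sleep(1);                  // pretend handling each append takes ~1ms
          e.seqNum = nextSeq.getAndIncrement();
          e.stamped.countDown();            // releases exactly one waiting handler
        }
      } catch (InterruptedException ignored) {
      }
    });
    consumer.setDaemon(true);
    consumer.start();

    // Handler threads publish an append and then wait, like step #4 followed by
    // walKey.getWriteEntry(): the later you are in the queue, the longer you wait.
    for (int i = 0; i < 8; i++) {
      new Thread(() -> {
        try {
          Entry e = new Entry();
          long start = System.nanoTime();
          ringBuffer.put(e);                // "append edit to WAL"
          long seq = e.getWriteNumber();    // the wait before writing the memstore
          long waitedMs = (System.nanoTime() - start) / 1_000_000;
          System.out.println("got seq " + seq + " after waiting " + waitedMs + " ms");
        } catch (InterruptedException ignored) {
        }
      }).start();
    }
    Thread.sleep(200);                      // give the demo time to finish
  }
}
{code}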

Notice that under a high put load this makes step #5 wait, and in some cases the 
wait is longer than the sync IO time, so by the time we reach step #6, 
{{FSHLog#blockOnSync}} returns directly. This mainly answers the question 
[~allan163] asked in a previous comment:
bq. But, my question is, even if you solved this problem, the handlers still 
have to wait for syncOrDefer to complete
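
To make the overlap concrete, here is a toy, self-contained sketch (not FSHLog 
code, timings invented): the sync proceeds in the background while the handler is 
stuck waiting for its write entry, so when that wait exceeds the sync IO time, the 
final block-on-sync step finds nothing left to wait for.
{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Toy timing sketch: if the write-entry wait (20 ms here) exceeds the sync IO
// time (5 ms here), the handler reaches the "block on sync" step after the sync
// has already completed and therefore does not block again.
public class SyncOverlapDemo {
  public static void main(String[] args) throws Exception {
    // Background syncer finishes the (pretend) WAL sync in ~5 ms.
    CompletableFuture<Void> syncDone = CompletableFuture.runAsync(() -> sleep(5));

    // Meanwhile the handler is parked on the write-entry latch for ~20 ms
    // (the contention shown in the previous sketch).
    sleep(20);

    // Step #6 equivalent: nothing left to wait for, returns at once.
    long start = System.nanoTime();
    syncDone.get(1, TimeUnit.SECONDS);
    System.out.println("waited " + (System.nanoTime() - start) / 1_000_000
        + " ms in the block-on-sync step");   // prints ~0 ms
  }

  static void sleep(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException ignored) {
    }
  }
}
{code}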

Let me prepare a doc and upload it here, to make this easier to understand without 
reading through the long comment list (smile).
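
Meanwhile, for reference, below is a minimal sketch of the idea described in the 
issue: assign the write number before publishing the append onto the ring buffer, 
with a short region-level lock to keep "grab WriteEntry" and "append edit" atomic. 
Names are illustrative; this is not the patch code.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-region writer: the sequence number is taken BEFORE the
// append is published onto the shared ring buffer, so handlers never park on a
// latch released by the single consumer. The short synchronized block keeps
// "grab write number" and "publish append" in the same order; contention is
// now only within a region, not across regions.
public class PreAssignSeqSketch {
  private final AtomicLong seqGen = new AtomicLong(0);
  private final Object appendLock = new Object();      // per-region lock
  private final BlockingQueue<long[]> ringBuffer;      // shared per-WAL queue

  public PreAssignSeqSketch(BlockingQueue<long[]> sharedRingBuffer) {
    this.ringBuffer = sharedRingBuffer;
  }

  /** Returns the pre-assigned sequence number without waiting on the consumer. */
  public long append(long regionId) throws InterruptedException {
    synchronized (appendLock) {
      long seq = seqGen.incrementAndGet();             // "grab WriteEntry"
      ringBuffer.put(new long[] { regionId, seq });    // "append edit", same order
      return seq;                                      // usable right away for the memstore write
    }
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<long[]> wal = new LinkedBlockingQueue<>();
    PreAssignSeqSketch regionA = new PreAssignSeqSketch(wal);
    PreAssignSeqSketch regionB = new PreAssignSeqSketch(wal);
    System.out.println("regionA got seq " + regionA.append(1));  // returns immediately
    System.out.println("regionB got seq " + regionB.append(2));  // no cross-region wait
  }
}
{code}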

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16698
>                 URL: https://issues.apache.org/jira/browse/HBASE-16698
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 1.2.3
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: hadoop0495.et2.jstack, HBASE-16698.branch-1.patch, 
> HBASE-16698.branch-1.v2.patch, HBASE-16698.branch-1.v2.patch, 
> HBASE-16698.patch, HBASE-16698.v2.patch
>
>
> As titled, in our production environment we observed 98 out of 128 handlers 
> stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into it, we found that the problem is mainly caused by 
> advancing the mvcc in the append logic. Below is some detailed analysis:
> Under the current branch-1 code, all batch puts call 
> {{WALKey#getWriteEntry}} after appending the edit to the WAL, and 
> {{seqNumAssignedLatch}} is only released when the corresponding append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because we currently use a single event handler for the ring buffer, the 
> append calls are handled one by one (and actually a lot of our current logic 
> depends on this sequential handling), so this becomes a bottleneck under a 
> high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are handled sequentially, which causes contention among 
> different regions...
> To fix this, we could still make use of the "sequential appends" mechanism: 
> grab the WriteEntry before publishing the append onto the ring buffer and use 
> it as the sequence id; we only need to add a lock to make "grab WriteEntry" 
> and "append edit" a single transaction. This still causes contention 
> inside a region but avoids contention between different regions. This 
> solution is already verified in our online environment and proved to be 
> effective.
> Notice that for the master (2.0) branch, since we already changed the write 
> pipeline to sync before writing to the memstore (HBASE-15158), this issue 
> only exists for the ASYNC_WAL write scenario.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
