Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

鈴木俊裕 Sat, 12 Sep 2015 11:26:41 -0700

St.Ack,

Thank you for your response.


Why I make out that "A region lock (not a row lock) seems to occur in
waitForPreviousTransactionsComplete()" is as follows:

A increment operation has 3 procedures for MVCC.

1. mvcc.waitForPreviousTransactionsComplete();

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712

2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721

3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893


I think that MultiVersionConsistencyControl's writeQueue can cause a region
lock.

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43


Step 2 adds to a WriteEntry to writeQueue.

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108

Step 3 removes the WriteEntry from writeQueue.

completeMemstoreInsertWithSeqNum(w, walKey) ->
waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160

Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and waits
until writeQueue is empty or writeQueue.getFirst() == w.

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241


I think when a handler thread is processing between step 2 and step 3, the
other handler threads can wait at step 1 until the thread completes step 3
This is depicted as follows:

https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png


Actually, in the thread dump of our region server, many handler threads
(RW.default.writeRpcServer.handler) wait at Step 1
(waitForPreviousTransactionsComplete()).

https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt

Many handler threads wait at this:

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224


> Is it possible you are contending on a counter post-upgrade?  Is it
> possible that all these threads are trying to get to the same row to
update
> it? Could the app behavior have changed?  Or are you thinking increment
> itself has slowed significantly?
We have just upgraded HBase, not changed the app behavior. We are thinking
increment itself has slowed significantly.
Before upgrading HBase, it was good throughput and latency.
Currently, to cope with this problem, we split the regions finely.

Thanks,

Toshihiro Suzuki


2015-09-09 15:29 GMT+09:00 Stack <[email protected]>:

> On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <[email protected]> wrote:
>
> > Ted,
> >
> > Thank you for your response.
> >
> > I uploaded the complete stack trace to Gist.
> >
> > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> >
> >
> > I think that increment operation works as follows:
> >
> > 1. get row lock
> > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC
> > transactions to finish
> > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > 4. get previous values
> > 5. create KVs
> > 6. write to Memstore
> > 7. write to WAL
> > 8. release row lock
> > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> >
> > A instance of MultiVersionConsistencyControl has a pending queue of
> writes
> > named writeQueue.
> > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
> > empty or writeQueue.getFirst() == w.
> > Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry
> > from writeQueue.
> >
> > I think that when a handler thread is processing between step 2 and step
> 9,
> > the other handler threads can wait until the thread completes step 9.
> >
> >
> That is right. We need to read, after all outstanding updates are done...
> because we need to read the latest update before we go to modify/increment
> it.
>
> How do you make out this?
>
> "A region lock (not a row lock) seems to occur in
> waitForPreviousTransactionsComplete()."
>
> In 0.98.x we did this:
>
> mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>
> ... and in 1.0 we do this:
>
> mvcc.waitForPreviousTransactionsComplete() which is this....
>
> +  public void waitForPreviousTransactionsComplete() {
> +    WriteEntry w = beginMemstoreInsert();
> +    waitForPreviousTransactionsComplete(w);
> +  }
>
> The mvcc and region sequenceid were merged in 1.0 (
> https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc and
> region
> sequenceid would spin independent of each other. Perhaps this responsible
> for some slow down.
>
> That said, looking in your thread dump, we seem to be down in the Get. If
> you do a bunch of thread dumps in a row, where is the lock-holding thread?
> In Get or writing Increment... or waiting on sequence id?
>
> Is it possible you are contending on a counter post-upgrade?  Is it
> possible that all these threads are trying to get to the same row to update
> it? Could the app behavior have changed?  Or are you thinking increment
> itself has slowed significantly?
>
> St.Ack
>
>
>
>
> > Thanks,
> >
> > Toshihiro Suzuki
> >
> >
> > 2015-09-09 0:05 GMT+09:00 Ted Yu <[email protected]>:
> >
> > > In HRegion#increment(), we lock the row (not region):
> > >
> > >     try {
> > >       rowLock = getRowLock(row);
> > >
> > > Can you pastebin the complete stack trace ?
> > >
> > > Thanks
> > >
> > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > CDH5.4.5(HBase1.0.0)
> > > > and we experience slowdown in increment operation.
> > > >
> > > > Here's an extract from thread dump of the RegionServer of our
> cluster:
> > > >
> > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > >   State: BLOCKED
> > > >   Blocked count: 21689888
> > > >   Waited count: 39828360
> > > >   Blocked on java.util.LinkedList@3474e4b2
> > > >   Blocked by 63
> > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > >   Stack:
> > > >     java.lang.Object.wait(Native Method)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > >
> > > >
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > >
>  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > >     java.lang.Thread.run(Thread.java:745)
> > > >
> > > > There are many similar threads in the thread dump.
> > > >
> > > > I read the source code and I think this is caused by changes of
> > > > MultiVersionConsistencyControl.
> > > > A region lock (not a row lock) seems to occur in
> > > > waitForPreviousTransactionsComplete().
> > > >
> > > >
> > > > Also we wrote performance test code for increment operation that
> > included
> > > > 100 threads and ran it in local mode.
> > > >
> > > > The result is shown below:
> > > >
> > > > CDH5.3.1(HBase0.98.6)
> > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > >
> > > > CDH5.4.5(HBase1.0.0)
> > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Toshihiro Suzuki
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Reply via email to