Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

鈴木俊裕 Tue, 08 Sep 2015 23:04:16 -0700

Ted,

Thank you for your response.


I uploaded the complete stack trace to Gist.

https://gist.github.com/brfrn169/cb4f2c157129330cd932


I think the increment operation works as follows:

1. get row lock
2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
4. get previous values
5. create KVs
6. write to Memstore
7. write to WAL
8. release row lock
9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction

An instance of MultiVersionConsistencyControl has a queue of pending writes
named writeQueue.
Step 2 adds a WriteEntry w to writeQueue and waits until writeQueue is
empty or writeQueue.getFirst() == w.
Step 3 adds a WriteEntry to writeQueue, and step 9 removes that WriteEntry
from writeQueue.

I think that while one handler thread is between step 2 and step 9,
other handler threads have to wait at step 2 until that thread completes step 9.
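
To make the queueing concrete, here is a minimal sketch of the behavior I describe above. It is a simplified model, not the HBase source: the class MvccQueueSketch and its methods begin(), complete() and demo() are names made up for illustration, and only writeQueue and waitForPreviousTransactionsComplete() borrow their names from the real class. It assumes the queue works the way I read it, with one queue shared by all handlers.

```java
import java.util.LinkedList;

/** Simplified model of the write queue described above; not the HBase source. */
final class MvccQueueSketch {

    /** One pending transaction in the queue. */
    static final class WriteEntry {
        private boolean completed; // guarded by writeQueue's monitor
    }

    private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

    /** Models step 3: register a new transaction at the tail of the queue. */
    WriteEntry begin() {
        WriteEntry entry = new WriteEntry();
        synchronized (writeQueue) {
            writeQueue.add(entry);
        }
        return entry;
    }

    /** Models step 9: mark the entry done, drop finished entries from the head, wake waiters. */
    void complete(WriteEntry entry) {
        synchronized (writeQueue) {
            entry.completed = true;
            while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
                writeQueue.removeFirst();
            }
            writeQueue.notifyAll();
        }
    }

    /** Models step 2: enqueue a marker entry and block until it reaches the head. */
    void waitForPreviousTransactionsComplete() throws InterruptedException {
        WriteEntry marker = begin();
        try {
            synchronized (writeQueue) {
                while (!writeQueue.isEmpty() && writeQueue.getFirst() != marker) {
                    writeQueue.wait();
                }
            }
        } finally {
            complete(marker);
        }
    }

    /**
     * Runs one in-flight transaction and one waiter that stands in for an
     * increment on a different row. Returns {waiter blocked while the
     * transaction was open, waiter finished after it completed}.
     */
    static boolean[] demo() {
        MvccQueueSketch mvcc = new MvccQueueSketch();
        WriteEntry inFlight = mvcc.begin(); // handler A is between steps 3 and 9
        Thread waiter = new Thread(() -> {
            try {
                mvcc.waitForPreviousTransactionsComplete(); // handler B at step 2
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            waiter.start();
            waiter.join(200); // B cannot finish: A's entry is still at the head
            boolean blocked = waiter.isAlive();
            mvcc.complete(inFlight); // handler A reaches step 9
            waiter.join(5000);
            return new boolean[] {blocked, !waiter.isAlive()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("blocked while in flight: " + result[0]);
        System.out.println("finished after complete: " + result[1]);
    }
}
```

In this model the waiter never looks at which row the in-flight transaction touches, so holding different row locks does not help: every handler that reaches step 2 queues up behind whatever is still open.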

Thanks,

Toshihiro Suzuki


2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:

> In HRegion#increment(), we lock the row (not region):
>
>     try {
>       rowLock = getRowLock(row);
>
> Can you pastebin the complete stack trace?
>
> Thanks
>
> On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn...@gmail.com> wrote:
>
> > Hi,
> >
> > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to CDH5.4.5(HBase1.0.0)
> > and we are seeing a slowdown in the increment operation.
> >
> > Here's an extract from a thread dump of a RegionServer in our cluster:
> >
> > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> >   State: BLOCKED
> >   Blocked count: 21689888
> >   Waited count: 39828360
> >   Blocked on java.util.LinkedList@3474e4b2
> >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> >   Stack:
> >     java.lang.Object.wait(Native Method)
> >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> >     org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> >     org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> >     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> >     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> >     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> >     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >     java.lang.Thread.run(Thread.java:745)
> >
> > There are many similar threads in the thread dump.
> >
> > I read the source code, and I think this is caused by changes to
> > MultiVersionConsistencyControl.
> > waitForPreviousTransactionsComplete() seems to act as a region-level lock
> > (not a row-level lock).
> >
> >
> > We also wrote a performance test for the increment operation that uses
> > 100 threads, and ran it in local mode.
> >
> > The results are shown below:
> >
> > CDH5.3.1(HBase0.98.6)
> > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> >
> > CDH5.4.5(HBase1.0.0)
> > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> >
> >
> > Thanks,
> >
> > Toshihiro Suzuki
> >
>
