[ 
https://issues.apache.org/jira/browse/PHOENIX-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008663#comment-16008663
 ] 

Geoffrey Jacoby commented on PHOENIX-3838:
------------------------------------------

[~mujtabachohan] [~rajeshbabu]
After some testing I think I know what's going on here. It's a deadlock on the 
HRegion's ReentrantReadWriteLock "lock". The deadlock is eventually broken by a 
timeout, but that often crashes the region server; on restart the split is 
rolled back and the server recovers. 

When an HBase client RPC call is processed server-side, the first thing the 
write pipeline does is grab a *read* lock on the region, with a 60s timeout. The 
lock is released in a finally block, so it is released even if the op fails. 
(It's a read lock because the op doesn't change the state of the HRegion itself, 
only the data _in_ the region.) 
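That lock discipline can be sketched like this (a simplified illustration of the pattern, not HBase's actual HRegion code; the class and method names here are made up):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Runs a mutation under the region's read lock, as the write pipeline does. */
    public void doMutation(Runnable op) throws InterruptedException {
        // Grab the *read* lock with a timeout: the op changes data in the
        // region, not the state of the region itself.
        if (!lock.readLock().tryLock(60, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Timed out waiting for region lock");
        }
        try {
            op.run();
        } finally {
            // Released in a finally block, so even a failed op lets go of it.
            lock.readLock().unlock();
        }
    }

    /** Closing the region (e.g. for a split) takes the *write* lock. */
    public void close() {
        lock.writeLock().lock(); // waits for all in-flight ops to finish
        try {
            // ... flush and close stores ...
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```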

When a split occurs, the SplitTransaction asks the parent HRegion to close, 
which grabs the *write* lock on the region. This forces the close to wait until 
all in-flight RPC calls have completed.

In typical cases this works fine. But when a local index is present, the 
coprocessor invokes the indexer inside the write pipeline, and the indexer 
spawns other threads to run new index batch mutations. The original op waits on 
these child ops, each of which needs to grab its own read lock.

(A tangent on ReentrantReadWriteLocks: they have two modes, fair and non-fair. 
HBase uses the default non-fair mode, which is nominally "every thread for 
itself" and risks write-lock starvation. In practice there is one exception: a 
thread trying to acquire the read lock will wait, even when no thread holds the 
write lock, if a write-lock request is first in line.)
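That exception is easy to demonstrate in isolation (a standalone sketch, not HBase code; note the *timed* tryLock is used here on purpose, because the untimed ReadLock.tryLock() deliberately barges past queued writers):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueuedWriterDemo {
    /**
     * Shows non-fair mode's one concession to writers: once a writer is
     * queued, a new timed read-lock attempt waits even though no thread
     * holds the write lock. Returns whether the second reader got in.
     */
    public static boolean secondReaderGetsIn() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        lock.readLock().lock();                     // reader 1 holds the read lock

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();                // blocked behind reader 1
            lock.writeLock().unlock();
        });
        writer.start();
        while (lock.getQueueLength() == 0) {        // wait until the writer is queued
            Thread.sleep(10);
        }

        // No write lock is held, yet this timed read attempt is refused,
        // because a write-lock request is first in line.
        boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.readLock().unlock();
        }
        lock.readLock().unlock();                   // let the writer through
        writer.join(5000);
        return acquired;
    }
}
```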

So, to sum up: the split's write-lock request is waiting for the op to release 
its read lock; the op is waiting for the indexer to finish; and the indexer is 
waiting to acquire a read lock, which is queued behind the split's write-lock 
request. Eventually the indexer ops time out and the deadlock is broken, but 
the timeout also triggers the indexer's emergency "kill the region server" 
behavior, so the region server goes down. 
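The whole cycle can be reproduced in miniature with a plain ReentrantReadWriteLock (a condensed sketch, not the actual HBase/Phoenix code path; a 500ms timeout stands in for the real 60s one):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SplitDeadlockSketch {
    /**
     * A miniature of the cycle: the "op" holds the read lock and waits on a
     * child "indexer" task; the "split" queues for the write lock behind the
     * op; the indexer's read-lock attempt is refused while the split is
     * queued. Returns true if the indexer timed out, i.e. the cycle formed.
     */
    public static boolean reproduce() throws Exception {
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        CountDownLatch opHoldsReadLock = new CountDownLatch(1);
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Future<Boolean> op = pool.submit(() -> {
                regionLock.readLock().lock();       // the RPC op enters the region
                opHoldsReadLock.countDown();
                try {
                    // Wait for the "split" to queue up for the write lock.
                    while (regionLock.getQueueLength() == 0) Thread.sleep(10);
                    // The child "indexer" task needs its own read lock, and is
                    // stuck behind the queued writer until it times out.
                    Future<Boolean> indexer = pool.submit(() ->
                            regionLock.readLock().tryLock(500, TimeUnit.MILLISECONDS));
                    return !indexer.get();          // true if the indexer timed out
                } finally {
                    regionLock.readLock().unlock();
                }
            });
            opHoldsReadLock.await();
            // The "split" closes the parent region by taking the write lock.
            pool.submit(() -> {
                regionLock.writeLock().lock();
                regionLock.writeLock().unlock();
            });
            return op.get();
        } finally {
            pool.shutdownNow();
        }
    }
}
```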

Tested with HBase branch-1.3, Phoenix master, and JDK 1.8u121.

  

> Region splits can hang on SPLITTING_NEW when local index is present
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-3838
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3838
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.10.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Blocker
>
> From [~mujtabachohan]:
> Meanwhile with HBase 1.3.1 if I try to split a table while data load is in 
> progress, the table state remains in SPLITTING_NEW and index writer blocked. 
> Table splits fine if there is no active writes happening to the table when 
> split is requested.
> {code}
> Thread 163 (RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=48109):
>   State: WAITING
>   Blocked count: 100
>   Waited count: 463
>   Waiting on com.google.common.util.concurrent.AbstractFuture$Sync@16703eda
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>     com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
>     com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
>     org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:66)
>     org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
>     org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitter.write(ParallelWriterIndexCommitter.java:197)
>     org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:185)
>     org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:146)
>     org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:135)
>     org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:474)
>     org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:407)
>     org.apache.phoenix.hbase.index.Indexer.postPut(Indexer.java:375)
>     org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$32.call(RegionCoprocessorHost.java:956)
>     org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
>     org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
>     org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1705)
> {code}
> The following schema was used, with a batch size of 1000, inserting data in 
> the background:
> {code}
> CREATE TABLE IF NOT EXISTS T (PKA CHAR(15) NOT NULL, PKF CHAR(3) NOT NULL,
>   PKP CHAR(15) NOT NULL, CRD DATE NOT NULL, EHI CHAR(15) NOT NULL,
>   FID CHAR(15), CREATED_BY_ID VARCHAR, FH VARCHAR, DT VARCHAR, OS VARCHAR,
>   NS VARCHAR, OFN VARCHAR
>   CONSTRAINT PK PRIMARY KEY (PKA, PKF, PKP, CRD DESC, EHI))
>   VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=true;
> CREATE LOCAL INDEX IF NOT EXISTS TIDX ON T (PKF, CRD, PKP, EHI)
>   INCLUDE (FID, CREATED_BY_ID, FH, DT, OS, NS, OFN);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
