[
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401346#comment-15401346
]
Hadoop QA commented on PHOENIX-3111:
------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12821230/PHOENIX-3111_v2.patch
against master branch at commit 3251ac58a6a9de890285ae82ba86d76618fa0a1c.
ATTACHMENT ID: 12821230
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated
34 warning messages.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines
longer than 100:
+ * 1) In case of split we just throw IOException so split won't happen but it will not cause any harm.
+ * 3) In case of region close by balancer/move wait before closing the reason and fail the query which
+ region.batchMutate(mutations.toArray(mutationArray), HConstants.NO_NONCE, HConstants.NO_NONCE);
+ throw new IOException("Region is getting closed. Not allowing to write to avoid possible deadlock.");
+ // Don't allow splitting if operations need read and write to same region are going on in the
+ throw new DoNotRetryIOException("Operations like local index building/delete/upsert select"
{color:red}-1 core tests{color}. The patch failed these unit tests:
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.MutableIndexToolIT
Test results:
https://builds.apache.org/job/PreCommit-PHOENIX-Build/487//testReport/
Javadoc warnings:
https://builds.apache.org/job/PreCommit-PHOENIX-Build/487//artifact/patchprocess/patchJavadocWarnings.txt
Console output:
https://builds.apache.org/job/PreCommit-PHOENIX-Build/487//console
This message is automatically generated.
> Possible Deadlock/delay while building index, upsert select, delete rows at
> server
> ----------------------------------------------------------------------------------
>
> Key: PHOENIX-3111
> URL: https://issues.apache.org/jira/browse/PHOENIX-3111
> Project: Phoenix
> Issue Type: Bug
> Reporter: Sergio Peleato
> Assignee: Rajeshbabu Chintaguntla
> Priority: Critical
> Fix For: 4.8.0
>
> Attachments: PHOENIX-3111.patch, PHOENIX-3111_v2.patch
>
>
> There is a possible deadlock, or at least a long delay, while building a local
> index or running upsert select or delete at the server. The situation can
> arise as follows.
> These queries scan rows from a table and write mutations back to the same
> table, so the memstore may reach the blocking memstore size threshold; a
> RegionTooBusyException is then thrown back to the client and the query retries
> the scan.
> Take a local index build as an example: we first scan the data table, prepare
> the index mutations, and write them back to the same table.
> The memstore may fill up here as well, in which case we try to flush the
> region. But if a split happens in between, the split waits for the region's
> write lock in order to close it, and the flush then waits for the read lock
> because the write lock request is already queued ahead of it; neither can
> proceed until the local index build completes. The local index build, in turn,
> cannot complete, because writes are blocked until the flush happens. This may
> not be a complete deadlock, but queries can take a very long time to complete
> in these cases.
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5
> os_prio=31 tid=0x00007f7fb2050800 nid=0x1c033 waiting on condition
> [0x0000000139b68000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000006ede72550> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
> - locked <0x00000006ede69d00> (a java.lang.Object)
> at
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
> at
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
> at
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - <0x00000006ee132098> (a
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "MemStoreFlusher.0" #170 prio=5 os_prio=31 tid=0x00007f7fb6842000 nid=0x19303
> waiting on condition [0x00000001388e9000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000006ede72550> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1986)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1950)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
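> Below is a minimal, self-contained sketch (illustration only, not Phoenix or
> HBase code) of the lock interplay captured in the two stack traces above: a
> long-running region operation holds the region's read lock, the split queues
> for the write lock to close the region, and the memstore flush then queues for
> the read lock behind the waiting writer. A fair ReentrantReadWriteLock is used
> here so the queueing is deterministic; HRegion uses a non-fair lock, which
> blocks the late reader for the same writer-starvation-avoidance reason.
> {noformat}
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> public class CloseLockSketch {
>     public static void main(String[] args) throws Exception {
>         ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
>
>         // "Index build / upsert select": holds the read lock, standing in for
>         // a batchMutate that cannot finish until the memstore is flushed.
>         Thread regionOp = new Thread(() -> {
>             lock.readLock().lock();
>             try {
>                 TimeUnit.SECONDS.sleep(3);
>             } catch (InterruptedException ignored) {
>             } finally {
>                 lock.readLock().unlock();
>             }
>         }, "region-operation");
>
>         // "Split": needs the write lock to close the region, so it parks
>         // behind the active reader.
>         Thread split = new Thread(() -> {
>             lock.writeLock().lock();
>             lock.writeLock().unlock();
>         }, "split-close");
>
>         // "Flush": only needs the read lock, but it queues behind the waiting
>         // writer, so it cannot run until the region operation finishes.
>         Thread flush = new Thread(() -> {
>             lock.readLock().lock();
>             lock.readLock().unlock();
>         }, "memstore-flush");
>
>         regionOp.start();
>         TimeUnit.MILLISECONDS.sleep(200);
>         split.start();
>         TimeUnit.MILLISECONDS.sleep(200);
>         flush.start();
>
>         TimeUnit.MILLISECONDS.sleep(500);
>         System.out.println("split: " + split.getState()); // WAITING on write lock
>         System.out.println("flush: " + flush.getState()); // WAITING behind queued writer
>         regionOp.join();
>         split.join();
>         flush.join();
>     }
> }
> {noformat}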
> As a fix, we need to block region splits while an index build, upsert select,
> or delete is running at the server.
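> A minimal sketch of that approach follows. It is an illustration only, not the
> actual patch: the class and counter names are placeholders, and the HBase 1.x
> RegionObserver/BaseRegionObserver API is assumed. Server-side scan-and-write
> operations bump a counter while they run, and preSplit rejects the split while
> the counter is non-zero, matching the fragments flagged in the line-length
> check above.
> {noformat}
> import java.io.IOException;
> import java.util.concurrent.atomic.AtomicInteger;
>
> import org.apache.hadoop.hbase.DoNotRetryIOException;
> import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
> import org.apache.hadoop.hbase.coprocessor.ObserverContext;
> import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
>
> // Illustration only, not the PHOENIX-3111 patch: block splits while
> // server-side scan-and-write operations (local index build, upsert select,
> // delete) are in flight, so the split cannot queue a write lock that would
> // stall the memstore flush.
> public class BlockSplitDuringServerWritesObserver extends BaseRegionObserver {
>
>     // Number of server-side scan-and-write operations running on this region.
>     private final AtomicInteger serverWriteCount = new AtomicInteger(0);
>
>     // The (hypothetical) server-side write path would call these around its work.
>     public void startServerWrite()  { serverWriteCount.incrementAndGet(); }
>     public void finishServerWrite() { serverWriteCount.decrementAndGet(); }
>
>     @Override
>     public void preSplit(ObserverContext<RegionCoprocessorEnvironment> c,
>             byte[] splitRow) throws IOException {
>         if (serverWriteCount.get() > 0) {
>             // Failing the split is harmless: it can simply be retried later,
>             // once the server-side operations have drained.
>             throw new DoNotRetryIOException(
>                 "Operations like local index building/delete/upsert select are"
>                     + " in progress; not allowing the region to split to avoid"
>                     + " a possible deadlock.");
>         }
>     }
> }
> {noformat}
> Rejecting the split (rather than waiting) keeps the split thread from queueing
> for the region's write lock, which is what stalls the flush in the traces
> above.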
> Thanks to [~sergey.soldatov] for the help in understanding and analyzing the
> bug, and to [~speleato] for finding it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)