[
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166789#comment-16166789
]
stack commented on PHOENIX-3111:
--------------------------------
Funny. I took a look at HBASE-14893. My comment on it (ignored) was "This is
crazy stuff with coprocessor taking out internal region lock. It should have a
test." So, I'd be up for removing it (smile); Phoenix shouldn't be taking hbase
locks. We have a hard enough time keeping our locking story straight in the
first place (see [~apurtell]'s comment too on the regime changing (too) often).
I'm game for discussion.
> Possible Deadlock/delay while building index, upsert select, delete rows at
> server
> ----------------------------------------------------------------------------------
>
> Key: PHOENIX-3111
> URL: https://issues.apache.org/jira/browse/PHOENIX-3111
> Project: Phoenix
> Issue Type: Bug
> Reporter: Sergio Peleato
> Assignee: Rajeshbabu Chintaguntla
> Priority: Critical
> Fix For: 4.8.0
>
> Attachments: PHOENIX-3111_addendum.patch, PHOENIX-3111.patch,
> PHOENIX-3111_v2.patch
>
>
> There is a possible deadlock while building a local index or running upsert
> select or delete at the server. These queries scan mutations from a table and
> write them back to the same table, so the memstore may reach the blocking
> memstore size threshold; RegionTooBusyException is then thrown back to the
> client and the query retries the scan.
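>
> For a sense of when that kicks in, a minimal sketch of the blocking-threshold
> arithmetic (the config names are the real HBase keys; the values are the
> HBase 1.x defaults, assumed here for illustration):
> {code:java}
> public class BlockingMemStoreSize {
>     public static void main(String[] args) {
>         // Writes to a region are rejected with RegionTooBusyException once
>         // its memstore exceeds flushSize * blockMultiplier.
>         long flushSize = 128L * 1024 * 1024; // hbase.hregion.memstore.flush.size (1.x default)
>         int blockMultiplier = 4;             // hbase.hregion.memstore.block.multiplier (1.x default)
>         System.out.println("blocking memstore size = " + (flushSize * blockMultiplier));
>     }
> }
> {code}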
> Take the local index build as an example: we first scan the data table,
> prepare the index mutations, and write them back to the same table. Here too
> the memstore can fill up, in which case we try to flush the region. But if a
> split starts in between, the split waits for the region's write lock in order
> to close it, and the flush waits for the read lock because that write lock
> sits ahead of it in the queue until the local index build completes. The local
> index build in turn cannot complete because writes are blocked until the flush
> happens. This might not be a complete deadlock, but the queries can take a
> very long time to complete in these cases.
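>
> The queueing behavior at the heart of this is visible with plain
> java.util.concurrent; below is a minimal, self-contained sketch (thread names
> and sleeps are illustrative, not HRegion code) of how a flush's read-lock
> request parks behind a split's queued write-lock request on a non-fair
> ReentrantReadWriteLock, matching the two NonfairSync stacks below:
> {code:java}
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> public class SplitFlushStall {
>     public static void main(String[] args) throws Exception {
>         ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair, as in HRegion
>
>         // 1. "Index build": holds the read lock and (in the real scenario)
>         //    cannot release it until the flush below happens.
>         lock.readLock().lock();
>
>         // 2. "Split": asks for the write lock to close the region
>         //    (HRegion.doClose) and parks, because a reader holds the lock.
>         Thread split = new Thread(() -> {
>             lock.writeLock().lock();
>             lock.writeLock().unlock();
>         }, "split");
>         split.start();
>         Thread.sleep(200); // let the split enqueue its write request first
>
>         // 3. "Flush": asks for the read lock (HRegion.flushcache) and ALSO
>         //    parks -- a queued writer blocks newly arriving readers.
>         Thread flush = new Thread(() -> {
>             lock.readLock().lock();
>             lock.readLock().unlock();
>         }, "flush");
>         flush.start();
>         Thread.sleep(200);
>
>         System.out.println("split: " + split.getState()); // WAITING (parked)
>         System.out.println("flush: " + flush.getState()); // WAITING (parked)
>
>         // The index build needs the flush, the flush waits on the split, and
>         // the split waits on the index build. Release to end the demo:
>         lock.readLock().unlock();
>         split.join();
>         flush.join();
>     }
> }
> {code}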
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 os_prio=31 tid=0x00007f7fb2050800 nid=0x1c033 waiting on condition [0x0000000139b68000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000006ede72550> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
>         - locked <0x00000006ede69d00> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - <0x00000006ee132098> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "MemStoreFlusher.0" #170 prio=5 os_prio=31 tid=0x00007f7fb6842000 nid=0x19303 waiting on condition [0x00000001388e9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000006ede72550> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1986)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1950)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> As a fix we need to block region splits while an index build, upsert select,
> or server-side delete is running.
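> One possible shape for such a guard, as a sketch only: it assumes the HBase
> 1.x RegionObserver.preSplit hook, and the counter plus its bookkeeping are
> hypothetical, not what the attached patch actually does.
> {code:java}
> import java.io.IOException;
> import java.util.concurrent.atomic.AtomicInteger;
>
> import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
> import org.apache.hadoop.hbase.coprocessor.ObserverContext;
> import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
>
> public class BlockSplitsWhileWriting extends BaseRegionObserver {
>     // Hypothetical bookkeeping: bumped when a server-side scan-and-write
>     // (index build, upsert select, delete) starts, dropped when it finishes.
>     private final AtomicInteger activeServerWrites = new AtomicInteger();
>
>     @Override
>     public void preSplit(ObserverContext<RegionCoprocessorEnvironment> ctx)
>             throws IOException {
>         if (activeServerWrites.get() > 0) {
>             // Failing the split here keeps HRegion.doClose from queueing a
>             // write lock that the flush (and so the index build) would
>             // otherwise stall behind.
>             throw new IOException("split blocked: server-side write in progress");
>         }
>     }
> }
> {code}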
> Thanks to [~sergey.soldatov] for the help in understanding and analyzing the
> bug, and to [~speleato] for finding it.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)