[ 
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418094#comment-15418094
 ] 

James Taylor commented on PHOENIX-3111:
---------------------------------------

Circling back to this now that 4.8.0 is out. We can certainly reconsider the 
approach for 4.8.1 if there are better alternatives, [~lhofhansl]. Do you 
recommend a specific code change that would help prevent something from going 
wrong? It does feel dangerous, but the alternative (not fixing it) is that 
deadlocks occur. It would be good to know how common this can be, but I 
suppose that's hard to quantify, [~sergey.soldatov]? My guess is that it's 
more likely to happen with the new local index implementation because we're 
writing to the same table twice.

Both [~enis] and [~rajeshbabu] are HBase committers, so I trust that they're 
knowledgeable about the HBase code paths, [~apurtell].

I agree that the writing we're doing in a region observer method called during 
a scan is hacky. It's us being lazy: we haven't created a new endpoint 
coprocessor with its associated protobuf, nor written the upgrade code to 
attach them to all existing Phoenix tables. I've filed PHOENIX-3177 for this. 
However, unless I'm missing something, I don't think it would matter whether 
we came in through an endpoint coprocessor or a region observer write method. 
The same problem remains based on the sequence and type of locks that are 
taken.

The optimization we're trying to perform is to issue local writes to HBase from 
a coprocessor so that we don't have to ship the data back to the client and 
then reship it back to HBase to do the write. Is that an invalid thing to do?
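
For concreteness, here's a minimal sketch of the kind of server-side write 
we're talking about (illustrative only, not the actual Phoenix code; it 
assumes the HBase 1.x {{Region}} API). Whether the request comes in through 
a region observer hook or an endpoint RPC, the local write itself looks the 
same:

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.Region;

public class LocalWriteSketch {
  // Apply mutations directly to the hosting region rather than shipping
  // rows back to the client and having it write them back over RPC.
  static void writeLocally(RegionCoprocessorEnvironment env, List<Put> puts)
      throws IOException {
    Region region = env.getRegion();
    // batchMutate() takes the normal WAL/memstore path, so these writes can
    // push the memstore past its blocking threshold while the originating
    // scan still holds a read lock on the very same region.
    region.batchMutate(puts.toArray(new Mutation[puts.size()]),
        HConstants.NO_NONCE, HConstants.NO_NONCE);
  }
}
{code}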

Would be great to explore some alternatives:
- Should/can Phoenix take out a write lock in these write scenarios? But that 
would prevent flushes from happening, right?
- With stats enabled, the client issues scans that are 
{{phoenix.stats.guidepost.width}} apart (300MB by default), which essentially 
puts an upper bound on the amount of data each region observer call handles 
(see the sketch after this list). Did you test with and without stats, 
[~sergey.soldatov], and if so, does this have an impact?
- Are there changes in HBase that would make sense to better support this? 
For example, could the locking implementation be changed?
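
To make the guidepost bound concrete, here's a back-of-envelope sketch (the 
30GB table size is a made-up number for illustration; 300MB is the default 
width mentioned above):

{code:java}
public class GuidepostMath {
  public static void main(String[] args) {
    // With stats enabled, the client splits a full scan into chunks roughly
    // one guidepost apart, which bounds the data per region observer call.
    long tableBytes = 30L * 1024 * 1024 * 1024;  // hypothetical 30GB table
    long guidepostWidth = 300L * 1024 * 1024;    // 300MB default
    System.out.println(tableBytes / guidepostWidth
        + " parallel chunks of at most ~300MB each");  // prints 102
  }
}
{code}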




> Possible Deadlock/delay while building index, upsert select, delete rows at 
> server
> ----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3111
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3111
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sergio Peleato
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-3111.patch, PHOENIX-3111_addendum.patch, 
> PHOENIX-3111_v2.patch
>
>
> There is a possible deadlock while building a local index or running an 
> UPSERT SELECT or DELETE at the server. The situation can arise as follows.
> In these queries we scan rows from a table and write mutations back to the 
> same table. The memstore may then reach the blocking memstore size 
> threshold, at which point a RegionTooBusyException is thrown back to the 
> client and the query retries the scan.
> Take the local index build as an example: we first scan the data table, 
> prepare index mutations, and write them back to the same table.
> So the memstore can fill up, in which case we try to flush the region. But 
> if a split happens in between, the split waits for the region's write lock 
> in order to close it, and the flush waits for a read lock because the write 
> lock sits ahead of it in the queue until the local index build completes. 
> The local index build cannot complete because we are not allowed to write 
> until there is a flush. This may not be a complete deadlock, but the 
> queries can take a very long time to complete in these cases.
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 os_prio=31 tid=0x00007f7fb2050800 nid=0x1c033 waiting on condition [0x0000000139b68000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006ede72550> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
>         - locked <0x00000006ede69d00> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - <0x00000006ee132098> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "MemStoreFlusher.0" #170 prio=5 os_prio=31 tid=0x00007f7fb6842000 nid=0x19303 waiting on condition [0x00000001388e9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006ede72550> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1986)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1950)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
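> The lock ordering above can be reproduced with plain java.util.concurrent 
> (a minimal standalone sketch, not HBase code; the sleeps only make this 
> interleaving likely, not guaranteed):
> {code:java}
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> public class QueuedWriterBlocksReaders {
>   public static void main(String[] args) throws InterruptedException {
>     ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
>     lock.readLock().lock();          // the "scan" holds the read lock
>
>     Thread split = new Thread(() -> {
>       lock.writeLock().lock();       // the "split" queues for the write lock
>       lock.writeLock().unlock();
>     });
>     split.start();
>     Thread.sleep(200);               // let the writer reach the queue head
>
>     Thread flush = new Thread(() -> {
>       lock.readLock().lock();        // the "flush" parks behind the writer
>       lock.readLock().unlock();
>     });
>     flush.start();
>     Thread.sleep(200);
>
>     // Both threads are parked, matching the two stack traces above.
>     System.out.println("split: " + split.getState());  // WAITING
>     System.out.println("flush: " + flush.getState());  // WAITING
>
>     lock.readLock().unlock();        // releasing the "scan" unblocks both
>     split.join();
>     flush.join();
>   }
> }
> {code}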
> As a fix we need to block region splits while an index build, UPSERT 
> SELECT, or server-side DELETE is running.
> Thanks to [~sergey.soldatov] for the help in understanding and analyzing 
> the bug, and to [~speleato] for finding it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
