[ 
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398856#comment-15398856
 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-3111:
--------------------------------------------------

[~jamestaylor]
bq. What sets isRegionClosing back to false if it's set to true in preClose()? 
Since the region is closing, do we always get a new instance of the 
UngroupedAggregateRegionObserver, so when it opens again, it will be 
initialized to false?
Yes James, we always get a new instance of UngroupedAggregateRegionObserver 
when the region is reopened, so the variable is reinitialized to false. We 
need not set it back to false explicitly; by the time we set it to true the 
region is about to close and will not serve any further reads/writes.

bq. Or should we also explicitly be setting isRegionClosing to false in pre or 
postOpen()? Is there any chance that it would get stuck in a true state (i.e. 
can preClose() complete and then the close not actually happen)?
No need to set it to false; by default it's initialized to false when the 
coprocessor is initialized. There is no chance it will be left true, but just 
to be safe I will set it to false again in the start() method, which is 
called when a region is opened.
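A minimal sketch of the flag lifecycle described above (only the flag 
handling is shown; the surrounding observer code is illustrative, not the 
actual patch):
{noformat}
import java.io.IOException;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

public class UngroupedAggregateRegionObserver extends BaseRegionObserver {
    // Set in preClose() so in-flight server-side writes can bail out early.
    private volatile boolean isRegionClosing = false;

    @Override
    public void start(CoprocessorEnvironment e) throws IOException {
        // A fresh instance is created on region open, so this is already
        // false; resetting it here is purely defensive.
        isRegionClosing = false;
    }

    @Override
    public void preClose(ObserverContext<RegionCoprocessorEnvironment> c,
            boolean abortRequested) throws IOException {
        // Never reset on this instance: the region is closing and will not
        // serve any further reads or writes.
        isRegionClosing = true;
    }
}
{noformat}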

bq. We need more comments to explain this (even the mighty Lars is having a 
hard time - imagine an HBase novice like me trying to understand it). Can we 
have Sergey's very nice comment above as javadoc around the declaration of 
the lock object? Plus, add to this a (5),(6),(7)... that describes at a high 
level the approach to solve this.
I can add the comments.

bq. How about adding an @GuardedBy("lock") to the scansReferenceCount 
declaration?
I can add this.
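For reference, a minimal sketch of what that annotation would look like on 
the fields discussed here (using the JSR-305 style 
javax.annotation.concurrent.GuardedBy):
{noformat}
import javax.annotation.concurrent.GuardedBy;

// The lock object whose javadoc should carry Sergey's explanation.
private final Object lock = new Object();

// Documents that access must hold "lock"; tools such as FindBugs can then
// flag unsynchronized reads or writes.
@GuardedBy("lock")
private int scansReferenceCount = 0;
{noformat}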

bq. What does blockingMemStoreSize represent exactly? Is it an upper bound on 
how many bytes can occur before the memstore is full? Add comment, please.
Yes, it represents an upper bound on the sum of all memstore sizes in the 
region. Sure, I will add documentation there as well.
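For that comment, the bound follows from the stock HBase settings; a sketch 
of the derivation (the configuration keys and defaults are the standard HBase 
ones, shown here only to illustrate where the threshold comes from):
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;

// blockingMemStoreSize = flush size * block multiplier, i.e. the point at
// which HBase itself starts blocking updates to the region.
long flushSize = conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
        1024 * 1024 * 128L); // default flush size is 128 MB
long blockingMemStoreSize = flushSize *
        conf.getLong("hbase.hregion.memstore.block.multiplier", 4);
{noformat}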

bq. Then this code will basically delay the writing to the memstore while we're 
over this threshold for 3 seconds. Is this to give the flush a chance to happen 
(since that's what'll cause the memstore size to decrease)? Would be good to 
document this more.
Absolutely correct. Sure, I will add documentation here as well.

bq. Will this throttling occur in the normal course of things and how 
significant a delay is the three seconds?
In the normal case we don't need to wait much. In the worst case, compactions 
might delay flushes when the region already has the maximum number of store 
files allowed. If we are waiting more than 3 seconds then the region really 
is busy, and a RegionTooBusyException is thrown back to the client.
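To make the behaviour concrete, a minimal sketch of the delay-then-fail loop 
described above (the 3 second bound and the blockingMemStoreSize name are 
from this discussion; the method name and region handling are illustrative, 
not the actual patch):
{noformat}
import java.io.IOException;
import java.io.InterruptedIOException;
import org.apache.hadoop.hbase.RegionTooBusyException;
import org.apache.hadoop.hbase.regionserver.Region;
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;

// Wait up to ~3s for a flush to bring the memstore back under the bound;
// if it stays over, push a RegionTooBusyException back to the client.
private static void waitForFlushes(Region region, long blockingMemStoreSize)
        throws IOException {
    long start = EnvironmentEdgeManager.currentTime();
    while (region.getMemstoreSize() > blockingMemStoreSize) {
        if (EnvironmentEdgeManager.currentTime() - start > 3000L) {
            throw new RegionTooBusyException(
                    "Memstore stayed above the blocking size; region is busy");
        }
        try {
            Thread.sleep(100); // give the flush a chance to run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException();
        }
    }
}
{noformat}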

bq. In the non local index case, we take this code path purely as an 
optimization - would it be better to just not do this optimization and let the 
data go back to the client and be pushed back to the server?
This can be done, but then we might need to rerun the full query even if only 
one or a few regions hit this issue. It can be worked on in a separate JIRA.

bq. Under what condition would the region close during the commitBatch call 
(since we're already blocking splits)? Is it if the region gets reassigned for 
some reason? Can we document this there?
The balancer or a move-region request can close the region. We can document it.

bq. Should we be blocking a merge too? Or are we not so worried about those 
because they're not as common and user initiated?
Merges are rare and user-initiated, so it's OK not to block them; in any 
case, the region close during a merge will wait for running operations to 
finish.
To block merges we would need a region server level coprocessor hook, which 
would have to be added to the configuration at the RS level.
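For splits, which the patch does block, the hook has roughly this shape; the 
wait/notify details are a sketch of the approach, not the patch itself, and 
it uses the lock/scansReferenceCount fields shown earlier:
{noformat}
import java.io.IOException;
import java.io.InterruptedIOException;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

// Sketch: hold off a split while server-side scans that write back to the
// region (index build, upsert select, delete) are still in flight.
@Override
public void preSplit(ObserverContext<RegionCoprocessorEnvironment> c,
        byte[] splitRow) throws IOException {
    synchronized (lock) {
        while (scansReferenceCount > 0) {
            try {
                lock.wait(1000); // woken when a scan finishes and decrements
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException();
            }
        }
    }
}
{noformat}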

[~samarthjain] 
bq. Do splits, bulk loads, or other similar processes create their own 
instances of UngroupedAggregateRegionObserver?
No, it's only initialized once, when the region is opened. 

> Possible Deadlock/delay while building index, upsert select, delete rows at 
> server
> ----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3111
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3111
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sergio Peleato
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 4.8.1
>
>         Attachments: PHOENIX-3111.patch
>
>
> There is a possible deadlock while building a local index or running upsert 
> select or delete at the server. The situation can arise as follows:
> these queries scan rows from a table and write mutations back to the same 
> table, so the memstore can reach the blocking memstore size threshold; a 
> RegionTooBusyException is then thrown back to the client and the queries 
> retry the scan.
> Take the local index build case: we first scan the data table, prepare 
> index mutations, and write them back to the same table. The memstore can 
> fill up, in which case we try to flush the region. But if a split happens 
> in between, the split waits for the write lock on the region in order to 
> close it, while the flush waits for the read lock, which is queued behind 
> the write lock until the local index build completes. The local index 
> build cannot complete because we are not allowed to write until the flush 
> happens. This may not be a complete deadlock, but the queries can take a 
> very long time to complete in these cases.
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 
> os_prio=31 tid=0x00007f7fb2050800 nid=0x1c033 waiting on condition 
> [0x0000000139b68000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006ede72550> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
>         - locked <0x00000006ede69d00> (a java.lang.Object)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - <0x00000006ee132098> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "MemStoreFlusher.0" #170 prio=5 os_prio=31 tid=0x00007f7fb6842000 nid=0x19303 
> waiting on condition [0x00000001388e9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006ede72550> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1986)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1950)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> As a fix we need to block region splits while an index build, upsert 
> select, or server-side delete is running.
> Thanks [~sergey.soldatov] for the help in understanding and analyzing the 
> bug, and [~speleato] for finding it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
