[jira] [Comment Edited] (PHOENIX-3111) Possible Deadlock/delay while building index, upsert select, delete rows at server

2016-07-22 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389608#comment-15389608
 ] 

James Taylor edited comment on PHOENIX-3111 at 7/23/16 12:49 AM:
-

Just want to make sure I understand the scenario under which this can occur. 
Does the problem only occur when we're reading and writing to the same table 
through a coprocessor? Aren't splits already blocked because we're holding a 
region lock through the region.startRegionOperation() call?
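
For reference, the pattern in question is roughly the following (a minimal sketch, not the actual Phoenix code; {{commitBatch}} is just an assumed helper here for illustration) - the server-side scan/write loop brackets its work with the region operation (read) lock:
{code}
// Sketch only: hold the region operation (read) lock while scanning this region
// and writing the resulting mutations back to it.
private void scanAndWriteBack(Region region, List<Mutation> mutations) throws IOException {
    region.startRegionOperation();
    try {
        // scan rows, build the index/delete/upsert mutations, and write them back
        commitBatch(region, mutations); // assumed helper, for illustration only
    } finally {
        region.closeRegionOperation();
    }
}
{code}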

FWIW, both the DELETE and UPSERT SELECT cases are optimizations which we could 
disable and execute on the client if need be. The extra cost would be the data 
transfer between the client and server (which is probably pretty negligible for 
the DELETE case).

Is it feasible to block splits in these cases, and if so, what would the 
implications be? Is this a theoretical problem, or have you actually seen it 
happen and can you repro it?


was (Author: jamestaylor):
Just want to make sure I understand the scenario under which this can occur. 
Does the problem only occur when we're reading and writing to the same table 
through a coprocessor? FWIW, both the DELETE and UPSERT SELECT cases are 
optimizations which we could disable and execute on the client if need be. The 
extra cost would be the data transfer between the client and server (which is 
probably pretty negligible for the DELETE case).

Is it feasible to block splits in these cases and if so, what would the 
implications be?

> Possible Deadlock/delay while building index, upsert select, delete rows at 
> server
> --
>
> Key: PHOENIX-3111
> URL: https://issues.apache.org/jira/browse/PHOENIX-3111
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Sergio Peleato
>Assignee: Rajeshbabu Chintaguntla
>Priority: Blocker
> Fix For: 4.8.0
>
>
> There is a possible deadlock, or a long delay, while building a local index or 
> running UPSERT SELECT or DELETE at the server. The situation can happen as 
> follows.
> In these queries we scan rows from a table and write mutations back to the 
> same table. The memstore may reach the blocking memstore size threshold, in 
> which case RegionTooBusyException is thrown back to the client and the queries 
> retry the scan.
> Take the local index build case: we first scan the data table, prepare index 
> mutations, and write them back to the same table.
> The memstore may fill up as well, in which case we try to flush the region. 
> But if a split happens in between, the split waits for the write lock on the 
> region in order to close it, and the flush waits for the read lock because the 
> write lock is already queued ahead of it, until the local index build 
> completes. The local index build won't complete because writes are not allowed 
> until there is a flush. This may not be a complete deadlock, but the queries 
> can take a very long time to complete in these cases.
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 
> os_prio=31 tid=0x7f7fb2050800 nid=0x1c033 waiting on condition 
> [0x000139b68000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0006ede72550> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
> - locked <0x0006ede69d00> (a java.lang.Object)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Threa

[jira] [Comment Edited] (PHOENIX-3111) Possible Deadlock/delay while building index, upsert select, delete rows at server

2016-07-28 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398303#comment-15398303
 ] 

James Taylor edited comment on PHOENIX-3111 at 7/28/16 10:17 PM:
-

Thanks again for the patch, [~rajeshbabu] & [~sergey.soldatov]. Here are some 
comments/questions:
- What sets {{isRegionClosing}} back to false if it's set to true in 
preClose()? Since the region is closing, do we always get a new instance of the 
UngroupedAggregateRegionObserver, so that when the region opens again the flag 
will be initialized to false? Or should we also explicitly set 
{{isRegionClosing}} to false in pre or postOpen()? Is there any chance that it 
would get stuck in a true state (i.e. can preClose() complete and then the 
close not actually happen)? See the sketch at the end of this comment.
- We need more comments to explain this (even the mighty Lars is having a hard 
time - imagine an HBase novice like me trying to understand it). Can we have 
Sergey's very nice comment above as javadoc around the declaration of the lock 
object? Plus, add to it a (5), (6), (7)... that describes at a high level the 
approach taken to solve this.
- How about adding an {{@GuardedBy("lock")}} to the {{scansReferenceCount}} 
declaration?
- What does {{blockingMemStoreSize}} represent exactly? Is it an upper bound on 
how many bytes can accumulate before the memstore blocks writes? Add a comment, 
please.
{code}
+        final long blockingMemStoreSize = flushSize * (
+                conf.getLong(HConstants.HREGION_MEMSTORE_BLOCK_MULTIPLIER,
+                        HConstants.DEFAULT_HREGION_MEMSTORE_BLOCK_MULTIPLIER) - 1);
{code}
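For a concrete sense of scale (assuming the stock defaults of a 128 MB flush 
size and a memstore block multiplier of 4 - the actual values come from the 
cluster config), that works out to 128 MB * (4 - 1) = 384 MB, i.e. the throttle 
kicks in one multiplier step before the ~512 MB point at which HBase itself 
would start blocking updates.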
- Then this code will basically delay writing to the memstore for up to 3 
seconds while we're over this threshold. Is this to give the flush a chance to 
happen (since that's what will cause the memstore size to decrease)? Would be 
good to document this more. Will this throttling occur in the normal course of 
things, and how significant a delay is the three seconds? In the 
non-local-index case, we take this code path purely as an optimization - would 
it be better to just not do this optimization and let the data go back to the 
client and be pushed back to the server?
{code}
+        // We are waiting 3 seconds for the memstore flush happen
+        for (int i = 0; region.getMemstoreSize() > blockingMemstoreSize && i < 30; i++) {
+            try {
+                checkForRegionClosing();
+                Thread.sleep(100);
+            } catch (InterruptedException e) {
+                throw new IOException(e);
+            }
+        }
+        checkForRegionClosing();
{code}
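(For reference, the three seconds comes straight from the loop bounds above: 30 
iterations of a 100 ms sleep.)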
- Under what condition would the region close during the commitBatch call 
(since we're already blocking splits)? Is it if the region gets reassigned for 
some reason? Can we document this there?
- Should we be blocking a merge too? Or are we not so worried about those 
because they're not as common and user initiated?
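
Tying the first and third bullets together, here is roughly what I had in mind 
(just a sketch against the RegionObserver API, not a proposed patch; whether the 
explicit reset is even needed depends on whether a fresh observer instance is 
created when the region reopens):
{code}
// Assumed field declarations for the sketch (@GuardedBy is e.g.
// javax.annotation.concurrent.GuardedBy):
private final Object lock = new Object();
@GuardedBy("lock")
private int scansReferenceCount = 0;
@GuardedBy("lock")
private boolean isRegionClosing = false;

@Override
public void postOpen(ObserverContext<RegionCoprocessorEnvironment> c) {
    // Reset the closing flag so a reopened region does not start out rejecting
    // server-side scans if preClose() ran but the close never actually happened.
    synchronized (lock) {
        isRegionClosing = false;
    }
}
{code}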


was (Author: jamestaylor):
Thanks again for the patch, [~rajeshbabu] & [~sergey.soldatov]. Here are some 
comments/questions:
- What sets {{isRegionClosing}} back to false if it's set to true in 
preClose()? Since the region is closing, do we always get a new instance of the 
UngroupedAggregateRegionObserver, so that when the region opens again the flag 
will be initialized to false? Or should we also explicitly set 
{{isRegionClosing}} to false in pre or postOpen()? Is there any chance that it 
would get stuck in a true state (i.e. can preClose() complete and then the 
close not actually happen)?
- We need more comments to explain this (even the mighty Lars is having a hard 
time - imagine an HBase novice like me trying to understand it). Can we have 
Sergey's very nice comment above as javadoc around the declaration of the lock 
object? Plus, add to it a (5), (6), (7)... that describes at a high level the 
approach taken to solve this.
- How about adding an {{@GuardedBy("lock")}} to the {{scansReferenceCount}} 
declaration?
- What does {{blockingMemStoreSize}} represent exactly? Is it an upper bound on 
how many bytes can accumulate before the memstore blocks writes? Add a comment, 
please.
{code}
+        final long blockingMemStoreSize = flushSize * (
+                conf.getLong(HConstants.HREGION_MEMSTORE_BLOCK_MULTIPLIER,
+                        HConstants.DEFAULT_HREGION_MEMSTORE_BLOCK_MULTIPLIER) - 1);
{code}
- Then this code will basically delay writing to the memstore for up to 3 
seconds while we're over this threshold. Is this to give the flush a chance to 
happen (since that's what will cause the memstore size to decrease)? Would be 
good to document this more. Will this throttling occur in the normal course of 
things, and how significant a delay is the three seconds? In the 
non-local-index case, we take this code path purely as an optimization - would 
it be better to just not do this optimization and let the data go back to the 
client and be pushed back to the server?
{code}
+  // We are wa

[jira] [Comment Edited] (PHOENIX-3111) Possible Deadlock/delay while building index, upsert select, delete rows at server

2016-07-28 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398720#comment-15398720
 ] 

Samarth Jain edited comment on PHOENIX-3111 at 7/29/16 5:23 AM:


[~rajeshbabu] - my question is an extension of what [~jamestaylor] asked. Do 
the split, bulk load, or other similar processes create their own instances of 
UngroupedAggregateRegionObserver?

{code}
    @Override
    public void preSplit(ObserverContext<RegionCoprocessorEnvironment> c, byte[] splitRow)
            throws IOException {
        // Don't allow splitting while operations that read and write to the same region
        // are going on in the coprocessors, to avoid a deadlock scenario. See PHOENIX-3111.
        synchronized (lock) {
            if (scansReferenceCount != 0) {
                throw new IOException("Operations like local index building/delete/upsert select"
                        + " might be going on so not allowing to split.");
            }
        }
    }

    @Override
    public void preBulkLoadHFile(ObserverContext<RegionCoprocessorEnvironment> c,
            List<Pair<byte[], String>> familyPaths) throws IOException {
        // Don't allow bulk loading while operations that read and write to the same region
        // are going on in the coprocessors, to avoid a deadlock scenario. See PHOENIX-3111.
        synchronized (lock) {
            if (scansReferenceCount != 0) {
                throw new DoNotRetryIOException("Operations like local index building/delete/upsert select"
                        + " might be going on so not allowing to bulkload.");
            }
        }
    }

    @Override
    public void preClose(ObserverContext<RegionCoprocessorEnvironment> c, boolean abortRequested)
            throws IOException {
        // Before the region closes, wait until all in-flight server-side scans have finished.
        synchronized (lock) {
            while (scansReferenceCount != 0) {
                isRegionClosing = true;
                try {
                    lock.wait(1000);
                } catch (InterruptedException e) {
                    // ignore and re-check the reference count
                }
            }
        }
    }
{code}

If so, wouldn't they just see the default values of the member variables 
isRegionClosing, scansReferenceCount, etc.? And if so, how is the whole 
synchronization going to work? I *think* you are probably going to need a 
singleton to track state at the region server level. See 
StatisticsCollectionRunTracker as an example; it tracks state for every region 
undergoing stats collection via compaction or the UPDATE STATISTICS command. 
Also worth looking at is how it stops an UPDATE STATISTICS command from doing 
stats collection if a major compaction is kicked off.
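
For illustration, a region-server-level tracker along those lines might look 
roughly like the following (a hypothetical sketch only - the class name and 
methods are made up here, and StatisticsCollectionRunTracker is just the 
analogue being pointed to). The preSplit/preBulkLoadHFile/preClose hooks would 
then consult this shared instance instead of per-observer fields:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-regionserver singleton tracking regions with in-flight
// server-side scan/write operations.
public class ServerSideScanTracker {
    private static final ServerSideScanTracker INSTANCE = new ServerSideScanTracker();

    // region name -> number of in-flight server-side scan/write operations
    private final ConcurrentHashMap<String, AtomicInteger> scanCounts =
            new ConcurrentHashMap<String, AtomicInteger>();

    private ServerSideScanTracker() {
    }

    public static ServerSideScanTracker getInstance() {
        return INSTANCE;
    }

    public void scanStarted(String regionName) {
        AtomicInteger count = scanCounts.get(regionName);
        if (count == null) {
            AtomicInteger newCount = new AtomicInteger();
            count = scanCounts.putIfAbsent(regionName, newCount);
            if (count == null) {
                count = newCount;
            }
        }
        count.incrementAndGet();
    }

    public void scanFinished(String regionName) {
        AtomicInteger count = scanCounts.get(regionName);
        if (count != null) {
            count.decrementAndGet();
        }
    }

    public boolean hasActiveScans(String regionName) {
        AtomicInteger count = scanCounts.get(regionName);
        return count != null && count.get() > 0;
    }
}
{code}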


was (Author: samarthjain):
[~rajeshbabu] - my question is an extension of what [~jamestaylor] asked. Does 
the split process create its own instance of UngroupedAggregateRegionObserver? 
If so, wouldn't it just see the default values of the member variables 
isRegionClosing, scansReferenceCount, etc.? And if so, how is the whole 
synchronization going to work? I *think* you are probably going to need a 
singleton to track state at the region server level. See 
StatisticsCollectionRunTracker as an example of how this could be done.

> Possible Deadlock/delay while building index, upsert select, delete rows at 
> server
> --
>
> Key: PHOENIX-3111
> URL: https://issues.apache.org/jira/browse/PHOENIX-3111
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Sergio Peleato
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.8.1
>
> Attachments: PHOENIX-3111.patch
>
>
> There is a possible deadlock, or a long delay, while building a local index or 
> running UPSERT SELECT or DELETE at the server. The situation can happen as 
> follows.
> In these queries we scan rows from a table and write mutations back to the 
> same table. The memstore may reach the blocking memstore size threshold, in 
> which case RegionTooBusyException is thrown back to the client and the queries 
> retry the scan.
> Take the local index build case: we first scan the data table, prepare index 
> mutations, and write them back to the same table.
> The memstore may fill up as well, in which case we try to flush the region. 
> But if a split happens in between, the split waits for the write lock on the 
> region in order to close it, and the flush waits for the read lock because the 
> write lock is already queued ahead of it, until the local index build 
> completes. The local index build won't complete because writes are not allowed 
> until there is a flush. This may not be a complete deadlock, but the queries 
> can take a very long time to complete in these cases.
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 
> os_prio=31 tid=0x7f7fb2050800 nid=0x1c033 waiting on condition 
> [0x000139b68000]
>java.lang.Thread.State: WAITING (parking)
> 

[jira] [Comment Edited] (PHOENIX-3111) Possible Deadlock/delay while building index, upsert select, delete rows at server

2017-09-14 Thread Rajeshbabu Chintaguntla (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165869#comment-16165869
 ] 

Rajeshbabu Chintaguntla edited comment on PHOENIX-3111 at 9/14/17 7:29 AM:
---

[~jamestaylor] Can you please provide some details on what operations were 
going on at that moment, like any upsert selects into the same table, deletes, 
or index rebuilding?
On the HBase side, what we can do is retry from the HBase client whenever a 
NotServingRegionException is thrown from coprocessors, but that's not happening 
because it's wrapped in a different exception (I forget the exact one :() and 
the client does not retry after that.
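
If the wrapping is indeed the problem, the client-side retry decision would 
presumably have to look through the cause chain, along these lines (a sketch 
only, not the actual HBase/Phoenix retry code; the method name is made up):
{code}
// Sketch: treat an exception as a retriable region move if a
// NotServingRegionException is anywhere in its cause chain.
private static boolean isWrappedNotServingRegionException(Throwable t) {
    while (t != null) {
        if (t instanceof org.apache.hadoop.hbase.NotServingRegionException) {
            return true;
        }
        t = t.getCause();
    }
    return false;
}
{code}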


was (Author: rajeshbabu):
[~jamestaylor] Can you please provide some details on what operations were 
going on at that moment, like any upsert selects into the same table, deletes, 
or index rebuilding?

> Possible Deadlock/delay while building index, upsert select, delete rows at 
> server
> --
>
> Key: PHOENIX-3111
> URL: https://issues.apache.org/jira/browse/PHOENIX-3111
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Sergio Peleato
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.8.0
>
> Attachments: PHOENIX-3111_addendum.patch, PHOENIX-3111.patch, 
> PHOENIX-3111_v2.patch
>
>
> There is a possible deadlock, or a long delay, while building a local index or 
> running UPSERT SELECT or DELETE at the server. The situation can happen as 
> follows.
> In these queries we scan rows from a table and write mutations back to the 
> same table. The memstore may reach the blocking memstore size threshold, in 
> which case RegionTooBusyException is thrown back to the client and the queries 
> retry the scan.
> Take the local index build case: we first scan the data table, prepare index 
> mutations, and write them back to the same table.
> The memstore may fill up as well, in which case we try to flush the region. 
> But if a split happens in between, the split waits for the write lock on the 
> region in order to close it, and the flush waits for the read lock because the 
> write lock is already queued ahead of it, until the local index build 
> completes. The local index build won't complete because writes are not allowed 
> until there is a flush. This may not be a complete deadlock, but the queries 
> can take a very long time to complete in these cases.
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 
> os_prio=31 tid=0x7f7fb2050800 nid=0x1c033 waiting on condition 
> [0x000139b68000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0006ede72550> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
> - locked <0x0006ede69d00> (a java.lang.Object)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
> - <0x0006ee132098> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "MemStoreFlusher.0" #170 prio=5 os_prio=31 tid=0x7f7fb6842000 nid=0x19303 
> waiting on condition [0x0001388e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
>

[jira] [Comment Edited] (PHOENIX-3111) Possible Deadlock/delay while building index, upsert select, delete rows at server

2017-09-14 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166604#comment-16166604
 ] 

Andrew Purtell edited comment on PHOENIX-3111 at 9/14/17 5:06 PM:
--

bq. From HBase side we need to revert HBASE-14893 and handle 
NotServingRegionException thrown in coprocessors properly

This needs very careful thought and extensive testing before any changes are 
committed. Any change to the locking discipline around region 
flushing/closing/splitting can be the road to data loss. Look at what happened 
with the storefile locking changes that went into 1.3, for example. We should 
never have done it. Once burned, twice shy.

I wonder if we should extend use of Phoenix's new LockManager to other cases, 
so we avoid tinkering with complex internal locking state and semantics in 
HBase. Locking disciplines are not part of HBase compatibility guarantees. They 
will vary between code lines and might even change between point releases. 

This doesn't necessarily help where internal HBase locking is causing issues, 
but we are going to have an extremely high bar for admitting changes to locking 
going forward.

[~lhofhansl]
[~stack]
[~mantonov]
[~ghelmling]


was (Author: apurtell):
bq. From HBase side we need to revert HBASE-14893 and handle 
NotServingRegionException thrown in coprocessors properly

This needs very careful thought and extensive testing before any changes are 
committed. Any change to the locking discipline around region 
flushing/closing/splitting can be the road to data loss. Look at what happened 
with the storefile locking changes that went into 1.3, for example. We should 
never have done it. Once burned, twice shy.

I wonder if we should extend use of Phoenix's new LockManager to other cases, 
so we avoid tinkering with complex internal locking state and semantics in 
HBase. Locking disciplines are not part of HBase compatibility guarantees. They 
will vary between code lines and might even change between point releases. 

This doesn't necessarily help where internal HBase locking is causing issues, 
but we are going to have an extremely high bar for admitting changes to locking 
going forward.

> Possible Deadlock/delay while building index, upsert select, delete rows at 
> server
> --
>
> Key: PHOENIX-3111
> URL: https://issues.apache.org/jira/browse/PHOENIX-3111
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Sergio Peleato
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.8.0
>
> Attachments: PHOENIX-3111_addendum.patch, PHOENIX-3111.patch, 
> PHOENIX-3111_v2.patch
>
>
> There is a possible deadlock, or a long delay, while building a local index or 
> running UPSERT SELECT or DELETE at the server. The situation can happen as 
> follows.
> In these queries we scan rows from a table and write mutations back to the 
> same table. The memstore may reach the blocking memstore size threshold, in 
> which case RegionTooBusyException is thrown back to the client and the queries 
> retry the scan.
> Take the local index build case: we first scan the data table, prepare index 
> mutations, and write them back to the same table.
> The memstore may fill up as well, in which case we try to flush the region. 
> But if a split happens in between, the split waits for the write lock on the 
> region in order to close it, and the flush waits for the read lock because the 
> write lock is already queued ahead of it, until the local index build 
> completes. The local index build won't complete because writes are not allowed 
> until there is a flush. This may not be a complete deadlock, but the queries 
> can take a very long time to complete in these cases.
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 
> os_prio=31 tid=0x7f7fb2050800 nid=0x1c033 waiting on condition 
> [0x000139b68000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0006ede72550> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.do