[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2013-01-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544415#comment-13544415
 ] 

Hudson commented on HBASE-7103:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #10 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/10/])
HBASE-7103 Need to fail split if SPLIT znode is deleted even before the 
split is completed. (Ram) (Revision 1408421)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java


> Need to fail split if SPLIT znode is deleted even before the split is 
> completed.
> 
>
> Key: HBASE-7103
> URL: https://issues.apache.org/jira/browse/HBASE-7103
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7103-6088-revert.txt, HBASE-7103_0.94.patch, 
> HBASE-7103_0.94.patch, HBASE-7103_testcase.patch, HBASE-7103_trunk.patch
>
>
> This came up after the following mail on the dev list:
> 'infinite loop of RS_ZK_REGION_SPLIT on .94.2'.
> The problem arises from the following sequence of steps:
> -> Initially the parent region P1 starts splitting.
> -> The split proceeds normally.
> -> Another split starts at the same time for the same region P1. (Not sure 
> why this started.)
> -> Rollback happens on seeing an already existing node.
> -> This node gets deleted in rollback and a nodeDeleted event fires.
> -> In the nodeDeleted event the RIT entry for the region P1 gets deleted.
> -> Because of this there is no region in RIT.
> -> Now the first split finishes.  Here the problem is that we try to 
> transition the node from SPLITTING to SPLIT, but the node does not even exist.
> But we don't take any action on this.  We think it is successful.
> -> Because of this SplitRegionHandler never gets invoked.
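
The last two steps of the sequence above can be sketched in Java. This is only a minimal in-memory model under stated assumptions: `SplitZnodeSketch`, `ZnodeStore`, and `completeSplit` are hypothetical names, not the HBase or ZooKeeper API. It shows the behaviour the issue title asks for: when the SPLITTING to SPLIT transition finds no node, the split fails loudly instead of being treated as successful.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the region-in-transition znode; not the HBase or ZooKeeper API. */
public class SplitZnodeSketch {
    public enum State { SPLITTING, SPLIT }

    /** Hypothetical store: region name -> current znode state. */
    public static final class ZnodeStore {
        private final Map<String, State> nodes = new HashMap<>();

        /** Returns false if a node already exists (another split is in flight). */
        public boolean create(String region, State state) {
            return nodes.putIfAbsent(region, state) == null;
        }

        public void delete(String region) {
            nodes.remove(region);
        }

        /** Returns false when the node is missing or not in the expected state. */
        public boolean transition(String region, State from, State to) {
            return nodes.replace(region, from, to);
        }
    }

    /** Completes a split; throws instead of silently assuming success. */
    public static void completeSplit(ZnodeStore zk, String region) {
        if (!zk.transition(region, State.SPLITTING, State.SPLIT)) {
            throw new IllegalStateException(
                "znode for " + region + " is missing or not SPLITTING; failing the split");
        }
    }

    public static void main(String[] args) {
        ZnodeStore zk = new ZnodeStore();
        zk.create("P1", State.SPLITTING);   // first split starts
        zk.delete("P1");                    // rollback of a second split removes the node
        try {
            completeSplit(zk, "P1");
        } catch (IllegalStateException e) {
            System.out.println("split failed: " + e.getMessage());
        }
    }
}
```

In this model the caller learns about the missing node at the point of transition, which is where the description says the original code took no action.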

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496086#comment-13496086
 ] 

Hudson commented on HBASE-7103:
---

Integrated in HBase-0.94-security #83 (See 
[https://builds.apache.org/job/HBase-0.94-security/83/])
HBASE-7103 Need to fail split if SPLIT znode is deleted even before the 
split is completed. (Ram) (Revision 1408421)

 Result = SUCCESS
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-12 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496025#comment-13496025
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Thanks for the info on the ZK stuff. :)



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495909#comment-13495909
 ] 

stack commented on HBASE-7103:
--

https://issues.apache.org/jira/browse/ZOOKEEPER-1297 adds a Stat to the create 
call.  It is not yet committed.  Patrick says that what we are doing is the 
best that can be done given the current state of the API.

TRUNK patch looks good to me.

bq. Why so? Because now if the znode exists we will not start the split anyway, 
so there is only one split going on, right? 

That sounds right, Ram.  So the execute that created the SPLITTING znode should 
be legit in removing it during rollback.  Good stuff.
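
The point about rollback can be illustrated with a small Java sketch. It assumes a per-attempt journal in which the znode creation is recorded only when that attempt really created the node; `RollbackOwnershipSketch`, `SplitAttempt`, and `CREATED_SPLITTING_ZNODE` are hypothetical names for illustration, and a plain `Set` stands in for ZooKeeper. It is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model only; the names are hypothetical, not HBase's SplitTransaction. */
public class RollbackOwnershipSketch {
    public enum Step { CREATED_SPLITTING_ZNODE }

    public static final class SplitAttempt {
        private final Set<String> znodes;   // stand-in for ZooKeeper: regions with a SPLITTING znode
        private final String region;
        private final List<Step> journal = new ArrayList<>();

        public SplitAttempt(Set<String> znodes, String region) {
            this.znodes = znodes;
            this.region = region;
        }

        /** Journals the znode only when this attempt actually created it. */
        public boolean start() {
            if (!znodes.add(region)) {
                return false;               // node already exists: do not start a second split
            }
            journal.add(Step.CREATED_SPLITTING_ZNODE);
            return true;
        }

        /** Undoes only the steps recorded in this attempt's own journal. */
        public void rollback() {
            if (journal.remove(Step.CREATED_SPLITTING_ZNODE)) {
                znodes.remove(region);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> znodes = new HashSet<>();
        SplitAttempt first = new SplitAttempt(znodes, "P1");
        SplitAttempt second = new SplitAttempt(znodes, "P1");
        first.start();
        second.start();      // refused: the znode already exists
        second.rollback();   // journal is empty, so nothing is deleted
        System.out.println("znode still present: " + znodes.contains("P1"));
    }
}
```

Because the second attempt never journals a creation, its rollback leaves the first attempt's node alone.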



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-12 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495871#comment-13495871
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

I am very sorry, Lars.  I forgot to remove the state from the enum.  I was 
thinking that it was to be removed from the journal alone. 



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495747#comment-13495747
 ] 

Hudson commented on HBASE-7103:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #257 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/257/])
HBASE-7103 Need to fail split if SPLIT znode is deleted even before the 
split is completed. (Ram) (Revision 1408418)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495639#comment-13495639
 ] 

Hudson commented on HBASE-7103:
---

Integrated in HBase-0.94 #580 (See 
[https://builds.apache.org/job/HBase-0.94/580/])
HBASE-7103 Need to fail split if SPLIT znode is deleted even before the 
split is completed. (Ram) (Revision 1408421)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495625#comment-13495625
 ] 

Hudson commented on HBASE-7103:
---

Integrated in HBase-TRUNK #3532 (See 
[https://builds.apache.org/job/HBase-TRUNK/3532/])
HBASE-7103 Need to fail split if SPLIT znode is deleted even before the 
split is completed. (Ram) (Revision 1408418)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495540#comment-13495540
 ] 

Lars Hofhansl commented on HBASE-7103:
--

The unused state is still on the enum. No need for a new patch, Ram, will 
remove it on commit.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495471#comment-13495471
 ] 

Lars Hofhansl commented on HBASE-7103:
--

Will commit in the next hour unless I hear objections. It's time to tie a bow 
around 0.94.3.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495141#comment-13495141
 ] 

Hadoop QA commented on HBASE-7103:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12553079/HBASE-7103_trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
87 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 18 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3309//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3309//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3309//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3309//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3309//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3309//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3309//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3309//console

This message is automatically generated.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-11 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495120#comment-13495120
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Attached patch for trunk and 0.94.
I think I have removed the unused state, Lars. Added the comments and also the 
TODO.
@Stack
I was thinking about a new state and in fact had some idea in mind.  But I did 
not want to complicate it now with new states, and handling it on the master 
side should be done with proper care.
Anyway, I will come up with some idea soon.
bq.deleting a znode though we're not sure it is ours
This could be a problem?  Why so?  Because now if the znode exists we will not 
start the split anyway, so there is only one split going on, right?  Anyway the 
node deletion is done by master.  Maybe I am missing something, Stack.
Thanks a lot for the review.  





[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495101#comment-13495101
 ] 

stack commented on HBASE-7103:
--

+1 on patch.  There is still a hole in here (deleting a znode though we're not 
sure it is ours) but it is narrower than the hole that was there previously.  I 
would suggest removing the unused state as per Lars's comment above and adding 
a comment that we could be removing a znode that we do not own if the transition 
from SPLITTING to SPLITTING fails (maybe we should create it w/o data or w/ 
another state but can do that in another issue... just note the problem in a 
TODO comment for now).  Good on you Ram.
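
To illustrate why knowing a node's version matters for the hole described here, the following Java sketch models version-checked writes and deletes in memory. ZooKeeper's own `setData` and `delete` calls take an expected version and reject a mismatch; everything else here (`VersionedDeleteSketch`, the `-1` failure value, the path `/P1`) is a hypothetical simplification, not the HBase patch.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of version-checked znode operations; not the ZooKeeper client API. */
public class VersionedDeleteSketch {
    /** Hypothetical node table: path -> data version. */
    private final Map<String, Integer> versions = new HashMap<>();

    /** Creates the node at version 0; returns false if it already exists. */
    public boolean create(String path) {
        return versions.putIfAbsent(path, 0) == null;
    }

    /** Rewrites the node if expectedVersion matches; returns the new version, or -1 on failure. */
    public int setData(String path, int expectedVersion) {
        Integer current = versions.get(path);
        if (current == null || current != expectedVersion) {
            return -1;
        }
        versions.put(path, current + 1);
        return current + 1;
    }

    /** Deletes only if the caller holds the node's current version. */
    public boolean delete(String path, int expectedVersion) {
        return versions.remove(path, expectedVersion);
    }

    public static void main(String[] args) {
        VersionedDeleteSketch zk = new VersionedDeleteSketch();
        zk.create("/P1");
        int mine = zk.setData("/P1", 0);     // the rewrite hands back a version to remember
        zk.delete("/P1", mine);              // the node is removed ...
        zk.create("/P1");                    // ... and another attempt recreates it
        System.out.println("stale delete accepted: " + zk.delete("/P1", mine));
    }
}
```

A caller that never obtained a version, for example because the SPLITTING to SPLITTING rewrite failed, has nothing to pass to the version-checked delete, which is the situation the suggested TODO comment would flag.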



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495040#comment-13495040
 ] 

Lars Hofhansl commented on HBASE-7103:
--

Also a trunk patch would be awesome so we can run against HadoopQA.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495008#comment-13495008
 ] 

Lars Hofhansl commented on HBASE-7103:
--

This is not really my area of expertise, but the patch makes sense. We should 
probably remove STARTED_SPLITTING from JournalEntry (it's not used anywhere 
after this patch). Otherwise +1.

@Stack: Wanna have a look?




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494845#comment-13494845
 ] 

Hadoop QA commented on HBASE-7103:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12553013/HBASE-7103_0.94.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3306//console

This message is automatically generated.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-11 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494843#comment-13494843
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

This patch makes the transition to SPLITTING after creating the node, once the 
first journal entry is added.
What we get out of this:
-> If a parallel split request comes in, the second one will fail because the 
znode creation will fail saying the node already exists.  Rollback then has no 
impact, because nothing was added to the journal for the second split and so 
the znode is not deleted.
-> If the transition to SPLITTING fails, it will lead to a rollback that 
deletes the znode.  That does not affect the RIT on the master side, because 
the RIT is populated for the first time only after the transition is done.  If 
nothing is there, there is no impact.
Please provide your comments on this; I can prepare a patch for trunk too if 
this is fine. :)
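
The ordering described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual SplitTransaction code: the znode store is a plain `ConcurrentMap`, and the `Split` class, its `start`/`rollback` methods, and the owner strings are hypothetical. Only the journal entry name `SET_SPLITTING_IN_ZK` comes from this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: models "journal after create", not SplitTransaction itself.
final class JournalAfterCreate {
  enum JournalEntry { SET_SPLITTING_IN_ZK }

  static final class Split {
    private final ConcurrentMap<String, String> znodes; // region -> owner of the split
    private final String region;
    private final String owner;
    private final Deque<JournalEntry> journal = new ArrayDeque<>();

    Split(ConcurrentMap<String, String> znodes, String region, String owner) {
      this.znodes = znodes;
      this.region = region;
      this.owner = owner;
    }

    /** Returns true if this split created the znode; journals only after success. */
    boolean start() {
      if (znodes.putIfAbsent(region, owner) != null) {
        return false; // node already exists: nothing journaled, nothing to undo
      }
      journal.push(JournalEntry.SET_SPLITTING_IN_ZK);
      return true;
    }

    /** Undoes only what this split recorded in its own journal. */
    void rollback() {
      while (!journal.isEmpty()) {
        if (journal.pop() == JournalEntry.SET_SPLITTING_IN_ZK) {
          znodes.remove(region, owner);
        }
      }
    }
  }

  public static void main(String[] args) {
    ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();
    Split first = new Split(znodes, "P1", "split-1");
    Split second = new Split(znodes, "P1", "split-2");
    System.out.println("first started: " + first.start());
    System.out.println("second started: " + second.start());
    second.rollback(); // empty journal, so the first split's znode survives
    System.out.println("znode still present: " + znodes.containsKey("P1"));
  }
}
```

Because the second split never journals anything, its rollback is a no-op and cannot trigger the nodeDeleted event described in the issue.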



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494826#comment-13494826
 ] 

stack commented on HBASE-7103:
--

I'd be good w/ applying the revert patch for now -- would be interested in what 
you say to the above first though Ram and who knows, maybe the zk fellas will 
come back w/ a little bit of magic



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494825#comment-13494825
 ] 

stack commented on HBASE-7103:
--

[~ram_krish] I don't think it's possible to get the version on create (let me 
ask one of the zk lads).  That is why we do the SPLITTING to SPLITTING 
transition to get the version.  It's true though that there is a hole in here: 
if we fail on create, there should be no rollback, but if we fail moving 
SPLITTING to SPLITTING, then we should remove the created znode but ONLY if we 
have its version (it could have been created by someone else).  Maybe when we 
create, we write some unique data into the znode and read it back after 
creating it to see what the version is -- and if the unique data is not the 
same, we know that someone else owns the znode and we should not roll back.  
But that won't work either, given it won't be backward compatible.

If we fail the create of the znode, we should not roll back. It looks like we 
are doing that now since adding the STARTED_SPLITTING state -- right?  That 
seems wrong... should we be inserting the STARTED_SPLITTING state after the 
create of the znode?  But even then, I'm not sure about deleting a znode unless 
we are sure we own it -- that the version matches.

Should the following code be checking we did NOT get a -1?

{code}
this.znodeVersion = createNodeSplitting(server.getZooKeeper(),
  this.parent.getRegionInfo(), server.getServerName());
{code}

It seems like createNodeSplitting could be returning -1 if it fails.

(Weird that transitionNode has explicit mention of M_ZK_REGION_OFFLINE and 
RS_ZK_REGION_OPENING though it takes beginState and endState but that is 
another issue).
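
To make the "only delete a znode we own" idea concrete, here is a small in-memory model of versioned znodes. It is a sketch under assumptions: the class and its methods (`create`, `setData`, `deleteIfOwned`, `exists`) are hypothetical and only imitate the version-checked style of the ZooKeeper client, and the -1 failure value mirrors the convention discussed above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: an in-memory model of versioned znodes, not the ZooKeeper client.
final class VersionGuardedDelete {
  private final Map<String, Integer> versions = new HashMap<>();

  /** Creates the node at version 0; returns -1 if it already exists. */
  synchronized int create(String path) {
    if (versions.containsKey(path)) {
      return -1;
    }
    versions.put(path, 0);
    return 0;
  }

  /** Rewrites the node's data, bumping its version; returns the new version or -1. */
  synchronized int setData(String path, int expectedVersion) {
    Integer current = versions.get(path);
    if (current == null || current != expectedVersion) {
      return -1;
    }
    versions.put(path, current + 1);
    return current + 1;
  }

  /** Deletes the node only when the caller holds its current version. */
  synchronized boolean deleteIfOwned(String path, int expectedVersion) {
    Integer current = versions.get(path);
    if (expectedVersion < 0 || current == null || current != expectedVersion) {
      return false;
    }
    versions.remove(path);
    return true;
  }

  synchronized boolean exists(String path) {
    return versions.containsKey(path);
  }

  public static void main(String[] args) {
    VersionGuardedDelete zk = new VersionGuardedDelete();
    int created = zk.create("/P1");         // first split creates the node
    int owned = zk.setData("/P1", created); // SPLITTING to SPLITTING picks up the version
    int other = zk.create("/P1");           // second split: create fails
    System.out.println("second split deleted node: " + zk.deleteIfOwned("/P1", other));
    System.out.println("first split deleted node: " + zk.deleteIfOwned("/P1", owned));
  }
}
```

A caller that never obtained a version (it holds -1) cannot delete anything, which is the property asked for in the comment above.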



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494723#comment-13494723
 ] 

Lars Hofhansl commented on HBASE-7103:
--

Hi Ram,

yeah I found that assert as well. Btw. the test you attached does not work in 
trunk, because there is no AssignmentManager.getRegionsOfTable(...) in trunk.

I like the idea of getting/verifying the version. Looks like this would solve 
the 6088 without the extra state (unless I misunderstand).

I don't follow your 2nd comment. Are you saying your first suggestion does not 
work?
Otherwise +1 on your idea to check the version. If you have a patch on top of 
the rollback patch that'd be awesome.
Thanks for working on this stuff Ram! It's hard to make changes to this without 
breaking something somewhere else, because this code is so fragile.




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-10 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494691#comment-13494691
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Doing as said above has one implication: because the SPLITTING node is created 
on the RS side, the master does not get the callback for node creation.  Hence 
the in-memory RIT SPLITTING state is not added on the master.
But once the transition to SPLIT happens, nodeDataChange adds the state to the RIT:
{code}
if (regionState == null) {
  regionState = addSplittingToRIT(sn, encodedName);
  String message = "Received SPLIT for region " + prettyPrintedRegionName +
    " from server " + sn;
  // If still null, it means we cannot find it and it was already processed
  if (regionState == null) {
    LOG.warn(message + " but it doesn't exist anymore," +
        " probably already processed its split");
    break;
  }
  LOG.info(message +
      " but region was not first in SPLITTING state; continuing");
}
{code}



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-10 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494657#comment-13494657
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Also, Lars, the last assert should be asserting false, because once the split 
is successful the parent region should not be in RIT:
{code}
assertTrue("The region should be online", 
rit.containsKey(hri.getTableNameAsString()));
{code}



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-10 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494652#comment-13494652
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Ok Lars.  I understand.  No problem.  
Just before we commit this, I have a suggestion:
{code}
String node = ZKAssign.getNodeName(zkw, region.getEncodedName());
if (!ZKUtil.createEphemeralNodeAndWatch(zkw, node, data.getBytes())) {
  throw new IOException("Failed create of ephemeral " + node);
}
// Transition node from SPLITTING to SPLITTING and pick up version so we
// can be sure this znode is ours; version is needed deleting.
return transitionNodeSplitting(zkw, region, serverName, -1);
{code}
Here, after creating the node, we transition it once from SPLITTING to 
SPLITTING to get the znode version.  Can we get the znode version just after 
creating the node?
If creation itself fails, there is no node at all.  If it succeeds, the next 
step adds the journal entry SET_SPLITTING_IN_ZK anyway.
With the transition the version will be 1, but if we don't do the transition 
it will be 0.
The advantage is that if a parallel split comes in later, the node will 
already exist when it tries to create the znode, and it will not do anything 
with the znode during rollback.  What do you feel?  My intention was to 
solve both 7103 and 6088.  
Lars, I leave it to you.  If you think so, we can revert this and address it 
in the next version, 0.94.4.  If not, we can try for a patch in this version.  
If you are ok with that, I can submit a patch for the same.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494589#comment-13494589
 ] 

Hadoop QA commented on HBASE-7103:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552963/7103-6088-revert.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
87 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 18 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestMiniClusterLoadEncoded

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3301//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3301//console

This message is automatically generated.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494432#comment-13494432
 ] 

Lars Hofhansl commented on HBASE-7103:
--

In fact just reverting HBASE-6088 seems to be fine. That is what I am proposing 
now.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494369#comment-13494369
 ] 

Lars Hofhansl commented on HBASE-7103:
--

I would like to entertain the thought of reverting both HBASE-6854 and 
HBASE-6088 for 0.94 (possibly scheduling them both for 0.94.4 along with this 
one and HBASE-7101 to fix these all together).

I ran your test with these two patches reverted. It now fails in the last 
assert (where the RS and Master disagree whether the region is online or not). 
That is not ideal, but was a longstanding issue (I think).

[~ram_krish] I realize this is frustrating. At the same time I think that for 
0.94 we have to start thinking about an expectation of stability.

Thoughts about this?




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494025#comment-13494025
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

@Lars
HBASE-6088 added the new journal entry, because previously STARTED_SPLITTING 
was never added.  What happened was that if writing the RS_ZK_SPLITTING data 
failed after the node had been created, rollback took no action, and so 
subsequent splits never happened.
bq.can't we keep dictionary keyed by region of currently splitting regions in 
the RS?
But the dictionary would have to be cleared properly after the transition is 
done.  There is a chance of a race between the time we remove an entry and the 
time we check whether one is already present.  Maybe we need to cross-verify 
with the online regions list on the RS side.
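
A minimal sketch of such an in-RS dictionary, to make the race concrete. The 
class and method names are hypothetical, not existing HBase code. The point is 
that the check and the insert have to be a single atomic step, which add() on 
a concurrent set provides:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of HBase: tracks regions with a split in flight.
class SplittingRegions {
  private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

  // Atomically claims the region; returns false if a split is already running.
  boolean tryBegin(String encodedRegionName) {
    return inFlight.add(encodedRegionName);
  }

  // Releases the claim once the owning split has completed or rolled back.
  void finish(String encodedRegionName) {
    inFlight.remove(encodedRegionName);
  }
}
```

A request that gets false from tryBegin() would return without touching ZK or 
its journal. Only the request that got true calls finish(), from a finally 
block, so a rejected request can never clear the owner's entry.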





[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493741#comment-13493741
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

OK, I don't have the code with me right now.  Let me check the code and comment 
on this.  Thanks Lars and Stack.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493354#comment-13493354
 ] 

stack commented on HBASE-7103:
--

Yeah, Lars' idea is like what I was saying.  Else, can't we keep a dictionary, 
keyed by region, of currently splitting regions in the RS?



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493258#comment-13493258
 ] 

Lars Hofhansl commented on HBASE-7103:
--

Does my idea from above:

bq. First try to create a ZK node, then write to the journal.

Fix this? In that case the parallel split request would fail before it writes 
anything in its journal and hence would not attempt to clean up the ZK state.
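
To illustrate the ordering, here is a rough model, with a map standing in for 
ZooKeeper and made-up class names (none of this is the real SplitTransaction 
or ZKAssign API). The second attempt fails at node creation, has nothing in 
its journal, and so its rollback leaves the first attempt's node alone:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stand-in for ZooKeeper: region name -> znode state. Not the real ZK client.
class FakeZk {
  final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();
}

// Simplified model of one split attempt; names are illustrative only.
class SplitAttempt {
  final List<String> journal = new ArrayList<>();
  private final FakeZk zk;
  private final String region;

  SplitAttempt(FakeZk zk, String region) {
    this.zk = zk;
    this.region = region;
  }

  // Create the znode first; journal only if this attempt now owns the node.
  boolean start() {
    if (zk.nodes.putIfAbsent(region, "SPLITTING") != null) {
      return false; // another split owns the node; nothing was journaled
    }
    journal.add("SET_SPLITTING_IN_ZK");
    return true;
  }

  // Rollback undoes only what the journal records.
  void rollback() {
    if (journal.remove("SET_SPLITTING_IN_ZK")) {
      zk.nodes.remove(region);
    }
  }
}
```

This model folds node creation and setting the node data into one step, so it 
says nothing about a failure that happens between the two.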




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493025#comment-13493025
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

bq.Could start the transaction and if can't set SPLITTING znode, fail out.
But how do we distinguish a request that arrives in parallel from a new request 
that arrives after a previous one has failed?
If the node got created and the split then failed due to some exception, we 
will roll back, and here we need to delete the node.
Then when a new request comes, it will succeed.
If we try to handle the failure by not deleting the node, how can we tell a new 
request from a parallel request? Will think more on this.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-07 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493023#comment-13493023
 ] 

Matt Corgan commented on HBASE-7103:


{quote}Not only compaction, frequent flushes that results in big store files 
also may result in this?{quote}
When triggering this problem I was doing frequent flushes, and compactions were 
probably backlogged for the region.  

{quote}Is that correct? Should we only be doing it after compaction? Is that 
why we are doing concurrent split?{quote}
It would be nice to keep the ability (if it already exists) for a region to 
split without waiting for all the flushing/compacting to stop because the 
flushing/compacting may go on indefinitely.  The split is important in this 
scenario since it spreads the load to another server.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493018#comment-13493018
 ] 

stack commented on HBASE-7103:
--

bq. But a forceful split can happen parallely right?

Yeah, probably.  I see no checks to prevent it.

bq. Not only compaction, frequent flushes that results in big store files also 
may result in this?

Is that correct?  Should we only be doing it after compaction?  Is that why we 
are doing concurrent split?

bq. Anyway to chec if a split request has come for an already going on split? 
Because currently every split request creates a new split transaction.

No.  Could start the transaction and if can't set SPLITTING znode, fail out.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492961#comment-13492961
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Yes Stack.  Even I am not sure why two splits started.
But a forced split can happen in parallel, right? I may be wrong here.
Not only compactions; frequent flushes that result in big store files may also 
lead to this?
{code}
boolean shouldCompact = region.flushcache();
// We just want to check the size
boolean shouldSplit = region.checkSplit() != null;
if (shouldSplit) {
  this.server.compactSplitThread.requestSplit(region);
} else if (shouldCompact) {
  server.compactSplitThread.requestCompaction(region, getName());
}
{code}
Is there any way to check whether a split request has come in for a split that 
is already in progress? Because currently every split request creates a new 
split transaction.




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492614#comment-13492614
 ] 

stack commented on HBASE-7103:
--

bq. -> Another split starts at the same time for the same region P1. (Not sure 
why this started).

Do we know more on the above?  Why are two splits at the same time even 
possible?  We only check for a split at the end of a compaction, so is this 
region compacting so frequently that we can queue up splits and have them run 
near-concurrently?





[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-07 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492583#comment-13492583
 ] 

Lars Hofhansl commented on HBASE-7103:
--

[~ram_krish] Thanks again Ram. Nice test.
So rolling back HBASE-6854 would not be good enough anyway (and reverting 
HBASE-6088 and HBASE-6854 is excessive).

Hopefully with the test we have a good chance of fixing this.
No haste here, Ram. Will delay the next 0.94RC until we can fix this.




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492502#comment-13492502
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Actually HBASE-6088 introduced STARTED_SPLITTING.  This was done because, when 
we first try to create the znode in the RS_ZK_SPLITTING state, rollback took no 
action if there was an exception.  This led to subsequent split failures, and 
thus the split never happened.
Now the new STARTED_SPLITTING state deletes the node on rollback if there is 
any error while setting the data.  The same cleanup is done even if the 
exception happens in SET_SPLITTING_IN_ZK.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492404#comment-13492404
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

So I just checked without HBASE-6854.  The problem is similar.  Without 
HBASE-6854, the master does not remove the region from RIT.
The RS thinks the split is completed, but because the node was deleted by the 
second split's rollback, there is no transition from SPLITTING to SPLIT.  So 
the master is never notified about this. 
After HBASE-6854 the entry in RIT is removed, but the master still does not 
know that the split has happened.  Before that RIT is removed which allows at 
least the balancer to run.
So we may have to come up with a better fix.
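
One way to picture what the issue title asks for, sketched with a map standing 
in for ZK and hypothetical names (this is not the actual patch): treat a failed 
SPLITTING to SPLIT transition as a failed split instead of assuming success.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative model only: region name -> znode state, not the real ZK API.
class SplitZnodeModel {
  final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

  // Atomically moves the node between states; returns false if the node
  // is missing or is not currently in the expected state.
  boolean transition(String region, String from, String to) {
    return nodes.replace(region, from, to);
  }

  // Fails loudly when the znode was deleted underneath the running split.
  void completeSplit(String region) throws IOException {
    if (!transition(region, "SPLITTING", "SPLIT")) {
      throw new IOException("Split znode for " + region
          + " is missing or not in SPLITTING; failing the split");
    }
  }
}
```

With the exception surfaced, the caller can decide how to react rather than 
leaving the RS believing the split succeeded while the master was never told.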




[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492131#comment-13492131
 ] 

Lars Hofhansl commented on HBASE-7103:
--

[~ram_krish] Would it be most expedient to revert HBASE-6854 for now?
Then we can tackle these two issues together. Thoughts?



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492122#comment-13492122
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

@JD
HBASE-6854 is a reason for this. But I can check the behaviour prior to it and 
tell you.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492057#comment-13492057
 ] 

Lars Hofhansl commented on HBASE-7103:
--

I have very limited knowledge in this area... From looking through the code 
briefly, if two splits happen roughly in parallel, the 2nd one will fail due to 
the split node already existing (see SplitTransaction.createNodeSplitting), but 
by then it has already written STARTED_SPLITTING to its journal. Now the 
transaction is rolled back and will clean up the ZK state.

So I guess we can either:
# track whether the split transaction failed because of a concurrent split, in 
that case we won't clean the zk state.
# First try to create a ZK node, then write to the journal.

Both cases probably have bad side effects and races.
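
A rough model of option 1, with a plain map in place of ZK and invented names, 
and not thread-safe as written. The journal entry is written before the node 
is created, but the attempt remembers that it lost to a concurrent split and 
skips the ZK cleanup on rollback:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Illustrative model only; the enum and field names are assumptions.
class TrackedSplit {
  enum Step { STARTED_SPLITTING }

  private final Map<String, String> zk; // stand-in for ZooKeeper
  private final String region;
  private final Deque<Step> journal = new ArrayDeque<>();
  private boolean lostToConcurrentSplit = false;

  TrackedSplit(Map<String, String> zk, String region) {
    this.zk = zk;
    this.region = region;
  }

  boolean start() {
    journal.push(Step.STARTED_SPLITTING); // journal first, then touch ZK
    if (zk.putIfAbsent(region, "SPLITTING") != null) {
      lostToConcurrentSplit = true; // remember why this attempt failed
      return false;
    }
    return true;
  }

  void rollback() {
    while (!journal.isEmpty()) {
      Step step = journal.pop();
      if (step == Step.STARTED_SPLITTING && !lostToConcurrentSplit) {
        zk.remove(region); // only delete a node this attempt created
      }
    }
  }
}
```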

> Need to fail split if SPLIT znode is deleted even before the split is 
> completed.
> 
>
> Key: HBASE-7103
> URL: https://issues.apache.org/jira/browse/HBASE-7103
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.94.3, 0.96.0
>
>
> This came up after the following mail on the dev list:
> 'infinite loop of RS_ZK_REGION_SPLIT on .94.2'.
> The problem is caused by the following sequence of steps:
> -> Initially the parent region P1 starts splitting.
> -> The split proceeds normally.
> -> Another split starts at the same time for the same region P1. (Not sure 
> why this started.)
> -> The second split rolls back on seeing an already existing node.
> -> The rollback deletes this node, and a nodeDeleted event fires.
> -> In the nodeDeleted event, the RIT entry for region P1 gets deleted.
> -> Because of this, the region is no longer in RIT.
> -> Now the first split finishes.  Here the problem is that we try to transition 
> the node from SPLITTING to SPLIT, but the node does not even exist.
> But we don't take any action on this.  We think it is successful.
> -> Because of this, SplitRegionHandler never gets invoked.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492012#comment-13492012
 ] 

Matt Corgan commented on HBASE-7103:


I disabled my custom balancer (external java program that calls 
HBaseAdmin.move()) and it's been working without error for longer than usual.  
I've been using the same balancer since .90 series I think, so possibly 
something changed where calling move() on a new daughter region soon after a 
split leaves ZK in a bad state.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491995#comment-13491995
 ] 

Lars Hofhansl commented on HBASE-7103:
--

Did you change how frequently the balancer runs?



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491798#comment-13491798
 ] 

Matt Corgan commented on HBASE-7103:


So far I don't think it has to do with stopping the regionserver since we're 
only doing that after this happens.  I also haven't seen anything suggesting it 
has to do with the META table.

Any ideas on what would cause a split to fail and retry?  Is it more likely 
caused by some internal regionserver problem, or the region being moved during 
the split, etc?



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491770#comment-13491770
 ] 

Lars Hofhansl commented on HBASE-7103:
--

On the mailing list I had posted these as candidates:

HBASE-6854
HBASE-6329
HBASE-6088
HBASE-6070
HBASE-6713
HBASE-5986

Some of these deal with splits during regionserver stops.

[~mcorgan] I assume you're still restarting servers occasionally?



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491669#comment-13491669
 ] 

Matt Corgan commented on HBASE-7103:


From step 3, do you think the double-splitting is a new phenomenon?  It 
doesn't sound like something that should happen very often.  Maybe that would 
explain why I didn't get this error in .94.0.

Also, please note I went straight from .94.0 to .94.2, so I don't know if it 
was present in .94.1.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491656#comment-13491656
 ] 

Jean-Daniel Cryans commented on HBASE-7103:
---

I wonder which JIRA introduced this issue, as it seems that it wasn't present 
before 0.94.2.



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491653#comment-13491653
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Trying to write a testcase for this.  Can we fail the split if the znode is not 
present?  But my question is: if a split is currently going on for region A and 
another split is called for the same region, how should we handle it?  
As I see it, this problem is caused by the rollback that is done by the second 
split.
Thinking on this.  Any suggestions?
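
One way to picture "fail the split if the znode is not present" is the sketch 
below. Everything in it is an assumption made for illustration: FakeZk is an 
in-memory stand-in for ZooKeeper, completeSplit is a hypothetical helper, and 
an unchecked IllegalStateException is used only to keep the example short. It 
is not the SplitTransaction or ZKAssign API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class FailSplitOnMissingNodeSketch {
    enum State { SPLITTING, SPLIT }

    /** Hypothetical in-memory stand-in for ZooKeeper: region name -> state of its split node. */
    static class FakeZk {
        private final ConcurrentMap<String, State> nodes = new ConcurrentHashMap<>();

        void create(String region) {
            nodes.put(region, State.SPLITTING);
        }

        void delete(String region) {
            nodes.remove(region);
        }

        State get(String region) {
            return nodes.get(region);
        }

        /** Returns true only if the node existed in the expected state and was updated. */
        boolean transition(String region, State from, State to) {
            return nodes.replace(region, from, to);
        }
    }

    /** Fails loudly instead of treating a missing node as a successful transition. */
    static void completeSplit(FakeZk zk, String region) {
        if (!zk.transition(region, State.SPLITTING, State.SPLIT)) {
            throw new IllegalStateException(
                "Split node for " + region + " is missing or not in SPLITTING state; failing the split");
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.create("A");
        zk.delete("A"); // simulates the rollback of a second, concurrent split
        try {
            completeSplit(zk, "A");
        } catch (IllegalStateException e) {
            System.out.println("split failed: " + e.getMessage());
        }
    }
}
```

The sketch only covers detecting the missing node. It does not answer the 
other question raised here, namely how a second split request for a region 
that is already splitting should be handled.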



[jira] [Commented] (HBASE-7103) Need to fail split if SPLIT znode is deleted even before the split is completed.

2012-11-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491358#comment-13491358
 ] 

ramkrishna.s.vasudevan commented on HBASE-7103:
---

Will try to come up with a patch for this.
