[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2015-01-08 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269161#comment-14269161
 ] 

Cosmin Lehene commented on HBASE-5155:
--

[~ramkrishna.s.vasude...@gmail.com] [~apurtell] still valid?

> ServerShutDownHandler And Disable/Delete should not happen parallely leading 
> to recreation of regions that were deleted
> ---
>
> Key: HBASE-5155
> URL: https://issues.apache.org/jira/browse/HBASE-5155
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.4
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Blocker
> Fix For: 0.90.6
>
> Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, 
> HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch
>
>
> ServerShutdownHandler and the disable/delete table handlers race.  This is not an 
> issue due to TM.
> -> A regionserver goes down.  In our cluster the regionserver holds a lot of 
> regions.
> -> A region R1 has two daughters, D1 and D2.
> -> The ServerShutdownHandler gets called, scans META, and gets all the 
> user regions.
> -> In parallel, a table is disabled. (No problem in this step.)
> -> The table is then deleted.
> -> The table and its regions are deleted, including R1, D1 and D2. (So META 
> is cleaned.)
> -> Now ServerShutdownHandler starts processDeadRegion:
> {code}
> if (hri.isOffline() && hri.isSplit()) {
>   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
>     "; checking daughter presence");
>   fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixupDaughters, because the daughters D1 and D2 are missing for R1,
> {code}
> if (isDaughterMissing(catalogTracker, daughter)) {
>   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>   MetaEditor.addDaughter(catalogTracker, daughter, null);
>   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
>   // there then something wonky about the split -- things will keep going
>   // but could be missing references to parent region.
>   // And assign it.
>   assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.
> After this, we continue with the code below.
> {code}
> if (processDeadRegion(e.getKey(), e.getValue(),
>     this.services.getAssignmentManager(),
>     this.server.getCatalogTracker())) {
>   this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> When the SSH scanned META, it contained R1, D1 and D2.
> So, as part of the above code, D1 and D2, which were already assigned by 
> fixupDaughters, are assigned again by
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper bad-version error that kills the master.
> The important part here is that the deleted regions are recreated, which 
> I think is more critical.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-05-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266099#comment-13266099
 ] 

stack commented on HBASE-5155:
--

Should we revert and roll a 0.90.7?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265633#comment-13265633
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

@David
Updated the release notes. Thanks for your review.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-04-30 Thread David S. Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265114#comment-13265114
 ] 

David S. Wang commented on HBASE-5155:
--

Ram,

> If the HBase client does not have the changes for HBASE-5155 and the server 
> has the  changes for HBASE-5155, then if we try to Enable a table then the 
> client will hang. 

Actually, I noticed that the hang happens in the opposite case: when the client 
has the changes for HBASE-5155, and the server does not.

Otherwise the release note looks OK to me.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231861#comment-13231861
 ] 

Hudson commented on HBASE-5155:
---

Integrated in HBase-TRUNK-security #140 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/140/])
HBASE-5206 Port HBASE-5155 to trunk (Ashutosh Jindal) (Revision 1301709)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java






[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231637#comment-13231637
 ] 

Hudson commented on HBASE-5155:
---

Integrated in HBase-0.94 #36 (See 
[https://builds.apache.org/job/HBase-0.94/36/])
HBASE-5206 port HBASE-5155 to 0.94 (Ashutosh Jindal) (Revision 1301737)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java






[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231579#comment-13231579
 ] 

Hudson commented on HBASE-5155:
---

Integrated in HBase-TRUNK #2685 (See 
[https://builds.apache.org/job/HBase-TRUNK/2685/])
HBASE-5206 Port HBASE-5155 to trunk (Ashutosh Jindal) (Revision 1301709)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java






[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-03-14 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229926#comment-13229926
 ] 

Hudson commented on HBASE-5155:
---

Integrated in HBase-TRUNK-security #138 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/138/])
HBASE-5206 port HBASE-5155 to TRUNK (Ashutosh Jindal) (Revision 1300711)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java






[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-03-14 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229638#comment-13229638
 ] 

Hudson commented on HBASE-5155:
---

Integrated in HBase-TRUNK #2680 (See 
[https://builds.apache.org/job/HBase-TRUNK/2680/])
HBASE-5206 port HBASE-5155 to TRUNK (Ashutosh Jindal) (Revision 1300711)

 Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java






[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-14 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186299#comment-13186299
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

Committed to 0.90.
Thanks Stack, Ted and Chunhui for the review.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-14 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186243#comment-13186243
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

I am planning to commit this today.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-14 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186169#comment-13186169
 ] 

Zhihong Yu commented on HBASE-5155:
---

HBASE-5155_2.patch looks good to me.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186089#comment-13186089
 ] 

Zhihong Yu commented on HBASE-5155:
---

bq. will tell the user like to ZK is used in checking it.
I think you wanted to say 'will tell the user like no ZK is used in checking 
it.'

I don't have other comments, except the one @ 13/Jan/12 21:34

If you have time, please port this to 0.92
Otherwise we can open another JIRA.

Good job, Ramkrishna.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-13 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186084#comment-13186084
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

bq.This looks like a method used internally by AM only. Does it need to be 
public?
{code}
+  public void setEnabledTable(String tableName) {
{code}
I did not have this as public in the beginning.  But later, in 
HMaster.rebuildUserRegions(), I had to set the enabled table, so I exposed it 
from AM to use it there instead of repeating the same code in HMaster.
bq.Have you tried it? In rolling restart we'll upgrade the master first 
usually. Won't it know how to deal w/ new zk node for ENABLED state?
Even if the master is restarted first, the above changes are necessary: when 
the master builds the table state, it will not find the ENABLED state in zk, 
and the changes in the master let it build that state.  Yes, rolling restart 
was tested.
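
One plausible reading of the rebuild step is sketched below. It is an illustration under assumptions, not the patch itself: `RebuildTableStateSketch`, `rebuild`, and the use of plain strings for states are hypothetical, and the rule that a table seen in META with no recorded state is treated as ENABLED is inferred from the explanation above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; the names below are illustrative, not HBase APIs. */
final class RebuildTableStateSketch {
  /**
   * Returns the table states a restarted master would hold: states already
   * recorded in ZooKeeper are kept, and any table seen in META without a
   * recorded state is treated as ENABLED.
   */
  static Map<String, String> rebuild(Set<String> tablesInMeta,
                                     Map<String, String> zkStates) {
    Map<String, String> rebuilt = new TreeMap<String, String>(zkStates);
    for (String table : tablesInMeta) {
      if (!rebuilt.containsKey(table)) {
        // No state node was found for this table (assumed to be the case
        // after an upgrade from a version that never wrote ENABLED).
        rebuilt.put(table, "ENABLED");
      }
    }
    return rebuilt;
  }
}
```

With this rule, a table that an older master left without any state node ends up explicitly ENABLED after the new master starts, while a table recorded as DISABLED stays DISABLED.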

bq.FYI, don't do these kinda changes in future:
That happened when applying a formatter. Sure Stack, I will take care of those 
changes.
{code}
public boolean isEnabledTable(String tableName) {
-synchronized (this.cache) {
-  // No entry in cache means enabled table.
-  return !this.cache.containsKey(tableName);
-}
+return isTableState(tableName, TableState.ENABLED);
{code}
isTableState will have a synchronized(this.cache) anyway, so it should be ok?
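
The difference between the two versions of isEnabledTable in the diff above can be sketched as follows. This is a simplified stand-in and not the ZKTable class: `TableStateCacheSketch`, its three-value enum and the method names `isEnabledByAbsence` / `isEnabledExplicit` are assumptions made for illustration. Only the two rules themselves (no entry means enabled, versus an explicit ENABLED entry) come from the diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a table-state cache; not the real ZKTable. */
final class TableStateCacheSketch {
  enum TableState { ENABLED, DISABLING, DISABLED }

  private final Map<String, TableState> cache = new HashMap<String, TableState>();

  void setState(String table, TableState state) {
    synchronized (cache) {
      cache.put(table, state);
    }
  }

  /** Old rule: no entry in the cache means the table is enabled. */
  boolean isEnabledByAbsence(String table) {
    synchronized (cache) {
      return !cache.containsKey(table);
    }
  }

  /** New rule: only an explicit ENABLED entry counts as enabled. */
  boolean isEnabledExplicit(String table) {
    return isTableState(table, TableState.ENABLED);
  }

  /** The lock is taken here, so callers need no extra synchronization. */
  private boolean isTableState(String table, TableState expected) {
    synchronized (cache) {
      return cache.get(table) == expected;
    }
  }
}
```

Under the old rule, a table with no entry at all (for example one that was never created, or whose entry was removed) reads as enabled; under the new rule it does not.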

{code}
+   * Check if the table is in DISABLED state in cache
{code}
My idea of adding 'in cache' was like the state is checked only in Memory and 
it is not going to zk to check the state . So i thought like the 'in cache' 
word will tell the user like to ZK is used in checking it.
{code}
+// Enable the ROOT table if on process fail over the RS containing ROOT
+// was active.
{code}
This scenario arises when the master is restarted but the RS is still alive.  
The master should then enable ROOT and META as well, because when it comes up 
it should create the enabled node in zk.
Without this step, ROOT and META would have no node in zk in the above 
scenario, whereas if the master explicitly assigns ROOT and META there will be 
a zk node. So to unify this I had to call zkTable.setEnabledTable().
@Stack
Is that fine, Stack? I can prepare a new patch based on your feedback and then 
upload a final one.
@Ted
Do you have any more comments or feedback that I can incorporate in the next 
patch?




[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185878#comment-13185878
 ] 

Zhihong Yu commented on HBASE-5155:
---

Minor comment:
{code}
+boolean istableEnabled = this.zkTable.isEnabledTable(tableName);
{code}
istableEnabled should be named isTableEnabled.

@Stack:
w.r.t the following comment:
{code}
+// Enable the ROOT table if on process fail over the RS containing ROOT
+// was active.
{code}
AssignmentManager delegates to this.zkTable.setEnabledTable(). This is to set 
the meta tables enabled in ZkTable cache.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185858#comment-13185858
 ] 

stack commented on HBASE-5155:
--

bq. So if we have a rolling restart scenario then this will be a problem right 
? Previously the table node will not be present for the Enabled state but now 
we will create it.

Have you tried it?  In rolling restart we'll upgrade the master first usually.  
Won't it know how to deal w/ new zk node for ENABLED state?

FYI, don't do these kinda changes in future:

{code}
-  for (HRegionInfo region: regions) {
+  for (HRegionInfo region : regions) {
{code}

What was there previously was fine... It adds bulk to your patch.

This looks like a method used internally by AM only.  Does it need to be public?

{code}
+  public void setEnabledTable(String tableName) {
{code}

In processDeadRegion, should we check parent exists before doing daughter 
fixups? (It could have been deleted?)

I don't understand this comment:

{code}
+// Enable the ROOT table if on process fail over the RS containing ROOT
+// was active.
{code}

Same for the one on .meta.

Why do we have to enable the meta and root tables?  Aren't they always on?

Is this right:

{code}
+   * Check if the table is in DISABLED state in cache
{code}

Is it just checking cache?  This class gets updated when the zk changes right?  
So it's not just a 'cache'?  I think you should drop 'from cache' in your 
public javadoc.

Same for isDisabling, etc

Is this right below:

{code}
   public boolean isEnabledTable(String tableName) {
-synchronized (this.cache) {
-  // No entry in cache means enabled table.
-  return !this.cache.containsKey(tableName);
-}
+return isTableState(tableName, TableState.ENABLED);
{code}

Else patch looks good to me.  Was afraid it was too much for 0.90.6 but it's 
looking ok.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-13 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185748#comment-13185748
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

Yes Ted. Some of that part was refactored as part of a defect fix by Ming.

Also, since HBASE-4083 is in trunk, we have disablingTables and also 
enablingTables there.  So for trunk we may have to adapt the changes to the 
code there.

> ServerShutDownHandler And Disable/Delete should not happen parallely leading 
> to recreation of regions that were deleted
> ---
>
> Key: HBASE-5155
> URL: https://issues.apache.org/jira/browse/HBASE-5155
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.90.6
>
> Attachments: HBASE-5155_latest.patch, hbase-5155_6.patch
>
>
> ServerShutDownHandler and disable/delete table handler races.  This is not an 
> issue due to TM.
> -> A regionserver goes down.  In our cluster the regionserver holds lot of 
> regions.
> -> A region R1 has two daughters D1 and D2.
> -> The ServerShutdownHandler gets called and scans the META and gets all the 
> user regions
> -> Parallely a table is disabled. (No problem in this step).
> -> Delete table is done.
> -> The tables and its regions are deleted including R1, D1 and D2.. (So META 
> is cleaned)
> -> Now ServerShutdownhandler starts to processTheDeadRegion
> {code}
>  if (hri.isOffline() && hri.isSplit()) {
>   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
> "; checking daughter presence");
>   fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixUpDaughters, as the daughters D1 and D2 are missing for R1 
> {code}
> if (isDaughterMissing(catalogTracker, daughter)) {
>   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>   MetaEditor.addDaughter(catalogTracker, daughter, null);
>   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
>   // there then something wonky about the split -- things will keep going
>   // but could be missing references to parent region.
>   // And assign it.
>   assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.  
> After this we continue with the code below.
> {code}
> if (processDeadRegion(e.getKey(), e.getValue(),
> this.services.getAssignmentManager(),
> this.server.getCatalogTracker())) {
>   this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> Now when the SSH scanned the META it had R1, D1 and D2.
> So as part of the above code, D1 and D2, which were already assigned by 
> fixUpDaughters, are assigned again by 
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper issue due to a bad version, which kills the master.
> The important part here is that the regions that were deleted are recreated, 
> which I think is more critical.
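
The race in the description comes down to acting on a stale snapshot of META. A minimal sketch of that pattern follows, assuming a plain in-memory set in place of META; the class and method names are hypothetical and are not HBase APIs, and the re-check shown is only one possible guard, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the race; none of these names are HBase APIs.
final class StaleSnapshotSketch {

    /**
     * Returns the regions a recovery pass would assign.
     *
     * @param snapshot    regions read from the catalog when the pass started
     * @param liveCatalog regions present in the catalog right now
     * @param recheck     whether to re-validate each region before assigning
     */
    static List<String> regionsToAssign(List<String> snapshot,
                                        Set<String> liveCatalog,
                                        boolean recheck) {
        List<String> toAssign = new ArrayList<>();
        for (String region : snapshot) {
            // Without the re-check, a region deleted after the scan is still
            // assigned, which effectively recreates it.
            if (!recheck || liveCatalog.contains(region)) {
                toAssign.add(region);
            }
        }
        return toAssign;
    }
}
```

With a snapshot of R1, D1 and D2 and an empty live catalog (the table was deleted after the scan), the unguarded pass still returns all three regions, while the guarded pass returns none.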

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185745#comment-13185745
 ] 

Zhihong Yu commented on HBASE-5155:
---

I tried to port the patch to trunk.
It turns out that AssignmentManager.java is quite different between 0.90 and 
trunk.
For example, rebuildUserRegions() in 0.90 has the following code:
{code}
Set disablingTables = new HashSet(1);
{code}
But in trunk, disablingTables is a field in AssignmentManager.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185723#comment-13185723
 ] 

Zhihong Yu commented on HBASE-5155:
---

+1.

Minor comments:
{code}
+if (true == checkIfRegionBelongsToDisabled(regionInfo)) {
+  disabled = true;
+}
{code}
Can the above be written as:
{code}
  disabled = checkIfRegionBelongsToDisabled(regionInfo);
{code}
{code}
+// need to enable the table if not disable or disabling
{code}
Should read 'not disabled ...'
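
On the first minor comment above: the two forms are interchangeable only if the flag is known to be false before the check. A small sketch of the difference, using a hypothetical stand-in for checkIfRegionBelongsToDisabled (the real method takes the region info; the String parameter and the "disabled_" prefix rule here are assumptions made purely for illustration):

```java
// Hypothetical stand-in for the check in the patch; the prefix rule is made up.
final class DisabledFlagSketch {

    static boolean checkIfRegionBelongsToDisabled(String tableName) {
        return tableName.startsWith("disabled_");
    }

    /** Original form: can set the flag but never clears it. */
    static boolean guarded(boolean disabled, String tableName) {
        if (true == checkIfRegionBelongsToDisabled(tableName)) {
            disabled = true;
        }
        return disabled;
    }

    /** Suggested form: always overwrites the flag with the check result. */
    static boolean direct(boolean disabled, String tableName) {
        disabled = checkIfRegionBelongsToDisabled(tableName);
        return disabled;
    }
}
```

If the flag starts out false, both methods return the same value. If it can already be true when the check runs, the suggested form may reset it, so the simplification is safe only when the variable is freshly initialized at that point.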






[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-13 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185634#comment-13185634
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

Please provide your comments.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185201#comment-13185201
 ] 

Zhihong Yu commented on HBASE-5155:
---

Test suite passed based on Ram's patch.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-12 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185114#comment-13185114
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

@Ted
Thanks for your review. I will address all the comments. I will do some more 
testing tomorrow and then submit an updated patch.
Currently I can verify only on 0.90.  I will work on trunk sometime later next 
week.






[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185095#comment-13185095
 ] 

Zhihong Yu commented on HBASE-5155:
---

In AssignmentManager.java, setEnabledTable():
{code}
+  LOG.error("Unable to ensure that the table will be"
+  + " enabled because of a ZooKeeper issue");
{code}
Please include tableName in the log.

In bulkAssignUserRegions():
{code}
+List regionsList = java.util.Arrays.asList(regions);
+for (HRegionInfo regionInfo : regionsList) {
{code}
Can we iterate directly over the regions array?
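
For reference, Java's enhanced for loop accepts arrays as well as Iterables, so the Arrays.asList wrapper is not needed just to loop. A small sketch, using String in place of HRegionInfo so that it stands alone (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: String stands in for HRegionInfo.
final class IterateRegionsSketch {

    /** Mirrors the patch: wrap the array in a List, then iterate. */
    static String viaList(String[] regions) {
        List<String> regionsList = Arrays.asList(regions);
        StringBuilder sb = new StringBuilder();
        for (String region : regionsList) {
            sb.append(region).append(';');
        }
        return sb.toString();
    }

    /** Mirrors the suggestion: iterate over the array directly. */
    static String direct(String[] regions) {
        StringBuilder sb = new StringBuilder();
        for (String region : regions) {
            sb.append(region).append(';');
        }
        return sb.toString();
    }
}
```

Both methods visit the same elements in the same order; the direct form simply avoids creating the list view.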

In ZKTable.java:
{code}
-  if (!isEnabledOrDisablingTable(tableName)) {
+  if (isEnabledOrDisablingTable(tableName)) {
 LOG.warn("Moving table " + tableName + " state to disabling but was " +
   "not first in enabled state: " + this.cache.get(tableName));
{code}
Why was the above change necessary? Now the warning doesn't match the check.
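
To spell out the mismatch: the message says the table was "not first in enabled state", which reads naturally only when the condition is negated. A sketch under that reading, with a hypothetical three-state enum standing in for the table state kept by ZKTable (the names here are assumptions, not the ZKTable API):

```java
// Hypothetical model of the guard; State and the method names are assumptions.
final class DisablingGuardSketch {

    enum State { ENABLED, DISABLING, DISABLED }

    static boolean isEnabledOrDisabling(State s) {
        return s == State.ENABLED || s == State.DISABLING;
    }

    /** True when a "not first in enabled state" warning would be appropriate. */
    static boolean shouldWarn(State current) {
        // Negated check: warn only when the table is NOT enabled or disabling.
        return !isEnabledOrDisabling(current);
    }
}
```

With the negation removed, as in the patch hunk, the same warning would fire for tables that are in the expected state.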

I see a long line:
{code}
  TEST_UTIL.createTable(TABLENAME, FAMILYNAME);
+ assertTrue(m.assignmentManager.getZKTable().isEnabledTable(Bytes.toString(TABLENAME)));
{code}

Overall, this patch looks very good.
Thanks for plugging a hole w.r.t. cache in ZkTable.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-12 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185030#comment-13185030
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

TestRollingRestart is passing.
I have tried to handle the different scenarios. The TestMasterFailOver related 
scenarios are also handled.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-12 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184844#comment-13184844
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

@Stack
I am afraid this may cause compatibility issues. We now try to create the 
enabled node and the enabled state, but if the master fails over we still rely 
on node presence in zk to build the zkTable.cache.
So in a rolling restart scenario this will be a problem, right?  
Previously the table node was not present for the Enabled state, but now we 
will create it.
:(
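
One way to read the concern, as a sketch: under the old convention a table with no znode counts as enabled, while the new convention expects an explicit enabled node. The map below stands in for the table znodes in ZooKeeper; the class, the method names and the "ENABLED" string are all assumptions for illustration, not the ZKTable API.

```java
import java.util.Map;

// Hypothetical model: the map stands in for table znodes (table name -> state).
final class TableStateCompatSketch {

    /** Old convention (assumed): no znode means the table is enabled. */
    static boolean enabledByAbsence(Map<String, String> znodes, String table) {
        return !znodes.containsKey(table);
    }

    /** New convention (assumed): an explicit ENABLED znode is required. */
    static boolean enabledByNode(Map<String, String> znodes, String table) {
        return "ENABLED".equals(znodes.get(table));
    }
}
```

For a table that was enabled before the upgrade there is no znode, so the two conventions disagree about its state; that disagreement is the rolling-restart case in question.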





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-11 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184164#comment-13184164
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

I could not upload the patch today as some test cases are still failing.  Will 
upload it tomorrow.

> ServerShutDownHandler And Disable/Delete should not happen parallely leading 
> to recreation of regions that were deleted
> ---
>
> Key: HBASE-5155
> URL: https://issues.apache.org/jira/browse/HBASE-5155
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
>
> ServerShutdownHandler and the disable/delete table handlers race.  This is 
> not an issue due to TM.
> -> A regionserver goes down.  In our cluster the regionserver holds a lot of 
> regions.
> -> A region R1 has two daughters D1 and D2.
> -> The ServerShutdownHandler gets called, scans META and gets all the 
> user regions.
> -> In parallel a table is disabled. (No problem in this step.)
> -> Delete table is done.
> -> The tables and its regions are deleted including R1, D1 and D2.. (So META 
> is cleaned)
> -> Now ServerShutdownHandler starts processDeadRegion
> {code}
> if (hri.isOffline() && hri.isSplit()) {
>   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
>     "; checking daughter presence");
>   fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixupDaughters, since the daughters D1 and D2 are missing for R1, 
> {code}
> if (isDaughterMissing(catalogTracker, daughter)) {
>   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>   MetaEditor.addDaughter(catalogTracker, daughter, null);
>   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
>   // there then something wonky about the split -- things will keep going
>   // but could be missing references to parent region.
>   // And assign it.
>   assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.  
> After this we continue with the code below.
> {code}
> if (processDeadRegion(e.getKey(), e.getValue(),
>     this.services.getAssignmentManager(),
>     this.server.getCatalogTracker())) {
>   this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> When the SSH scanned META it had R1, D1 and D2.
> So as part of the above code, D1 and D2, which were already assigned by 
> fixupDaughters, are assigned again by 
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper error due to a bad version, which kills the master.
> The important part here is that the regions that were deleted are recreated, 
> which I think is more critical.
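
The ordering in the report can be replayed in a small toy model. This is only a sketch under stated assumptions: `StaleScanReplay` and its members are hypothetical names, an in-memory set stands in for META, and a counter stands in for `AssignmentManager.assign`. It shows how working from a scan taken before the delete both recreates the daughters and assigns them twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy replay of the reported interleaving; none of this is HBase code. */
class StaleScanReplay {
  // Assumption: a plain set stands in for the regions listed in META.
  final Set<String> meta = new LinkedHashSet<>(Arrays.asList("R1", "D1", "D2"));
  // Counts how often each region is handed to the (imaginary) assign call.
  final Map<String, Integer> assignCalls = new LinkedHashMap<>();

  void assign(String region) {
    assignCalls.merge(region, 1, Integer::sum);
  }

  void run() {
    // 1. The shutdown handler scans the catalog and keeps its own snapshot.
    List<String> snapshot = new ArrayList<>(meta);

    // 2. Disable and delete complete before the snapshot is processed.
    meta.clear();

    // 3. The handler walks the stale snapshot.
    for (String region : snapshot) {
      if (region.equals("R1")) {
        // Split parent: the daughters look missing, so they are re-added
        // to the catalog and assigned (the fixup step).
        for (String daughter : Arrays.asList("D1", "D2")) {
          if (!meta.contains(daughter)) {
            meta.add(daughter);
            assign(daughter);
          }
        }
      } else {
        // D1 and D2 are also in the snapshot, so they are assigned again.
        assign(region);
      }
    }
  }
}
```

After `run()`, the toy catalog contains D1 and D2 again although the table was deleted, and each daughter has been passed to `assign` twice.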





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183898#comment-13183898
 ] 

stack commented on HBASE-5155:
--

Ugh, I wrote a comment and lost it.

So, when you say above that '-> The tables and its regions are deleted 
including R1, D1 and D2.. (So META is cleaned)', you are saying that this 
happens AFTER SSH has scanned .META. and that we're doing the region processing 
AFTER the deletes?  (I was going to say it's odd that we fix up a daughter when 
the parent is missing, but checking .META. and the filesystem before each 
daughter fixup would still leave a hole during which a delete could come in.)

On keeping a TableState.ENABLED up in zk, that could work (I can't remember why 
we didn't do it that way originally; my only thought is that we were trying to 
save on the state kept up in zk, which is a pretty pathetic reason).  You'll 
need to add an AM.isEnabledTable method to match the isDisabledTable, etc., 
stuff that is already there.  Good stuff Ram.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-10 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183786#comment-13183786
 ] 

chunhui shen commented on HBASE-5155:
-

I think using TableState.ENABLED is helpful, and HMaster.TableDescriptors has a 
similar function.

But in this issue, should we also consider the case where the deleted table is 
created again?
Maybe SSH could differentiate that situation.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183592#comment-13183592
 ] 

Zhihong Yu commented on HBASE-5155:
---

+1 on utilizing TableState.ENABLED
Nice finding, Ram.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-10 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183205#comment-13183205
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

@Stack
After analysing the code I found one thing.  Whether SSH should be prevented 
from running in parallel with DisableTableHandler and DeleteTableHandler may be 
a bigger discussion, but the above problem can be solved. 
In SSH 
{code}
  public static boolean processDeadRegion(HRegionInfo hri, Result result,
      AssignmentManager assignmentManager, CatalogTracker catalogTracker)
  throws IOException {
    // If table is not disabled but the region is offlined,
    boolean disabled = assignmentManager.getZKTable().isDisabledTable(
        hri.getTableDesc().getNameAsString());
{code}
we check whether the table is disabled.  But if you look at the above logs, by 
this point the DeleteTableHandler has already deleted the region and also 
removed the table from the ZKTable cache:
{code}
am.getZKTable().setEnabledTable(Bytes.toString(tableName));
{code}
Currently setEnabledTable means removing the entry from the map.  So we cannot 
differentiate between an enabled table and a deleted table, because in both 
cases the entry is removed from the cache map.

So can we use the currently unused TableState.ENABLED in the enable table 
handler, so that only the delete table handler removes the entry?
Then in SSH.processDeadRegion() we can first check whether the table is present 
in the map and proceed only if it is.  If it is not present, we know that the 
table has already been deleted.  
Please give your opinion.
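
A rough sketch of that idea follows. It is an illustration only, not the actual ZKTable code: `TableStateCache`, `setDeletedTable`, `isTablePresent` and `shouldProcessDeadRegion` are hypothetical names, and an in-memory map stands in for the state kept in ZooKeeper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the table-state cache; not the real ZKTable class. */
class TableStateCache {
  enum TableState { ENABLED, DISABLED }

  // Assumption: an in-memory map replaces the state kept in ZooKeeper.
  private final Map<String, TableState> cache = new ConcurrentHashMap<>();

  /** Enabling records an explicit ENABLED state instead of removing the entry. */
  void setEnabledTable(String tableName) {
    cache.put(tableName, TableState.ENABLED);
  }

  void setDisabledTable(String tableName) {
    cache.put(tableName, TableState.DISABLED);
  }

  /** Only a delete removes the entry, so absence now means "deleted". */
  void setDeletedTable(String tableName) {
    cache.remove(tableName);
  }

  boolean isEnabledTable(String tableName) {
    return cache.get(tableName) == TableState.ENABLED;
  }

  boolean isDisabledTable(String tableName) {
    return cache.get(tableName) == TableState.DISABLED;
  }

  boolean isTablePresent(String tableName) {
    return cache.containsKey(tableName);
  }

  /** Guard a shutdown handler could apply before fixing up or assigning a region. */
  static boolean shouldProcessDeadRegion(TableStateCache states, String tableName) {
    // Skip regions of tables that were deleted (absent) or are disabled.
    return states.isTablePresent(tableName) && !states.isDisabledTable(tableName);
  }
}
```

With this shape, a region whose table entry is absent is treated as belonging to a deleted table and is skipped. The sketch does not cover the case of a deleted table that is created again under the same name.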





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-09 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183122#comment-13183122
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

{code}
2012-01-10 11:43:34,303 INFO org.apache.hadoop.hbase.master.ServerManager: 
Received REGION_SPLIT: j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4.: 
Daughters; j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7., 
j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from 
linux-129,60020,1326175677339




2012-01-10 12:05:19,122 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
for linux-129,60020,1326175677339
2012-01-10 12:06:07,153 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
running balancer because processing dead regionserver(s): 
[linux-129,60020,1326175677339]
2012-01-10 12:09:57,865 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 7 
region(s) that linux-129,60020,1326175677339 was carrying (skipping 0 
regions(s) that are already in transition)




2012-01-10 12:11:30,988 INFO 
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to 
disable table j9t6
2012-01-10 12:12:21,513 INFO 
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is 
done=true





2012-01-10 12:13:41,624 INFO 
org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table 
operation C_M_DELETE_TABLE on table j9t6
2012-01-10 12:14:00,811 DEBUG 
org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region 
j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4. from META and FS
2012-01-10 12:14:02,230 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
Deleted region j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4. from META
2012-01-10 12:14:07,330 DEBUG 
org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region 
j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. from META and FS
2012-01-10 12:14:07,521 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
Deleted region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. from META
2012-01-10 12:14:09,860 DEBUG 
org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region 
j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from META 
and FS
2012-01-10 12:14:10,096 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
Deleted region 
j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from META








2012-01-10 12:18:11,081 DEBUG 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Offlined and 
split region j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4.; checking 
daughter presence
2012-01-10 12:18:46,450 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
daughter j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7.
2012-01-10 12:18:46,775 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added 
daughter j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. in region 
.META.,,1, serverInfo=null
2012-01-10 12:18:47,135 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x134c5dbd0a6 Creating (or updating) unassigned node for 
49c3665a4bc656f3f6473659b64798f7 with OFFLINE state
2012-01-10 12:18:47,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan was found (or we are ignoring an existing plan) for 
j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. so generated a random 
one; hri=j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7., src=, 
dest=linux146,60020,1326169560093; 1 (online=1, exclude=null) available servers
2012-01-10 12:18:47,143 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. to 
linux146,60020,1326169560093
2012-01-10 12:18:47,155 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handle region called from node nodeDataChanged
2012-01-10 12:18:47,155 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093, 
region=49c3665a4bc656f3f6473659b64798f7
2012-01-10 12:18:47,202 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handle region called from node nodeDataChanged
2012-01-10 12:18:47,202 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093, 
region=49c3665a4bc656f3f6473659b64798f7
2012-01-10 12:18:47,221 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handle region called from node nodeDataChanged
2012-01-10 12:18:47,221 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, server=linux146,60020,1326169560093, 
region=49c3665a4bc656f3f6473659b64798f7
2012-01-10 12:18:47,222 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event fo

[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-09 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182932#comment-13182932
 ] 

stack commented on HBASE-5155:
--

@Ram Nice one.  Do you have a snippet of log that shows this?

So, ServerShutdownHandler should be checking if the table is disabled before it 
does either fixup or assign?  (That's what the check of (hri.isOffline()...) is 
supposed to be doing, only the enable/disable semantic changed, so that now 
when a table is disabled we set a flag for the table in zk rather than do it 
individually on each region, i.e. offline it.)

Or, are you saying the table was completely deleted when servershutdownhandler 
started to run?  If so, then the create of the region should fail; we should 
make sure that if the parent table directory is not present, then we are not 
able to create region subdirs.  We'd need a mkdir that did not do a 
recursive create (we need newer hadoop/hdfs for this?)
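
To make the non-recursive mkdir idea concrete, here is a minimal sketch. It uses the local-filesystem API in `java.nio.file` purely as a stand-in, since the thread leaves open whether HDFS offers an equivalent call; `RegionDirSketch` and `createRegionDir` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Illustration on the local filesystem; not HDFS and not HBase code. */
class RegionDirSketch {
  /**
   * Creates tableDir/regionName only if tableDir already exists.
   * Files.createDirectory does not create missing parents (unlike
   * Files.createDirectories), so a deleted table directory is not
   * silently brought back.
   */
  static boolean createRegionDir(Path tableDir, String regionName) {
    try {
      Files.createDirectory(tableDir.resolve(regionName));
      return true;
    } catch (NoSuchFileException e) {
      // Assumption: the default file system provider reports a missing
      // parent this way; treat it as "table is gone, do not recreate".
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A caller would treat a `false` return as a sign that the table was deleted in the meantime and skip the fixup for that region.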

On the question of synchronization between DeleteTableHandler and 
ServerShutdownHandler, yes, we need to have all threads in master coordinate 
around state changes, whether the balancer thread, servershutdownhandler 
executor thread, incoming splits, etc.  I'd like to put up a harness in which 
we can repro all these race conditions... HBase-3154 helps with this (the test 
included shows how to mock a balance and a server shutdown handler; we would 
need to make them interleave or have them reproduce this issue, and the log 
would help with reproducing the event sequence).





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-09 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182715#comment-13182715
 ] 

Zhihong Yu commented on HBASE-5155:
---

I think Ram's question @ 09/Jan/12 17:23 hints at introducing synchronization 
between DeleteTableHandler and ServerShutdownHandler.





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-09 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182653#comment-13182653
 ] 

Zhihong Yu commented on HBASE-5155:
---

Then we need to detect whether the table being deleted/disabled has regions on 
the underlying server.

> ServerShutDownHandler And Disable/Delete should not happen parallely leading 
> to recreation of regions that were deleted
> ---
>
> Key: HBASE-5155
> URL: https://issues.apache.org/jira/browse/HBASE-5155
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
>
> ServerShutDownHandler and disable/delete table handler races.  This is not an 
> issue due to TM.
> -> A regionserver goes down.  In our cluster the regionserver holds lot of 
> regions.
> -> A region R1 has two daughters D1 and D2.
> -> The ServerShutdownHandler gets called and scans the META and gets all the 
> user regions
> -> Parallely a table is disabled. (No problem in this step).
> -> Delete table is done.
> -> The tables and its regions are deleted including R1, D1 and D2.. (So META 
> is cleaned)
> -> Now ServerShutdownhandler starts to processTheDeadRegion
> {code}
>  if (hri.isOffline() && hri.isSplit()) {
>   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
> "; checking daughter presence");
>   fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixUpDaughters as the daughers D1 and D2 is missing for R1 
> {code}
> if (isDaughterMissing(catalogTracker, daughter)) {
>   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>   MetaEditor.addDaughter(catalogTracker, daughter, null);
>   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
>   // there then something wonky about the split -- things will keep going
>   // but could be missing references to parent region.
>   // And assign it.
>   assignmentManager.assign(daughter, true);
> {code}
> we call assign of the daughers.  
> Now after this we again start with the below code.
> {code}
> if (processDeadRegion(e.getKey(), e.getValue(),
> this.services.getAssignmentManager(),
> this.server.getCatalogTracker())) {
>   this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> When the SSH scanned META, it contained R1, D1 and D2.
> So as part of the above code, D1 and D2, which were already assigned by
> fixupDaughters, are assigned again by
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper bad-version error, which kills the master.
> The important part here is that the regions that were deleted are recreated,
> which I think is more critical.
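
The sequence above can be sketched with a toy in-memory model. This is only an illustration of the stale-scan problem, not the HBase code or the attached patch: Meta, fixupUnguarded and fixupGuarded are hypothetical names, and the guard (re-checking that the parent is still listed before re-adding daughters) is an assumption about one possible mitigation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the reported race; none of these names are HBase APIs. */
final class MetaRaceSketch {

    /** Hypothetical stand-in for META: the region names currently listed. */
    static final class Meta {
        private final Set<String> regions = new LinkedHashSet<>();
        synchronized void add(String region) { regions.add(region); }
        synchronized boolean contains(String region) { return regions.contains(region); }
        synchronized void deleteAll() { regions.clear(); }
        /** Returns a copy, like the one-time scan the shutdown handler works from. */
        synchronized List<String> scan() { return new ArrayList<>(regions); }
    }

    /** Unguarded fixup: re-adds every missing daughter, as in the reported sequence. */
    static void fixupUnguarded(Meta meta, List<String> daughters) {
        for (String daughter : daughters) {
            if (!meta.contains(daughter)) {
                meta.add(daughter);
            }
        }
    }

    /** Guarded fixup (assumption): do nothing if the parent is no longer listed. */
    static boolean fixupGuarded(Meta meta, String parent, List<String> daughters) {
        synchronized (meta) { // same monitor as Meta's methods, so check and add are atomic
            if (!meta.contains(parent)) {
                return false; // parent gone: the table was deleted after the scan
            }
            fixupUnguarded(meta, daughters);
            return true;
        }
    }

    public static void main(String[] args) {
        Meta meta = new Meta();
        meta.add("R1");
        meta.add("D1");
        meta.add("D2");

        List<String> staleScan = meta.scan(); // shutdown handler scans META
        meta.deleteAll();                     // table is disabled and deleted meanwhile

        fixupUnguarded(meta, Arrays.asList("D1", "D2"));
        System.out.println("stale scan: " + staleScan);
        System.out.println("D1 recreated without guard: " + meta.contains("D1"));

        meta.deleteAll();
        boolean ran = fixupGuarded(meta, "R1", Arrays.asList("D1", "D2"));
        System.out.println("guarded fixup ran: " + ran);
    }
}
```

In this model the unguarded path puts D1 and D2 back into the emptied set, which corresponds to the recreation described above. The guarded path leaves the set empty because R1 is no longer listed.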

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-09 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182648#comment-13182648
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

Can we prevent disable and delete table from happening if ServerShutDownHandler 
is in progress?
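
A minimal sketch of what such a check could look like, assuming a single in-process gate shared by the shutdown handler and the table handlers. TableOperationGate and its methods are hypothetical names for illustration, not existing HBase classes, and this is not the attached patch.

```java
/** Hypothetical gate: table disable/delete is refused while a shutdown handler runs. */
final class TableOperationGate {
    private int activeShutdownHandlers = 0;

    /** Called when a shutdown handler begins processing a dead server. */
    synchronized void shutdownHandlerStarted() {
        activeShutdownHandlers++;
    }

    /** Called when that shutdown handler has finished. */
    synchronized void shutdownHandlerFinished() {
        if (activeShutdownHandlers == 0) {
            throw new IllegalStateException("no shutdown handler in progress");
        }
        activeShutdownHandlers--;
    }

    /**
     * Runs the table operation only if no shutdown handler is in progress.
     * The operation runs while holding the monitor, so a shutdown handler
     * cannot start in the middle of it.
     */
    synchronized boolean tryRunTableOperation(Runnable operation) {
        if (activeShutdownHandlers > 0) {
            return false; // caller may retry later or fail the request
        }
        operation.run();
        return true;
    }
}

final class TableOperationGateDemo {
    public static void main(String[] args) {
        TableOperationGate gate = new TableOperationGate();

        gate.shutdownHandlerStarted();
        boolean deletedDuringShutdown =
            gate.tryRunTableOperation(() -> System.out.println("deleting table"));
        System.out.println("delete allowed during shutdown handling: " + deletedDuringShutdown);

        gate.shutdownHandlerFinished();
        boolean deletedAfterwards =
            gate.tryRunTableOperation(() -> System.out.println("deleting table"));
        System.out.println("delete allowed afterwards: " + deletedAfterwards);
    }
}
```

Holding the monitor for the whole table operation is the simplest way to close the check-then-act window, at the cost of making a newly started shutdown handler wait until the table operation completes.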

