[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-05 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407675#comment-13407675
 ] 

Zhihong Ted Yu commented on HBASE-6329:
---

I don't have other questions.

@Lars H:
Do you have any comments?

> Stop META regionserver when splitting region could cause daughter region 
> assign twice
> -
>
> Key: HBASE-6329
> URL: https://issues.apache.org/jira/browse/HBASE-6329
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: HBASE-6329v1.patch, HBASE-6329v2.patch
>
>
> We found this issue in 0.94. First, let me describe the case:
> Stop the META rs while a split is in progress.
> 1. The META rs (Server A) is stopping.
> 2. The main thread of the rs closes ZK and deletes the rs's ephemeral node.
> 3. SplitTransaction is retrying MetaEditor.addDaughter.
> 4. The master's ServerShutdownHandler processes the dead META server.
> 5. The master fixes up the daughter and assigns it.
> 6. The daughter is opened on another server (Server B).
> 7. Server A's SplitTransaction successfully adds the daughter to .META. with 
> serverName=Server A.
> 8. Now, in .META., the daughter's region location is Server A, but it is 
> online on Server B.
> 9. Restart the master, and the master will assign the daughter again.
> Attaching the logs for daughter region 80f999ea84cb259e20e9a228546f6c8a.
> Master log:
> 2012-07-04 13:45:56,493 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
> for dw93.kgb.sqa.cm4,60020,1341378224464
> 2012-07-04 13:45:58,983 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
> daughter 
> writetest,JC\xCA\xC8\xCFOH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  
> 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added daughter 
> writetest,JC\xCA\xC8\xCFOH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
>  serverName=null 
> 2012-07-04 13:45:58,988 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> writetest,JC\xCA\xC8\xCFOH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  to dw88.kgb.sqa.cm4,60020,1341379188777 
> 2012-07-04 13:46:00,201 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
> region 
> writetest,JC\xCA\xC8\xCFOH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
> Master log after restart:
> 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:6-0x136187d60e34644 Creating (or updating) unassigned node for 
> 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
> 2012-07-04 14:27:05,851 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
> writetest,JC\xCA\xC8\xCFOH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  in state M_ZK_REGION_OFFLINE 
> 2012-07-04 14:27:05,854 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> writetest,JC\xCA\xC8\xCFOH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  to dw93.kgb.sqa.cm4,60020,1341380812020 
> 2012-07-04 14:27:06,051 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=dw93.kgb.sqa.cm4,60020,1341380812020, 
> region=80f999ea84cb259e20e9a228546f6c8a 
> Regionserver(META rs) log:
> 2012-07-04 13:45:56,491 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
> dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection closed.
> 2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added daughter 
> writetest,JC\xCA\xC8\xCFOH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
>  serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
> 2012-07-04 13:46:11,952 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open 
> deploy task for 
> region=writetest,JC\xCA\xC8\xCFOH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
>  daughter=true 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-05 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407672#comment-13407672
 ] 

chunhui shen commented on HBASE-6329:
-

@ted
{code}
-  // add 2nd daughter first (see HBASE-4335)
-  MetaEditor.addDaughter(server.getCatalogTracker(),
-  b.getRegionInfo(), services.getServerName());
-  MetaEditor.addDaughter(server.getCatalogTracker(),
-  a.getRegionInfo(), services.getServerName());
{code}

The master will fix up the daughters, so we don't need addDaughter since the rs 
is stopped. Otherwise the master may already have assigned the daughter and 
opened it elsewhere, and this operation would corrupt the daughter's META data.
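
To make the ordering problem concrete, here is a minimal sketch in plain Java. 
It is a toy model, not the patch and not HBase code: the Meta class, the method 
names and the "Server A"/"Server B" strings are all hypothetical stand-ins. It 
only shows why a late addDaughter from a stopping rs can overwrite the location 
the master has already recorded.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical, none of it is HBase API.
final class DaughterMetaRace {
    /** Stand-in for .META.: region name -> server recorded as hosting it. */
    static final class Meta {
        final Map<String, String> location = new HashMap<>();
    }

    /** Master side: fix up the missing daughter and record where it was opened. */
    static void fixupAndAssign(Meta meta, String region, String targetServer) {
        meta.location.put(region, targetServer);
    }

    /**
     * Region server side: write the daughter row only while the server is
     * still running. When it is stopping or stopped, skip the write and
     * leave the fixup to the master.
     */
    static boolean addDaughterIfRunning(Meta meta, String region,
                                        String self, boolean stoppingOrStopped) {
        if (stoppingOrStopped) {
            return false; // do not touch the row the master may already own
        }
        meta.location.put(region, self);
        return true;
    }

    public static void main(String[] args) {
        Meta meta = new Meta();
        fixupAndAssign(meta, "daughter", "Server B");
        addDaughterIfRunning(meta, "daughter", "Server A", true);
        System.out.println(meta.location.get("daughter"));
    }
}
```

With the guard in place the late write is skipped and the row keeps the 
location the master recorded. Passing false for the last argument reproduces 
step 7 of the report, where the row ends up pointing at Server A.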


[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-05 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407118#comment-13407118
 ] 

Zhihong Ted Yu commented on HBASE-6329:
---

{code}
+  + " because server is stopping=" + this.server.isStopping()
+  + " or stopped=" + this.server.isStopped(), e);
{code}
Logging just one of the conditions would be cleaner.
{code}
-  // add 2nd daughter first (see HBASE-4335)
-  MetaEditor.addDaughter(server.getCatalogTracker(),
-  b.getRegionInfo(), services.getServerName());
-  MetaEditor.addDaughter(server.getCatalogTracker(),
-  a.getRegionInfo(), services.getServerName());
{code}
Please explain the above change.


[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-05 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406906#comment-13406906
 ] 

ramkrishna.s.vasudevan commented on HBASE-6329:
---

{code}
+} while (this.znodeVersion != -1 && !server.isStopped()
+&& !services.isStopping());
{code}
This is a good one.
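
For readers without the patch at hand, the shape of that loop condition can be 
sketched in isolation. This is only an illustration under assumptions: the 
class and method names are hypothetical, and the three suppliers stand in for 
the znodeVersion check, server.isStopped() and services.isStopping() from the 
snippet above.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not HBase code: retries an attempt until it no longer
// asks for a retry, or until the server reports that it is stopped/stopping.
final class StopAwareRetry {
    /**
     * @param needsRetry runs one attempt and returns true if another is needed
     * @param stopped    stand-in for server.isStopped()
     * @param stopping   stand-in for services.isStopping()
     * @return the number of attempts that were made
     */
    static int runUntilDoneOrStopped(BooleanSupplier needsRetry,
                                     BooleanSupplier stopped,
                                     BooleanSupplier stopping) {
        int attempts = 0;
        boolean retry;
        do {
            attempts++;
            retry = needsRetry.getAsBoolean();
            // Give up as soon as the server is going down, even if the
            // attempt would otherwise be retried.
        } while (retry && !stopped.getAsBoolean() && !stopping.getAsBoolean());
        return attempts;
    }

    public static void main(String[] args) {
        // An attempt that always wants a retry stops after one pass once
        // the server reports that it is stopping.
        int attempts = runUntilDoneOrStopped(() -> true, () -> false, () -> true);
        System.out.println("attempts=" + attempts);
    }
}
```

The point of checking both flags on every pass is that the loop cannot keep 
retrying against META after shutdown has begun.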





[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406895#comment-13406895
 ] 

Hadoop QA commented on HBASE-6329:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535154/HBASE-6329v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2326//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2326//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2326//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2326//console

This message is automatically generated.


[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406860#comment-13406860
 ] 

Hadoop QA commented on HBASE-6329:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535146/HBASE-6329v1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
  org.apache.hadoop.hbase.regionserver.wal.TestHLog

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2324//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2324//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2324//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2324//console

This message is automatically generated.


[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406858#comment-13406858
 ] 

ramkrishna.s.vasudevan commented on HBASE-6329:
---

Nice one.
One question here 
{code}
// Interrupt catalog tracker here in case any regions being opened out in
// handlers are stuck waiting on meta or root.
if (this.catalogTracker != null) this.catalogTracker.stop();
{code}
Does this not impact the thread that is trying to write into META through 
SplitTransaction?

Maybe we can add a check so that if the RS is already aborting we do not call 
abort/stop again. In the above case, if the META write fails we will sometimes 
hit the PONR (point of no return), and from the PONR path we call server.abort. 
An abort is then already in progress when a second abort is called. I am not 
sure of the implications if both run at the same time.
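
One common way to make such a check safe under concurrency is a 
compare-and-set guard. The sketch below is an assumption about how it could 
look, not what HRegionServer does: AbortOnceGuard and its methods are 
hypothetical names, and the counter merely stands in for the real shutdown 
work.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard, not HBase code: lets only the first abort request run.
final class AbortOnceGuard {
    private final AtomicBoolean aborting = new AtomicBoolean(false);
    private final AtomicInteger abortRuns = new AtomicInteger();

    /**
     * @return true if this call performed the abort, false if an abort was
     *         already in progress and the call was ignored
     */
    boolean abort(String why) {
        // compareAndSet succeeds for exactly one caller, even when several
        // threads request an abort at the same time.
        if (!aborting.compareAndSet(false, true)) {
            return false;
        }
        abortRuns.incrementAndGet(); // stand-in for the real shutdown work
        return true;
    }

    boolean isAborting() {
        return aborting.get();
    }

    int abortRuns() {
        return abortRuns.get();
    }

    public static void main(String[] args) {
        AbortOnceGuard guard = new AbortOnceGuard();
        guard.abort("META write failed after PONR");
        guard.abort("second request while the first is in progress");
        System.out.println("abort ran " + guard.abortRuns() + " time(s)");
    }
}
```

A plain boolean check followed by a separate set would leave a window in which 
two threads both pass the check, which is why the sketch uses a single atomic 
operation.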
