[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408976#comment-13408976
 ] 

stack commented on HBASE-6329:
------------------------------

+1. All changes in the patch look reasonable to me. (We don't have to roll 
back the split when aborting because its leftovers will be cleaned up when 
the region is opened elsewhere; the splitting znodes are ephemeral, so they 
should be cleaned up when the RS aborts.)
                
> Stopping META regionserver when splitting region could cause daughter region 
> to be assigned twice
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6329
>                 URL: https://issues.apache.org/jira/browse/HBASE-6329
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 6329v3.txt, HBASE-6329v1.patch, HBASE-6329v2.patch
>
>
> We found this issue in 0.94. First, let me describe the case:
> Stop the META RS while a split is in progress.
> 1. The META RS (Server A) is stopping.
> 2. The main thread of the RS closes ZK and deletes the RS's ephemeral node.
> 3. SplitTransaction is retrying MetaEditor.addDaughter.
> 4. The Master's ServerShutdownHandler processes the dead META server.
> 5. The Master fixes up the daughter and assigns it.
> 6. The daughter is opened on another server (Server B).
> 7. Server A's SplitTransaction successfully adds the daughter to .META. with 
> serverName=Server A.
> 8. Now, in .META., the daughter's region location is Server A, but it is 
> online on Server B.
> 9. Restart the Master, and the Master will assign the daughter again.
> Logs attached; the daughter region is 80f999ea84cb259e20e9a228546f6c8a.
> Master log:
> 2012-07-04 13:45:56,493 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
> for dw93.kgb.sqa.cm4,60020,1341378224464
> 2012-07-04 13:45:58,983 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
> daughter 
> writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  
> 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added daughter 
> writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
>  serverName=null 
> 2012-07-04 13:45:58,988 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  to dw88.kgb.sqa.cm4,60020,1341379188777 
> 2012-07-04 13:46:00,201 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
> region 
> writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
> Master log after restart:
> 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:60000-0x136187d60e34644 Creating (or updating) unassigned node for 
> 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
> 2012-07-04 14:27:05,851 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
> writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  in state M_ZK_REGION_OFFLINE 
> 2012-07-04 14:27:05,854 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
>  to dw93.kgb.sqa.cm4,60020,1341380812020 
> 2012-07-04 14:27:06,051 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=dw93.kgb.sqa.cm4,60020,1341380812020, 
> region=80f999ea84cb259e20e9a228546f6c8a 
> Regionserver(META rs) log:
> 2012-07-04 13:45:56,491 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
> dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection closed.
> 2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added daughter 
> writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
>  serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
> 2012-07-04 13:46:11,952 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open 
> deploy task for 
> region=writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
>  daughter=true 
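
The sequence in the quoted report can be sketched as a small toy model. This 
is not HBase code: the function, the server names, and the dict standing in 
for .META. are all hypothetical. It also assumes that opening the daughter on 
Server B records B as its location, and that a restarted Master reassigns any 
region whose recorded server is not live. It only shows how a late write from 
the stopping server can leave the daughter open on two servers.

```python
# Toy model of the race in the quoted report. Every name here is
# hypothetical; none of this is the HBase API.

def simulate(late_meta_write: bool) -> set:
    """Return the set of servers that end up hosting the daughter region."""
    daughter = "daughter-region"
    live_servers = {"server-B", "server-C"}  # Server A is stopping (steps 1-2)
    meta = {}        # stands in for .META.: region -> recorded server
    hosting = set()  # servers that actually have the region open

    # Steps 4-6: the Master fixes up the missing daughter and assigns it.
    meta[daughter] = None           # "Added daughter ..., serverName=null"
    hosting.add("server-B")         # region is opened on Server B
    meta[daughter] = "server-B"     # assumption: the open records B in META

    # Step 7: Server A's retried write finally lands and overwrites the row.
    if late_meta_write:
        meta[daughter] = "server-A"

    # Step 9: a restarted Master trusts META. Assumption: if the recorded
    # server is not live, it assigns the region again without closing it
    # on the server that already has it open.
    if meta[daughter] not in live_servers:
        target = sorted(live_servers - hosting)[0]
        hosting.add(target)
        meta[daughter] = target

    return hosting


if __name__ == "__main__":
    print("with late write:   ", sorted(simulate(True)))
    print("without late write:", sorted(simulate(False)))
```

With the late write the model ends with the daughter on two servers; without 
it, the daughter stays on Server B only.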

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

