[ 
https://issues.apache.org/jira/browse/HBASE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117978#comment-13117978
 ] 

ramkrishna.s.vasudevan commented on HBASE-4511:
-----------------------------------------------

@Gao
I think log splitting generally happens only when the RS dies and its ZooKeeper 
node gets deleted.
If that does not happen, the logs of the crashing RS may not be split.
Maybe in a real scenario, even though the master does not treat it as a dead 
server and split its logs, it will wait for ServerShutdownHandler to process it 
anyway.

What do you think, Gao?
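The rule described in this comment can be pictured with a small sketch. It assumes, as a simplification, that the master decides whether to split a server's logs purely from whether that server still has a node in ZooKeeper. `LogSplitDecision` and `shouldSplitLogs` are hypothetical names for illustration, not HBase API.

```java
import java.util.Set;

// Hypothetical model of the rule above; not HBase's actual implementation.
public class LogSplitDecision {

    /**
     * Returns true if the master would split the logs of the server hosting
     * .META., assuming splitting is triggered only once the server's node
     * has disappeared from ZooKeeper (i.e. the server is considered dead).
     *
     * @param liveServersInZk server names that still have a node in ZooKeeper
     * @param metaServer      server name recorded as hosting .META.
     */
    static boolean shouldSplitLogs(Set<String> liveServersInZk, String metaServer) {
        return !liveServersInZk.contains(metaServer);
    }

    public static void main(String[] args) {
        String metaServer = "192.168.2.102,54385,1317264874629";

        // Case from this issue: the RS is crashing but has not exited,
        // so its node is still present and no split is triggered.
        System.out.println(shouldSplitLogs(Set.of(metaServer), metaServer)); // false

        // Normal case: the RS has exited and its node is gone.
        System.out.println(shouldSplitLogs(Set.of(), metaServer)); // true
    }
}
```

Under this assumption, the first case is the window the issue describes: the master sees the server as live, so nothing splits its HLog before .META. is reassigned.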
                
> There is data loss when master failovers
> ----------------------------------------
>
>                 Key: HBASE-4511
>                 URL: https://issues.apache.org/jira/browse/HBASE-4511
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.92.0
>            Reporter: gaojinchao
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: org.apache.hadoop.hbase.master.TestMasterFailover-output.rar
>
>
> It goes like this:
> The master crashed, and at the same time the RS hosting .META. was crashing 
> but had not exited.
> The master starts up again and finds all live RSs.
> The master fails to verify .META., because this RS is crashing.
> The master reassigns .META., but it does not split the HLog.
> So some .META. data is lost.
> Below are the logs from a failed run of a failover test case.
> // The test reports that it is killing an RS
> 2011-09-28 19:54:45,694 INFO  [Thread-988] regionserver.HRegionServer(1443): 
> STOPPED: Killing for unit test
> 2011-09-28 19:54:45,694 INFO  [Thread-988] master.TestMasterFailover(1007): 
> RS 192.168.2.102,54385,1317264874629 killed 
> // The RS did not actually go down; it is still registered in ZooKeeper.
> 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
> master.HMaster(458): Registering server found up in zk: 
> 192.168.2.102,54385,1317264874629
> 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
> master.ServerManager(232): Registering 
> server=192.168.2.102,54385,1317264874629
> 2011-09-28 19:54:51,770 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> zookeeper.ZKUtil(491): master:54557-0x132b31adbb30005 Unable to get data of 
> znode /hbase/unassigned/1028785192 because node does not exist (not an error)
> 2011-09-28 19:54:51,771 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
> of data from znode /hbase/root-region-server and set watcher; 
> 192.168.2.102,54383,131726487...
> // .META. verification failed and .META. was reassigned, so all the regions
> // in .META. are lost.
> 2011-09-28 19:54:51,773 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
> catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
> address=192.168.2.102,54385,1317264874629; 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
> 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:51,773 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> catalog.CatalogTracker(316): new .META. server: 
> 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,274 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
> of data from znode /hbase/root-region-server and set watcher; 
> 192.168.2.102,54383,131726487...
> 2011-09-28 19:54:52,277 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
> catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
> address=192.168.2.102,54385,1317264874629; 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
> 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:52,277 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> catalog.CatalogTracker(316): new .META. server: 
> 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,778 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
> of data from znode /hbase/root-region-server and set watcher; 
> 192.168.2.102,54383,131726487...
> 2011-09-28 19:54:52,782 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
> catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
> address=192.168.2.102,54385,1317264874629; 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
> 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> catalog.CatalogTracker(316): new .META. server: 
> 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> zookeeper.ZKAssign(264): master:54557-0x132b31adbb30005 Creating (or 
> updating) unassigned node for 1028785192 with OFFLINE state
> 2011-09-28 19:54:52,825 DEBUG [Thread-988-EventThread] 
> zookeeper.ZooKeeperWatcher(233): master:54557-0x132b31adbb30005 Received 
> ZooKeeper Event, type=NodeCreated, state=SyncConnected, 
> path=/hbase/unassigned/1028785192
> // The master concludes that this is a clean cluster startup.
> 2011-09-28 19:54:52,889 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
> master.AssignmentManager(383): Clean cluster startup. Assigning userregions
> 2011-09-28 19:54:52,889 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
> zookeeper.ZKAssign(494): master:54557-0x132b31adbb30005 Deleting any existing 
> unassigned nodes
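To make the "some .META. data is lost" step concrete, here is a toy model, not HBase code. Edits reach the HLog before they are flushed to store files, so a region reopened without replaying the HLog sees only the flushed edits. `MetaReassignModel` and its methods are hypothetical, and treating a flush as clearing the HLog is a simplification of how real HBase tracks flushed sequence IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not HBase code) of why reassigning .META.
// without splitting the crashed server's HLog loses unflushed edits.
public class MetaReassignModel {

    /** Edits persisted in store files (already flushed). */
    final List<String> storeFiles = new ArrayList<>();
    /** Edits that exist only in the hosting server's HLog. */
    final List<String> hlog = new ArrayList<>();

    /** A write is appended to the HLog; it reaches store files only on flush. */
    void write(String edit) {
        hlog.add(edit);
    }

    /** Simplification: flushing moves all logged edits into store files. */
    void flush() {
        storeFiles.addAll(hlog);
        hlog.clear();
    }

    /**
     * Returns the edits visible after the region is reopened elsewhere.
     * Unflushed HLog edits are recovered only if the log is split and replayed.
     */
    List<String> reassign(boolean splitLogFirst) {
        List<String> visible = new ArrayList<>(storeFiles);
        if (splitLogFirst) {
            visible.addAll(hlog);
        }
        return visible;
    }

    public static void main(String[] args) {
        MetaReassignModel meta = new MetaReassignModel();
        meta.write("row-a");
        meta.flush();
        meta.write("row-b"); // still unflushed when the server goes down

        System.out.println(meta.reassign(false)); // [row-a]
        System.out.println(meta.reassign(true));  // [row-a, row-b]
    }
}
```

The first call corresponds to the path in the log above, where .META. is reassigned without the HLog being split; the second shows what splitting first would recover.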
