[ https://issues.apache.org/jira/browse/HBASE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-4511:
-------------------------
Status: Patch Available (was: Open)
> There is data loss when master failovers
> ----------------------------------------
>
> Key: HBASE-4511
> URL: https://issues.apache.org/jira/browse/HBASE-4511
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.92.0
> Reporter: gaojinchao
> Assignee: stack
> Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 4511-v2.txt, 4511.txt,
> org.apache.hadoop.hbase.master.TestMasterFailover-output.rar, sketch.txt
>
>
> It goes like this:
> The master crashed. At the same time, the RS hosting .META. was crashing but
> had not yet exited.
> The master starts up again and finds all live RSs, including that one.
> The master's verification of .META. fails because that RS is crashing.
> The master reassigns .META., but it does not split that RS's HLog first.
> As a result, some .META. data is lost.
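To make the loss mechanism concrete, here is a toy model in Python. It is not HBase code, and every class, function and value in it is hypothetical. It assumes, as the report implies, that once the RS dies, edits it had written to its HLog but not yet flushed exist only in that log, so reopening the region elsewhere without first splitting and replaying the log drops them.

```python
from dataclasses import dataclass, field


@dataclass
class RegionServer:
    """Toy RS hosting one region; hypothetical, not an HBase class."""
    name: str
    flushed: dict = field(default_factory=dict)   # store files on shared storage (durable)
    hlog: list = field(default_factory=list)      # WAL entries on shared storage (durable)
    memstore: dict = field(default_factory=dict)  # in-memory edits (lost on crash)

    def put(self, row, value):
        # Append to the write-ahead log first, then update the memstore.
        self.hlog.append((row, value))
        self.memstore[row] = value

    def crash(self):
        # Only in-memory state disappears; flushed files and the HLog survive.
        self.memstore.clear()


def reopen_region(dead, split_logs):
    """Rebuild the dead server's region on another server.

    With split_logs=False only flushed data is visible (the failure in this
    report); with split_logs=True the HLog edits are replayed on top of it.
    """
    state = dict(dead.flushed)
    if split_logs:
        for row, value in dead.hlog:
            state[row] = value
    return state


rs = RegionServer("rs1", flushed={"t1,,1": "rs2:60020"})
rs.put("t2,,1", "rs3:60020")  # written to the HLog, never flushed
rs.crash()
print(reopen_region(rs, split_logs=False))  # {'t1,,1': 'rs2:60020'}
print(reopen_region(rs, split_logs=True))   # {'t1,,1': 'rs2:60020', 't2,,1': 'rs3:60020'}
```

Reassigning without the split loses the unflushed row; replaying the log first recovers it.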
> Below are the logs from a failing failover test case.
> // The test reports that it is killing an RS:
> 2011-09-28 19:54:45,694 INFO [Thread-988] regionserver.HRegionServer(1443):
> STOPPED: Killing for unit test
> 2011-09-28 19:54:45,694 INFO [Thread-988] master.TestMasterFailover(1007):
> RS 192.168.2.102,54385,1317264874629 killed
> // The RS has not actually exited; the new master still finds it up in ZK:
> 2011-09-28 19:54:51,763 INFO [Master:0;192.168.2.102,54557,1317264885720]
> master.HMaster(458): Registering server found up in zk:
> 192.168.2.102,54385,1317264874629
> 2011-09-28 19:54:51,763 INFO [Master:0;192.168.2.102,54557,1317264885720]
> master.ServerManager(232): Registering
> server=192.168.2.102,54385,1317264874629
> 2011-09-28 19:54:51,770 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> zookeeper.ZKUtil(491): master:54557-0x132b31adbb30005 Unable to get data of
> znode /hbase/unassigned/1028785192 because node does not exist (not an error)
> 2011-09-28 19:54:51,771 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s)
> of data from znode /hbase/root-region-server and set watcher;
> 192.168.2.102,54383,131726487...
> // .META. verification fails and .META. is reassigned, so all the regions
> // recorded in .META. are lost:
> 2011-09-28 19:54:51,773 INFO [Master:0;192.168.2.102,54557,1317264885720]
> catalog.CatalogTracker(476): Failed verification of .META.,,1 at
> address=192.168.2.102,54385,1317264874629;
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException:
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server
> 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:51,773 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> catalog.CatalogTracker(316): new .META. server:
> 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,274 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s)
> of data from znode /hbase/root-region-server and set watcher;
> 192.168.2.102,54383,131726487...
> 2011-09-28 19:54:52,277 INFO [Master:0;192.168.2.102,54557,1317264885720]
> catalog.CatalogTracker(476): Failed verification of .META.,,1 at
> address=192.168.2.102,54385,1317264874629;
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException:
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server
> 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:52,277 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> catalog.CatalogTracker(316): new .META. server:
> 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,778 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s)
> of data from znode /hbase/root-region-server and set watcher;
> 192.168.2.102,54383,131726487...
> 2011-09-28 19:54:52,782 INFO [Master:0;192.168.2.102,54557,1317264885720]
> catalog.CatalogTracker(476): Failed verification of .META.,,1 at
> address=192.168.2.102,54385,1317264874629;
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException:
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server
> 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> catalog.CatalogTracker(316): new .META. server:
> 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> zookeeper.ZKAssign(264): master:54557-0x132b31adbb30005 Creating (or
> updating) unassigned node for 1028785192 with OFFLINE state
> 2011-09-28 19:54:52,825 DEBUG [Thread-988-EventThread]
> zookeeper.ZooKeeperWatcher(233): master:54557-0x132b31adbb30005 Received
> ZooKeeper Event, type=NodeCreated, state=SyncConnected,
> path=/hbase/unassigned/1028785192
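The three failed verifications above are roughly 500 ms apart, after which the master creates an OFFLINE node for .META. and assigns it. The Python sketch below models that retry-then-assign step, plus the ordering the report implies is missing: if verification failed because the server is stopped, split its HLog before assigning. All names, the attempt count and the pause are assumptions for illustration; this is neither the HBase code nor the attached patch.

```python
import time


class ServerStoppedError(Exception):
    """Hypothetical stand-in for RegionServerStoppedException."""


def recover_meta(server, verify, split_log, assign,
                 attempts=3, pause_s=0.5, sleep=time.sleep):
    """Verify .META. on `server`; if that keeps failing, reassign it.

    verify, split_log and assign are hypothetical hooks. If any attempt
    reported the server as stopped, its HLog is split before assigning.
    """
    stopped = False
    for attempt in range(attempts):
        try:
            if verify(server):
                return "verified"
        except ServerStoppedError:
            stopped = True
        if attempt < attempts - 1:
            sleep(pause_s)  # pause between attempts, as in the log above
    if stopped:
        # Registered in ZK but not serving: treat as dead, replay its HLog first.
        split_log(server)
    assign()
    return "reassigned"


def stopped_server(server):
    raise ServerStoppedError(f"Server {server} not running, aborting")


calls = []
result = recover_meta("rs1", stopped_server,
                      split_log=lambda s: calls.append(("split", s)),
                      assign=lambda: calls.append(("assign",)),
                      sleep=lambda s: None)
print(result, calls)  # reassigned [('split', 'rs1'), ('assign',)]
```

Keying the split off the stopped-server error, rather than off any verification failure, is one possible design; checking the server's liveness directly would be another.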
> // The master then treats this as a clean cluster startup:
> 2011-09-28 19:54:52,889 INFO [Master:0;192.168.2.102,54557,1317264885720]
> master.AssignmentManager(383): Clean cluster startup. Assigning userregions
> 2011-09-28 19:54:52,889 DEBUG [Master:0;192.168.2.102,54557,1317264885720]
> zookeeper.ZKAssign(494): master:54557-0x132b31adbb30005 Deleting any existing
> unassigned nodes