[ https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allan Yang resolved HBASE-14223. -------------------------------- Resolution: Fixed > Meta WALs are not cleared if meta region was closed and RS aborts > ----------------------------------------------------------------- > > Key: HBASE-14223 > URL: https://issues.apache.org/jira/browse/HBASE-14223 > Project: HBase > Issue Type: Bug > Reporter: Enis Soztutar > Assignee: Enis Soztutar > Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-14223logs, hbase-14223_v0.patch, > hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, > hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch, > hbase-14223_v3-master.patch > > > When an RS opens meta, and later closes it, the WAL(FSHlog) is not closed. > The last WAL file just sits there in the RS WAL directory. If RS stops > gracefully, the WAL file for meta is deleted. Otherwise if RS aborts, WAL for > meta is not cleaned. It is also not split (which is correct) since master > determines that the RS no longer hosts meta at the time of RS abort. > From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} > directories left uncleaned: > {code} > [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls > /apps/hbase/data/WALs > Found 31 items > drwxr-xr-x - hbase hadoop 0 2015-06-05 01:14 > /apps/hbase/data/WALs/hregion-58203265 > drwxr-xr-x - hbase hadoop 0 2015-06-05 07:54 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting > drwxr-xr-x - hbase hadoop 0 2015-06-05 09:28 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting > drwxr-xr-x - hbase hadoop 0 2015-06-05 10:01 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting > ... > {code} > The directories contain WALs from meta: > {code} > [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting > Found 2 items > -rw-r--r-- 3 hbase hadoop 201608 2015-06-05 03:15 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta > -rw-r--r-- 3 hbase hadoop 44420 2015-06-05 04:36 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta > {code} > The RS hosted the meta region for some time: > {code} > 2015-06-05 03:14:28,692 INFO [PostOpenDeployTasks:1588230740] > zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper > as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285 > ... > 2015-06-05 03:15:17,302 INFO > [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed > hbase:meta,,1.1588230740 > {code} > In between, a WAL is created: > {code} > 2015-06-05 03:15:11,707 INFO > [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: > Rolled WAL > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta > with entries=385, filesize=196.88 KB; new WAL > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta > {code} > When CM killed the region server later master did not see these WAL files: > {code} > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 > INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] > master.SplitLogManager: started splitting 2 logs in > [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting] > for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285] > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 > INFO [main-EventThread] wal.WALSplitter: Archived processed log > hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436 > to > hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436 > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497 > INFO [main-EventThread] wal.WALSplitter: Archived processed log > hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329 > to > hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329 > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,507 > WARN [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] > master.SplitLogManager: returning success without actually splitting and > deleting all the log files in path > hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,508 > INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] > master.SplitLogManager: finished splitting (more than or equal to) 129135000 > bytes in 2 log files in > [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting] > in 4433ms > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)