[jira] [Resolved] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2019-03-03 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-14223.

Resolution: Duplicate

Resolved as duplicate.

> Meta WALs are not cleared if meta region was closed and RS aborts
> -----------------------------------------------------------------
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Major
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, 
> hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, 
> hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch, 
> hbase-14223_v3-master.patch
>
>
> When an RS opens meta and later closes it, the meta WAL (FSHLog) is not closed. 
> The last WAL file just sits in the RS WAL directory. If the RS stops 
> gracefully, the meta WAL file is deleted. If the RS aborts instead, the meta 
> WAL is not cleaned up. It is also not split (which is correct), since the master 
> determines that the RS no longer hosts meta at the time of the abort. 
> On a cluster that had run ITBLL (IntegrationTestBigLinkedList) with CM (Chaos 
> Monkey), I see a lot of {{-splitting}} directories left uncleaned: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 01:14 
> /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 07:54 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 09:28 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 10:01 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
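
A minimal sketch of how the leftover directories could be picked out of a listing like the one above. It assumes the usual one-entry-per-line output of `hadoop fs -ls` (the listing quoted here is wrapped by the mail client), and the function name `splitting_dirs` is hypothetical, not part of HBase or Hadoop.

```python
from typing import List


def splitting_dirs(ls_output: str) -> List[str]:
    """Return the paths of directories whose name ends in '-splitting'."""
    found = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip the "Found N items" header and anything that is not a directory.
        # Directory entries start with 'd' in the permissions column, and the
        # path is the last whitespace-separated field.
        if len(fields) < 2 or not fields[0].startswith("d"):
            continue
        if fields[-1].endswith("-splitting"):
            found.append(fields[-1])
    return found


# Two entries taken from the listing above, joined back onto single lines.
sample = (
    "Found 2 items\n"
    "drwxr-xr-x   - hbase hadoop  0 2015-06-05 01:14 "
    "/apps/hbase/data/WALs/hregion-58203265\n"
    "drwxr-xr-x   - hbase hadoop  0 2015-06-05 07:54 "
    "/apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting\n"
)

for path in splitting_dirs(sample):
    print(path)
```

A directory that still ends in `-splitting` after recovery has finished is only a candidate for the problem described here; its contents still need to be checked.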
> The directories contain WALs from meta: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r--   3 hbase hadoop 201608 2015-06-05 03:15 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r--   3 hbase hadoop  44420 2015-06-05 04:36 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
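
The file names above can be told apart from regular WALs by their shape. The sketch below decodes a name and reports whether it looks like a meta WAL. The pattern is inferred only from the names quoted in this message (URL-encoded `host,port,startcode`, a provider part, a numeric suffix, and a trailing `.meta` for meta WALs); it is an assumption, not HBase's documented naming contract, and `parse_wal_name` is a hypothetical helper.

```python
import re
from urllib.parse import unquote

# Assumed layout: <host>,<port>,<startcode>.<provider>.<number>[.meta]
WAL_NAME = re.compile(
    r"^(?P<server>[^,]+,\d+,\d+)\.(?P<provider>.*?)\.(?P<ts>\d+)(?P<meta>\.meta)?$"
)


def parse_wal_name(file_name):
    """Split a WAL file name into server name, numeric suffix, and a meta flag.

    Returns None when the name does not match the assumed layout.
    """
    decoded = unquote(file_name)  # turns %2C back into ','
    match = WAL_NAME.match(decoded)
    if match is None:
        return None
    return {
        "server": match.group("server"),
        "timestamp_ms": int(match.group("ts")),
        "is_meta": match.group("meta") is not None,
    }


meta_wal = ("os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285"
            "..meta.1433470511501.meta")
default_wal = ("os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285"
               ".default.1433475074436")

print(parse_wal_name(meta_wal))
print(parse_wal_name(default_wal))
```

With a check like this, a `-splitting` directory that contains only names flagged as meta WALs matches the situation reported in this issue.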
> The RS hosted the meta region for some time: 
> {code}
> 2015-06-05 03:14:28,692 INFO  [PostOpenDeployTasks:1588230740] 
> zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper 
> as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO  
> [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed 
> hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created: 
> {code}
> 2015-06-05 03:15:11,707 INFO  
> [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: 
> Rolled WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
>  with entries=385, filesize=196.88 KB; new WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
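
As a small worked example, the numeric suffixes of the two meta WAL names in the log line above can be compared directly. This assumes the suffix is a creation time in epoch milliseconds, which the message does not state explicitly.

```python
# Numeric suffixes of the old and new meta WAL names from the log line above.
old_wal_suffix_ms = 1433470511501
new_wal_suffix_ms = 1433474111645

gap_ms = new_wal_suffix_ms - old_wal_suffix_ms
gap_minutes = gap_ms / 60_000

print(f"gap between the two meta WALs: {gap_ms} ms (~{gap_minutes:.1f} min)")
```

Under that assumption the two files are roughly an hour apart, which is longer than the interval between the open and close of meta shown in the previous log excerpt.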
> When CM later killed the region server, the master did not see these WAL files: 
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 
> INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] 
> master.SplitLogManager: started splitting 2 logs in 
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
>  for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
>  to 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> 

[jira] [Resolved] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2019-01-21 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang resolved HBASE-14223.

Resolution: Fixed

> Meta WALs are not cleared if meta region was closed and RS aborts
> -----------------------------------------------------------------
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, 
> hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, 
> hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch, 
> hbase-14223_v3-master.patch