[ https://issues.apache.org/jira/browse/HBASE-23286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012272#comment-17012272 ]
Michael Stack edited comment on HBASE-23286 at 1/9/20 11:57 PM:
----------------------------------------------------------------

[~zghao] does this work for you? I enabled hbase.wal.split.to.hfile by setting it to true. I killed a few servers. The SCP logging shows this for the split log steps...

{code}
2020-01-09 22:17:55,346 DEBUG org.apache.hadoop.hbase.master.MasterWalManager: Renamed region directory: hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting
2020-01-09 22:17:55,347 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog workers [h5,16020,1578604825302]
2020-01-09 22:17:55,351 INFO org.apache.hadoop.hbase.master.SplitLogManager: hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting dir is empty, no logs to split.
2020-01-09 22:17:55,355 INFO org.apache.hadoop.hbase.master.SplitLogManager: Finished splitting (more than or equal to) 0 (0 bytes) in 0 log files in [hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting] in 0ms
2020-01-09 22:17:55,356 DEBUG org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Done splitting WALs pid=123301, state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS, locked=true; ServerCrashProcedure server=h5,16020,1578604825302, splitWal=true, meta=false
{code}

The dir had 50-odd WALs in it, but after the above runs they are all gone. The above runs too quickly. No instances of recovered.edits in my fs. Let me look at the patch...

Hmm... The patch changes the RS side of the splitter and Region open. Master logging should be the same as before? It says zero. Undoing this for now.....

> Improve MTTR: Split WAL to HFile
> --------------------------------
>
>                 Key: HBASE-23286
>                 URL: https://issues.apache.org/jira/browse/HBASE-23286
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>
> After HBASE-20724, the compaction event marker is no longer used during
> failover. So our new proposal is to split WAL to HFile to improve MTTR. It has 3
> steps:
> # Read WAL and write HFile to region’s column family’s recovered.hfiles
> directory.
> # Open region.
> # Bulkload the recovered.hfiles for every column family.
> The design doc is attached as a Google doc. Any suggestions are welcome.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
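
For readers following the three steps in the issue description, here is a toy in-memory sketch of the idea. It is not HBase code: the function names, the tuple layout of a WAL entry, and the dictionaries standing in for HFiles and stores are all hypothetical, and the real patch is Java that writes actual HFiles under each column family's recovered.hfiles directory.

```python
from collections import defaultdict


def split_wal_to_hfiles(wal_entries):
    """Step 1 (toy model): group WAL edits by (region, family).

    Each entry is a hypothetical tuple (seq_id, region, family, row, value).
    The result maps (region, family) to a list of cells sorted by row and
    then by sequence id, mimicking the sorted layout of an HFile.
    """
    grouped = defaultdict(list)
    for seq_id, region, family, row, value in wal_entries:
        grouped[(region, family)].append((row, seq_id, value))
    return {key: sorted(cells) for key, cells in grouped.items()}


def open_region_and_bulkload(region, recovered):
    """Steps 2 and 3 (toy model): open a region, then load recovered files.

    Opening is modelled as creating an empty store per family; bulk loading
    copies the sorted cells in, so the edit with the highest sequence id
    for a row is the one that remains visible.
    """
    stores = defaultdict(dict)
    for (file_region, family), cells in recovered.items():
        if file_region != region:
            continue
        for row, _seq_id, value in cells:
            stores[family][row] = value
    return dict(stores)


# Hypothetical WAL of a dead server that hosted two regions.
wal = [
    (1, "region-a", "cf1", "row2", "v1"),
    (2, "region-b", "cf1", "row1", "v2"),
    (3, "region-a", "cf1", "row1", "v3"),
    (4, "region-a", "cf2", "row1", "v4"),
    (5, "region-a", "cf1", "row2", "v5"),
]
recovered = split_wal_to_hfiles(wal)
region_a = open_region_and_bulkload("region-a", recovered)
```

The point of the sketch is only that the sorting work happens once, at split time, so region open does not have to replay edits one by one.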