[ https://issues.apache.org/jira/browse/HBASE-14729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983036#comment-14983036 ]
Enis Soztutar commented on HBASE-14729: --------------------------------------- We should not do this. Agreed that it is not safe to delete the WAL log directory if it is not empty. Is there a way we can see whether this is due to HBASE-14223? > SplitLogManager does not clean files from WALs folder in case of master > failover > -------------------------------------------------------------------------------- > > Key: HBASE-14729 > URL: https://issues.apache.org/jira/browse/HBASE-14729 > Project: HBase > Issue Type: Bug > Components: wal > Affects Versions: 2.0.0 > Reporter: Samir Ahmic > Assignee: Samir Ahmic > Fix For: 2.0.0 > > Attachments: HBASE-14729.patch > > > While i was testing master failover process on master branch (distributed > cluster setup) i notice following: > 1. List of dead regionservers was increasing every time active master was > restarted. > 2. Number of folders in /hbase/WALs folder was increasing every time active > master was restarted > Here is exception from master logs showing why this is happening: > {code} > 2015-10-30 09:41:49,238 INFO [ProcedureExecutor-3] master.SplitLogManager: > finished splitting (more than or equal to) 0 bytes in 0 log files in > [hdfs://P3cluster/hbase/WALs/hnode1,16000,1446043659224-splitting] in 21ms > 2015-10-30 09:41:49,235 WARN [ProcedureExecutor-2] master.SplitLogManager: > Returning success without actually splitting and deleting all the log files > in path hdfs://P3cluster/hbase/WALs/hnode1,16000,1446046595488-splitting: > [FileStatus{path=hdfs://P3cluster/hbase/WALs/hnode1,16000,1446046595488-splitting/hnode1%2C16000%2C1446046595488.meta.1446046691314.meta; > isDirectory=false; length=39944; replication=3; blocksize=268435456; > modification_time=1446050348104; access_time=1446046691317; owner=hbase; > group=supergroup; permission=rw-r--r--; isSymlink=false}] > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.PathIsNotEmptyDirectoryException): > `/hbase/WALs/hnode1,16000,1446046595488-splitting is non empty': Directory > is not empty > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3524) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3479) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3463) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:751) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:562) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > at org.apache.hadoop.ipc.Client.call(Client.java:1411) > at org.apache.hadoop.ipc.Client.call(Client.java:1364) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy15.delete(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:490) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.delete(Unknown Source) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279) > at com.sun.proxy.$Proxy17.delete(Unknown Source) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279) > at com.sun.proxy.$Proxy17.delete(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1726) > at > org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:588) > at > org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:584) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:584) > at > org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:297) > at > org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:400) > at > org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:373) > at > org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:295) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:388) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:228) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:72) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119) > at > org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:452) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1050) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:841) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:794) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:479) > {code} > I have tracked exception to this line in SplitLogManager#splitLogDistributed > {code} > 297 if (fs.exists(logDir) && !fs.delete(logDir, false)) > {code} > Since we are removing folder we need to delete recursively so this line > shoud be: > {code} > 297 if (fs.exists(logDir) && !fs.delete(logDir, true)) > {code} > This solved issue. I will attach patch after some additional testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)