[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated HIVE-13513:
------------------------------
    Status: Patch Available  (was: Open)

> cleardanglingscratchdir does not work in some version of HDFS
> -------------------------------------------------------------
>
>                 Key: HIVE-13513
>                 URL: https://issues.apache.org/jira/browse/HIVE-13513
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 1.3.0, 2.1.0
>
>         Attachments: HIVE-13513.1.patch
>
>
> On some Hadoop versions, we keep getting a "lease recovery" message whenever
> we check the scratchdir by opening it for appending:
> {code}
> Failed to APPEND_FILE xxx for DFSClient_NONMAPREDUCE_785768631_1 on 10.0.0.18
> because lease recovery is in progress. Try again later.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2917)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2677)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2984)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2953)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:655)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
> {code}
> and
> {code}
> 16/04/14 04:51:56 ERROR hdfs.DFSClient: Failed to close inode 18963
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline
> due to no more good datanodes being available to try.
> (Nodes: current=[DatanodeInfoWithStorage[10.0.0.12:30010,DS-b355ac2a-a23a-418a-af9b-4c1b4e26afe8,DISK]],
> original=[DatanodeInfoWithStorage[10.0.0.12:30010,DS-b355ac2a-a23a-418a-af9b-4c1b4e26afe8,DISK]]).
> The current failed datanode replacement policy is DEFAULT, and a client may
> configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy'
> in its configuration.
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1017)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1165)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {code}
> The root cause is not clear. However, if we remove the hsync call from
> SessionState, everything works as expected. Attaching a patch that removes
> the hsync call for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
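To make the mechanism in the description concrete, below is a minimal, hypothetical Java sketch of the lock-file scheme it refers to, assuming only the standard Hadoop FileSystem API: a session keeps a lock file open so it holds the HDFS lease on it, and cleardanglingscratchdir probes liveness by attempting an append. The class, method, and file names (ScratchDirLockSketch, holdLock, isDangling, inuse.lck) are illustrative rather than the actual Hive code, and the exception types named in the comments are the ones that typically surface when a lease is still held; the attached patch corresponds to dropping the hsync() call mentioned in the holder-side comment.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.ipc.RemoteException;

// Illustrative sketch only; the real logic lives in SessionState and the
// cleardanglingscratchdir tool.
public class ScratchDirLockSketch {

  // Holder side: the session creates a lock file in its scratch dir and keeps
  // the stream open for its whole lifetime, so it holds the HDFS lease.
  // (Per this issue, the original SessionState code also called hsync() on
  // this stream; the attached patch removes that call because it triggers the
  // lease-recovery problem on some HDFS versions.)
  static FSDataOutputStream holdLock(FileSystem fs, Path scratchDir) throws IOException {
    return fs.create(new Path(scratchDir, "inuse.lck"), true);
  }

  // Probe side: try to append to the lock file. If another DFSClient still
  // holds the lease, the NameNode rejects the append (typically an
  // AlreadyBeingCreatedException or RecoveryInProgressException wrapped in a
  // RemoteException), so the scratch dir is still in use; if the append
  // succeeds, the owning session is gone and the dir is dangling.
  static boolean isDangling(FileSystem fs, Path scratchDir) throws IOException {
    try {
      fs.append(new Path(scratchDir, "inuse.lck")).close();
      return true;
    } catch (RemoteException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    // Path is supplied by the caller here; the real tool walks the configured
    // scratch root (hive.exec.scratchdir) instead.
    Path scratchDir = new Path(args[0]);
    System.out.println(scratchDir + (isDangling(fs, scratchDir) ? ": dangling" : ": in use"));
  }
}
{code}

The append attempt works as a liveness probe because HDFS grants a file's write lease to a single client at a time: a live session still owns the lease on its open lock file, so the probe's append is refused, while a dead session's lease eventually expires and the append goes through.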