[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-05-26 Thread Daniel Dai (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302417#comment-15302417 ]

Daniel Dai commented on HIVE-13513:
---

Also pushed to branch-2.1. Thanks!

> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13513.1.patch, HIVE-13513.2.patch
>
>
> On some Hadoop versions, we keep getting a "lease recovery" message when we
> check the scratchdir by opening it for append:
> {code}
> Failed to APPEND_FILE xxx for DFSClient_NONMAPREDUCE_785768631_1 on 10.0.0.18
> because lease recovery is in progress. Try again later.
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2917)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2677)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2984)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2953)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:655)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
> {code}
> and
> {code}
> 16/04/14 04:51:56 ERROR hdfs.DFSClient: Failed to close inode 18963
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline
> due to no more good datanodes being available to try.
> (Nodes: current=[DatanodeInfoWithStorage[10.0.0.12:30010,DS-b355ac2a-a23a-418a-af9b-4c1b4e26afe8,DISK]],
> original=[DatanodeInfoWithStorage[10.0.0.12:30010,DS-b355ac2a-a23a-418a-af9b-4c1b4e26afe8,DISK]]).
> The current failed datanode replacement policy is DEFAULT, and a client may
> configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy'
> in its configuration.
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1017)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1165)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {code}
> The root cause is not clear. However, if we remove the hsync call from SessionState,
> everything works as expected. Attaching a patch that removes the hsync call for now.
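
For readers new to the tool: the probe at issue works roughly as in the sketch below. The session keeps a lock file open in its scratch directory, and cleardanglingscratchdir tries to append to it; a rejected append means the owning session is still alive. This is an illustrative sketch only, not the actual Hive code; the lock-file name "inuse.lck" and the exception handling are assumptions.
{code}
// Minimal sketch of an append-based liveness probe (assumptions noted above).
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.ipc.RemoteException;

public class ScratchDirProbe {
  /** Returns true if some live session still appears to hold the lock file's lease. */
  static boolean isInUse(FileSystem fs, Path scratchDir) throws IOException {
    Path lockFile = new Path(scratchDir, "inuse.lck");  // assumed name
    try {
      // A live owner keeps the file open, so the NameNode rejects the append;
      // if the owner died, the append eventually succeeds and the scratch
      // directory can be treated as dangling.
      fs.append(lockFile).close();
      return false;
    } catch (RemoteException e) {
      return true;
    }
  }
}
{code}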



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-05-26 Thread Jesus Camacho Rodriguez (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301686#comment-15301686 ]

Jesus Camacho Rodriguez commented on HIVE-13513:


[~daijy], could you push to branch-2.1 too? Master is version 2.2.0 now (I 
updated the fix version accordingly). Thanks

> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.3.0, 2.2.0
>
> Attachments: HIVE-13513.1.patch, HIVE-13513.2.patch
>





[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-05-25 Thread Daniel Dai (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300444#comment-15300444 ]

Daniel Dai commented on HIVE-13513:
---

I will check it in shortly; it has already been tested and reviewed.

> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13513.1.patch, HIVE-13513.2.patch
>





[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-05-25 Thread Jesus Camacho Rodriguez (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299853#comment-15299853 ]

Jesus Camacho Rodriguez commented on HIVE-13513:


[~daijy], ready to be pushed to 2.1.0? Thanks

> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13513.1.patch, HIVE-13513.2.patch
>





[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-04-16 Thread Thejas M Nair (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244379#comment-15244379 ]

Thejas M Nair commented on HIVE-13513:
--

+1 


> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13513.1.patch, HIVE-13513.2.patch
>





[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-04-16 Thread Daniel Dai (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244365#comment-15244365 ]

Daniel Dai commented on HIVE-13513:
---

ManagementFactory.getRuntimeMXBean().getName() already contains both the hostname and the pid.
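
For illustration, a tiny sketch of that call. Note the returned name's format is JVM-specific (commonly pid@hostname on HotSpot) and not a guaranteed contract:
{code}
import java.lang.management.ManagementFactory;

public class RuntimeName {
  public static void main(String[] args) {
    // On common JVMs this prints something like "12345@myhost", so a single
    // value already carries both the process id and the host name.
    System.out.println(ManagementFactory.getRuntimeMXBean().getName());
  }
}
{code}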

> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13513.1.patch, HIVE-13513.2.patch
>





[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-04-16 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244346#comment-15244346 ]

Hive QA commented on HIVE-13513:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12799062/HIVE-13513.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9970 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.ql.security.TestAuthorizationPreEventListener.testListener
org.apache.hive.hcatalog.api.repl.commands.TestCommands.org.apache.hive.hcatalog.api.repl.commands.TestCommands
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7620/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7620/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7620/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12799062 - PreCommit-HIVE-TRUNK-Build

> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13513.1.patch, HIVE-13513.2.patch
>

[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-04-16 Thread Thejas M Nair (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244149#comment-15244149 ]

Thejas M Nair commented on HIVE-13513:
--

Looks good. How about writing the hostname as well?


> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13513.1.patch, HIVE-13513.2.patch
>





[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-04-13 Thread Thejas M Nair (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240685#comment-15240685 ]

Thejas M Nair commented on HIVE-13513:
--

+1

> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13513.1.patch
>





[jira] [Commented] (HIVE-13513) cleardanglingscratchdir does not work in some version of HDFS

2016-04-13 Thread Daniel Dai (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-13513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240668#comment-15240668 ]

Daniel Dai commented on HIVE-13513:
---

Note that without hsync, the content may not be flushed to HDFS.
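
For reference, a rough sketch of where that call sits when writing the lock file; the path, content, and surrounding code are illustrative assumptions, not the SessionState code itself:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LockFileWrite {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Illustrative path and content only.
    try (FSDataOutputStream out = fs.create(new Path("/tmp/hive/inuse.lck"))) {
      out.writeBytes("lock info");
      // hflush(): pushes the data to the datanode pipeline so readers can
      // see it, without forcing it to disk.
      out.hflush();
      // hsync(): additionally asks the datanodes to persist the data to disk.
      // This is the stronger durability guarantee referred to above; the patch
      // drops the hsync call because removing it avoids the errors seen on
      // some HDFS versions.
      // out.hsync();
    }
  }
}
{code}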

> cleardanglingscratchdir does not work in some version of HDFS
> -
>
> Key: HIVE-13513
> URL: https://issues.apache.org/jira/browse/HIVE-13513
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13513.1.patch
>


