[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532572#comment-16532572 ]

Peter Bacsko commented on OOZIE-2791:
-------------------------------------

+1

> ShareLib installation may fail on busy Hadoop clusters
> ------------------------------------------------------
>
>          Key: OOZIE-2791
>          URL: https://issues.apache.org/jira/browse/OOZIE-2791
>      Project: Oozie
>   Issue Type: Bug
>     Reporter: Attila Sasvari
>     Assignee: Julia Kinga Marton
>     Priority: Major
>  Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-007.patch, OOZIE-2791-008.patch, OOZIE-2791-009.patch, OOZIE-2791-01.patch, OOZIE-2791-010.patch, OOZIE-2791-02.patch
>
> On a busy Hadoop cluster it can happen that users cannot properly install the Oozie ShareLib.
> Example: a sharelib installation on a Hadoop 2.4.0 pseudo cluster, with the concurrency number set high to simulate a busy cluster:
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> A lot of errors (failed copy tasks) appear in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>         at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> The file is created, but its size is 0:
> {code}
> -rw-r--r-- 3 asasvari supergroup 0 2017-02-07 10:59 share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the changes. We should also consider throttling HDFS requests.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
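The two fix directions proposed in the description (retry failed copy tasks, and throttle concurrent HDFS requests) could be sketched roughly as below. This is an illustrative sketch only, not the code from the OOZIE-2791 patches; the class, method, and constant names (`RetryingCopySketch`, `withRetry`, `HDFS_PERMITS`) are hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of retry-with-backoff plus request throttling for
// sharelib copy tasks. Not the actual Oozie implementation.
class RetryingCopySketch {

    static final int MAX_RETRIES = 3;
    static final long BASE_BACKOFF_MS = 100;

    // Throttle: at most 10 copy tasks issue HDFS requests at once,
    // regardless of how large the -concurrency thread pool is.
    static final Semaphore HDFS_PERMITS = new Semaphore(10);

    static <T> T withRetry(Callable<T> task) throws Exception {
        IOException last = null;
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            HDFS_PERMITS.acquire();
            try {
                return task.call();
            } catch (IOException e) {
                // e.g. RemoteException: "could only be replicated to 0 nodes"
                last = e;
                // exponential backoff: 100 ms, 200 ms, 400 ms, ...
                Thread.sleep(BASE_BACKOFF_MS << attempt);
            } finally {
                HDFS_PERMITS.release();
            }
        }
        // All attempts failed: propagate so the installer can roll back
        // (e.g. delete the partially written lib_<timestamp> directory)
        // instead of leaving zero-byte jars behind.
        throw last;
    }
}
```

With something like this, a transient replication failure is retried a few times before the copy task is reported as failed, and the semaphore keeps a large thread pool from overwhelming the NameNode on a busy cluster.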
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532547#comment-16532547 ]

Hadoop QA commented on OOZIE-2791:
----------------------------------

Testing JIRA OOZIE-2791

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 132
.    {color:green}+1{color} the patch adds/modifies 5 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.    {color:red}WARNING{color}: the current HEAD has 100 Javadoc warning(s)
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.    {color:green}+1{color} There are no new bugs found in [examples].
.    {color:green}+1{color} There are no new bugs found in [webapp].
.    {color:green}+1{color} There are no new bugs found in [core].
.    {color:green}+1{color} There are no new bugs found in [tools].
.    {color:green}+1{color} There are no new bugs found in [fluent-job/fluent-job-api].
.    {color:green}+1{color} There are no new bugs found in [server].
.    {color:green}+1{color} There are no new bugs found in [docs].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.    {color:green}+1{color} There are no new bugs found in [sharelib/pig].
.    {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.    {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.    {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.    {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.    {color:green}+1{color} There are no new bugs found in [sharelib/spark].
.    {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.    Tests run: 2908
{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch

{color:green}*+1 Overall result, good!, no -1s*{color}
{color:red}. There is at least one warning, please check{color}

The full output of the test-patch run is available at
.    https://builds.apache.org/job/PreCommit-OOZIE-Build/656/
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532418#comment-16532418 ]

Hadoop QA commented on OOZIE-2791:
----------------------------------

PreCommit-OOZIE-Build started
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532404#comment-16532404 ]

Andras Piros commented on OOZIE-2791:
-------------------------------------

[~kmarton] kicked off another Jenkins build.
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531828#comment-16531828 ]

Julia Kinga Marton commented on OOZIE-2791:
-------------------------------------------

The test errors seem unrelated. [~andras.piros] or [~pbacsko], could you please start another pre-commit build?
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531699#comment-16531699 ]

Hadoop QA commented on OOZIE-2791:
----------------------------------

Testing JIRA OOZIE-2791

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 132
.    {color:green}+1{color} the patch adds/modifies 5 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.    {color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.    {color:green}+1{color} There are no new bugs found in [webapp].
.    {color:green}+1{color} There are no new bugs found in [core].
.    {color:green}+1{color} There are no new bugs found in [tools].
.    {color:green}+1{color} There are no new bugs found in [fluent-job/fluent-job-api].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.    {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.    {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.    {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.    {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.    {color:green}+1{color} There are no new bugs found in [sharelib/pig].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive].
.    {color:green}+1{color} There are no new bugs found in [sharelib/spark].
.    {color:green}+1{color} There are no new bugs found in [client].
.    {color:green}+1{color} There are no new bugs found in [examples].
.    {color:green}+1{color} There are no new bugs found in [docs].
.    {color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.    Tests run: 2908
.    Tests failed: 0
.    Tests errors: 488
.    The patch failed the following testcases:
.    Tests failing with errors:
testShellScriptHadoopConfDir(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testShellScriptError(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testSetupMethods(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testShellScript(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testEnvVar(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testShellScriptHadoopConfDirWithNoL4J(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testPerlScript(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testUpdateSLA(org.apache.oozie.sla.TestSLAService)
testEndMissDBConfirm(org.apache.oozie.sla.TestSLAService)
testSLAOperations(org.apache.oozie.sla.TestSLAService)
testBasicService(org.apache.oozie.sla.TestSLAService)
testJobs(org.apache.oozie.servlet.TestV1JobsServlet)
testSubmit(org.apache.oozie.servlet.TestV1JobsServlet)
testDefaultConfigurationInActionConf(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testParseJobXmlAndConfiguration(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testCannotKillActionWhenACLSpecified(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testOutputSubmitOK(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testSubmitLauncherConfigurationOverridesLauncherMapperProperties(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testJobXmlWithOozieLauncher(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testCredentialsSkip(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testACLDefaults_noFalseChange(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testJobSubmissionWithoutYarnKill(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testFilesystemScheme(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testExceptionSubmitException(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testKill(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531568#comment-16531568 ]

Hadoop QA commented on OOZIE-2791:
----------------------------------

PreCommit-OOZIE-Build started
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531175#comment-16531175 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791
Cleaning local git workspace
+1 PATCH_APPLIES
+1 CLEAN
+1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. +1 the patch does not introduce any line longer than 132 characters
. +1 the patch adds/modifies 5 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warning(s)
. +1 the patch does not seem to introduce new Javadoc error(s)
. ERROR: the current HEAD has 2 Javadoc error(s)
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 There are no new bugs found in total.
. +1 There are no new bugs found in [examples].
. +1 There are no new bugs found in [core].
. +1 There are no new bugs found in [sharelib/distcp].
. +1 There are no new bugs found in [sharelib/hive].
. +1 There are no new bugs found in [sharelib/pig].
. +1 There are no new bugs found in [sharelib/spark].
. +1 There are no new bugs found in [sharelib/hive2].
. +1 There are no new bugs found in [sharelib/hcatalog].
. +1 There are no new bugs found in [sharelib/sqoop].
. +1 There are no new bugs found in [sharelib/oozie].
. +1 There are no new bugs found in [sharelib/streaming].
. +1 There are no new bugs found in [webapp].
. +1 There are no new bugs found in [tools].
. +1 There are no new bugs found in [docs].
. +1 There are no new bugs found in [server].
. +1 There are no new bugs found in [fluent-job/fluent-job-api].
. +1 There are no new bugs found in [client].
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
+1 TESTS
. Tests run: 2908
+1 DISTRO
. +1 distro tarball builds with the patch
+1 Overall result, good! No -1s.
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/652/
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531035#comment-16531035 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530991#comment-16530991 ] Julia Kinga Marton commented on OOZIE-2791:
---
[~pbacsko], I have made the suggested changes. Can you please check the new patch?
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519046#comment-16519046 ] Julia Kinga Marton commented on OOZIE-2791:
---
Thank you for the +1, [~andras.piros]. [~pbacsko], I have uploaded a new patch that addresses your comments as well. Could you please take another look at the review?
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515612#comment-16515612 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791
Cleaning local git workspace
+1 PATCH_APPLIES
+1 CLEAN
+1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. +1 the patch does not introduce any line longer than 132 characters
. +1 the patch adds/modifies 5 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warning(s)
. +1 the patch does not seem to introduce new Javadoc error(s)
. ERROR: the current HEAD has 2 Javadoc error(s)
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 There are no new bugs found in total.
. +1 There are no new bugs found in [docs].
. +1 There are no new bugs found in [webapp].
. +1 There are no new bugs found in [sharelib/distcp].
. +1 There are no new bugs found in [sharelib/hive].
. +1 There are no new bugs found in [sharelib/spark].
. +1 There are no new bugs found in [sharelib/hive2].
. +1 There are no new bugs found in [sharelib/hcatalog].
. +1 There are no new bugs found in [sharelib/streaming].
. +1 There are no new bugs found in [sharelib/pig].
. +1 There are no new bugs found in [sharelib/sqoop].
. +1 There are no new bugs found in [sharelib/oozie].
. +1 There are no new bugs found in [examples].
. +1 There are no new bugs found in [client].
. +1 There are no new bugs found in [core].
. +1 There are no new bugs found in [tools].
. +1 There are no new bugs found in [server].
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
+1 TESTS
. Tests run: 2158
. Tests failed at first run: TestCoordActionsKillXCommand#testActionKillCommandActionNumbers
. For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
+1 DISTRO
. +1 distro tarball builds with the patch
+1 Overall result, good! No -1s.
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/627/
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515557#comment-16515557 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503350#comment-16503350 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791
Cleaning local git workspace
+1 PATCH_APPLIES
+1 CLEAN
-1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. -1 the patch contains 1 line(s) longer than 132 characters
. +1 the patch adds/modifies 5 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warning(s)
. +1 the patch does not seem to introduce new Javadoc error(s)
. ERROR: the current HEAD has 2 Javadoc error(s)
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 There are no new bugs found in total.
. +1 There are no new bugs found in [examples].
. +1 There are no new bugs found in [webapp].
. +1 There are no new bugs found in [core].
. +1 There are no new bugs found in [tools].
. +1 There are no new bugs found in [server].
. +1 There are no new bugs found in [docs].
. +1 There are no new bugs found in [sharelib/hive2].
. +1 There are no new bugs found in [sharelib/pig].
. +1 There are no new bugs found in [sharelib/streaming].
. +1 There are no new bugs found in [sharelib/hive].
. +1 There are no new bugs found in [sharelib/hcatalog].
. +1 There are no new bugs found in [sharelib/sqoop].
. +1 There are no new bugs found in [sharelib/oozie].
. +1 There are no new bugs found in [sharelib/distcp].
. +1 There are no new bugs found in [sharelib/spark].
. +1 There are no new bugs found in [client].
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
+1 TESTS
. Tests run: 2151
+1 DISTRO
. +1 distro tarball builds with the patch
-1 Overall result, please check the reported -1(s)
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/612/
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503230#comment-16503230 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started

> ShareLib installation may fail on busy Hadoop clusters
> ------------------------------------------------------
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
> Issue Type: Bug
> Reporter: Attila Sasvari
> Assignee: Julia Kinga Marton
> Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-007.patch, OOZIE-2791-01.patch, OOZIE-2791-02.patch
>
> On a busy Hadoop cluster it can happen that users cannot properly install the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo cluster with the concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> 	at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created but its size is 0.
> {code}
> -rw-r--r-- 3 asasvari supergroup 0 2017-02-07 10:59 share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry copying or roll back the changes. We should also consider throttling HDFS requests.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
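The "retry copying" and "throttling HDFS requests" ideas from the description can be sketched as follows. This is an illustrative sketch only, not Oozie's actual implementation; the class name, `runWithRetry`, and the retry limit are hypothetical. It bounds retries per copy task and uses a `Semaphore` to cap how many tasks hit HDFS concurrently.

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch of the proposed fix: retry each copy task a bounded number
// of times, and throttle concurrent HDFS requests with a shared semaphore.
// Names here (ThrottledRetry, runWithRetry) are hypothetical, not Oozie's.
public class ThrottledRetry {

    /**
     * Runs copyAction with at most maxRetries attempts, holding a permit from
     * hdfsThrottle during each attempt so only a bounded number of copy tasks
     * talk to HDFS at once. Returns true on success, false if all attempts fail.
     */
    public static boolean runWithRetry(Semaphore hdfsThrottle, Runnable copyAction, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                hdfsThrottle.acquire();      // throttle: wait for a free slot
                try {
                    copyAction.run();        // the actual file copy
                    return true;
                } finally {
                    hdfsThrottle.release();
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;                // shutting down: give up
            } catch (RuntimeException e) {
                // transient failure (e.g. "could only be replicated to 0 nodes"):
                // fall through and retry
            }
        }
        return false;                        // caller can roll back the new lib_ dir
    }

    public static void main(String[] args) {
        Semaphore throttle = new Semaphore(16);  // at most 16 concurrent HDFS calls
        int[] attempts = {0};
        boolean ok = runWithRetry(throttle, () -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new RuntimeException("transient HDFS error");  // fails twice
            }
        }, 5);
        System.out.println(ok + " after " + attempts[0] + " attempts");
    }
}
```

A false return would let the installer roll back the partially written `lib_<timestamp>` directory instead of leaving zero-length jars behind, which is the failure mode the description shows.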
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501752#comment-16501752 ] Andras Piros commented on OOZIE-2791:
--
Thanks for the contribution [~kmarton]! Left some more comments over on RB.
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501699#comment-16501699 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2159
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:green}*+1 Overall result, good!, no -1s*{color}
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/606/
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501600#comment-16501600 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501566#comment-16501566 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 2159
.Tests failed: 2
.Tests errors: 6
.The patch failed the following testcases:
testOozieSharelibCLICreate(org.apache.oozie.tools.TestConcurrentCopyFromLocal)
testOozieSharelibCLICreateConcurrent(org.apache.oozie.tools.TestConcurrentCopyFromLocal)
.Tests failing with errors:
testConcurrentCopyFromLocal(org.apache.oozie.tools.TestConcurrentCopyFromLocal)
testImportTablesOverflowBatchSize(org.apache.oozie.tools.TestDBLoadDump)
testImportToNonExistingTablesSucceedsOnHsqldb(org.apache.oozie.tools.TestDBLoadDump)
testImportInvalidDataLeavesTablesEmpty(org.apache.oozie.tools.TestDBLoadDump)
testImportToNonEmptyTablesCausesPrematureExit(org.apache.oozie.tools.TestDBLoadDump)
testImportedDBIsExportedCorrectly(org.apache.oozie.tools.TestDBLoadDump)
.{color:orange}Tests failed at first run:{color}
TestCoordActionsKillXCommand#testActionKillCommandActionNumbers
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/605/
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501485#comment-16501485 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500373#comment-16500373 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1{color} There are [1] new bugs found below threshold in total that must be fixed.
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:red}-1{color} There are [1] new bugs found below threshold in [tools] that must be fixed.
.You can find the FindBugs diff here (look for the red and orange ones): tools/findbugs-new.html
.The most important FindBugs errors are:
.Dereferenced at OozieSharelibCLI.java:[line 388]: Possible null pointer dereference in org.apache.oozie.tools.OozieSharelibCLI$ConcurrentCopyFromLocal.copyFolderRecursively(OozieSharelibCLI$CopyTaskConfiguration) due to return value of called method
.Known null at OozieSharelibCLI.java:[line 388]
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2159
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/604/
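For context, the class of bug the FindBugs -1 above is reporting can be sketched with plain JDK APIs. This is an illustrative sketch, not the actual OozieSharelibCLI code: directory-listing calls can return null rather than an empty array (java.io.File.listFiles() really does, for a missing or unreadable path), and recursing over the unchecked return value is exactly the "possible null pointer dereference due to return value of called method" pattern.

```java
import java.io.File;

// Illustrative sketch (not OozieSharelibCLI): a recursive folder walk that
// guards against the null return FindBugs flags. File.listFiles() returns
// null -- not an empty array -- when the path does not exist, is not a
// directory, or an I/O error occurs.
public class CopyFolderSketch {

    /**
     * Recursively counts plain files under dir. Returns 0 instead of throwing
     * a NullPointerException when listFiles() yields null.
     */
    public static int countFilesSafely(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {   // the null check whose absence FindBugs flags
            return 0;
        }
        int count = 0;
        for (File entry : entries) {
            count += entry.isDirectory() ? countFilesSafely(entry) : 1;
        }
        return count;
    }

    public static void main(String[] args) {
        // listFiles() returns null for a nonexistent path; the guard handles it.
        System.out.println(countFilesSafely(new File("no-such-directory")));
    }
}
```

Without the guard, the first unreadable or vanished directory encountered during a concurrent sharelib copy would crash the copy task with an NPE instead of a meaningful error.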
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500254#comment-16500254 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started
> at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > at org.apache.hadoop.ipc.Client.call(Client.java:1410) > at org.apache.hadoop.ipc.Client.call(Client.java:1363) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.addBlock(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at com.sun.proxy.$Proxy9.addBlock(Unknown Source) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) > -- > ... > {code} > You can see file is created but it's size is 0. > {code} > -rw-r--r-- 3 asasvari supergroup 0 2017-02-07 10:59 > share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar > {code} > This behaviour is clearly wrong. > In case of such an exception, we should retry copying or rollback changes. We > should also consider throttling HDFS requests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
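The issue description suggests retrying failed copy tasks (or rolling back) instead of leaving zero-length jars behind when a busy cluster rejects a block allocation. A minimal sketch of bounded retries; the names `CopyAction`, `MAX_RETRIES`, and `copyWithRetries` are illustrative and not taken from the actual Oozie patch:

```java
import java.io.IOException;

// Hypothetical sketch, not the actual Oozie patch: re-attempt a copy task a
// bounded number of times so a transient "could only be replicated to 0 nodes"
// error on a busy cluster does not immediately fail sharelib installation.
public class RetryingCopy {
    static final int MAX_RETRIES = 3;

    interface CopyAction {
        void run() throws IOException;
    }

    // Returns the number of attempts used; throws after MAX_RETRIES failures.
    static int copyWithRetries(CopyAction action) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (IOException e) {
                if (attempt == MAX_RETRIES) {
                    throw new RuntimeException(
                        "copy failed after " + MAX_RETRIES + " attempts", e);
                }
                // transient failure on a busy cluster: fall through and retry
            }
        }
        throw new IllegalStateException("unreachable");
    }
}
```

A real implementation would also back off between attempts and clean up any partially written destination file before retrying.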
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470173#comment-16470173 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791
Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1{color} There are [1] new bugs found below threshold in total that must be fixed.
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:red}-1{color} There are [1] new bugs found below threshold in [tools] that must be fixed.
. You can find the FindBugs diff here (look for the red and orange ones): tools/findbugs-new.html
. The most important FindBugs errors are:
. Dereferenced at OozieSharelibCLI.java:[line 368]: Possible null pointer dereference in org.apache.oozie.tools.OozieSharelibCLI$ConcurrentCopyFromLocal.copyFolderRecursively(OozieSharelibCLI$CopyTaskConfiguration) due to return value of called method
. Known null at OozieSharelibCLI.java:[line 368]
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 2132
.Tests failed: 1
.Tests errors: 0
.The patch failed the following testcases:
testHiveAction(org.apache.oozie.action.hadoop.TestHiveActionExecutor)
.Tests failing with errors:
.{color:orange}Tests failed at first run:{color}
TestJavaActionExecutor#testCredentialsSkip
TestCoordActionsKillXCommand#testActionKillCommandActionNumbers
TestOozieDBCLI#testOozieDBCLI
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/522/
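The FindBugs finding in the report above ("Possible null pointer dereference ... due to return value of called method") is the standard pattern of dereferencing an API that may return null. A hypothetical illustration of the guard FindBugs asks for; this is not the actual `OozieSharelibCLI` code, and `CopyGuard`/`countEntries` are invented names:

```java
import java.io.File;

// Hypothetical illustration of the null-return guard FindBugs asks for; not
// the actual OozieSharelibCLI code. File.listFiles() returns null (rather
// than an empty array) when the path is not a readable directory, so the
// result must be checked before it is dereferenced.
public class CopyGuard {
    static int countEntries(File dir) {
        File[] entries = dir.listFiles(); // may be null, not just empty
        if (entries == null) {
            return 0; // treat a missing or unreadable directory as empty
        }
        return entries.length;
    }
}
```

Checking the return value once, at the call site FindBugs flags, is what clears the "Known null ... Dereferenced" warning pair.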
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470040#comment-16470040 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462318#comment-16462318 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791
Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1{color} There are [3] new bugs found below threshold in total that must be fixed.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:red}-1{color} There are [3] new bugs found below threshold in [tools] that must be fixed.
. You can find the FindBugs diff here (look for the red and orange ones): tools/findbugs-new.html
. The most important FindBugs errors are:
. At OozieSharelibCLI.java:[line 193]: Boxing/unboxing to parse a primitive org.apache.oozie.tools.OozieSharelibCLI.run(String[])
. Possible null pointer dereference in org.apache.oozie.tools.OozieSharelibCLI.copyFolderRecursively(OozieSharelibCLI$CopyTaskConfiguration) due to return value of called method: Another occurrence at OozieSharelibCLI.java:[line 194]
. Known null at OozieSharelibCLI.java:[line 239]: Dereferenced at OozieSharelibCLI.java:[line 239]
. At OozieSharelibCLI.java:[lines 361-387]: Should org.apache.oozie.tools.OozieSharelibCLI$CopyTaskConfiguration be a _static_ inner class?
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 2112
.Tests failed: 1
.Tests errors: 2
.The patch failed the following testcases:
testCoordActionRecoveryServiceForWaitingRegisterPartition(org.apache.oozie.service.TestRecoveryService)
.Tests failing with errors:
testConnectionRetryExceptionListener(org.apache.oozie.service.TestJMSAccessorService)
testConnectionRetry(org.apache.oozie.service.TestJMSAccessorService)
.{color:orange}Tests failed at first run:{color}
TestJavaActionExecutor#testCredentialsSkip
TestOozieDBCLI#testOozieDBCLI
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/505/
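One finding in the report above asks whether `OozieSharelibCLI$CopyTaskConfiguration` should be a static inner class. A non-static inner class carries an implicit reference to its enclosing instance; when the class never uses that instance, declaring it `static` removes the hidden reference. A hypothetical sketch, with invented field names rather than the actual patch's fields:

```java
// Hypothetical sketch of the "should this be a _static_ inner class?" FindBugs
// fix (SIC_INNER_SHOULD_BE_STATIC); field names are illustrative, not from the
// Oozie patch. "static" drops the implicit Outer.this reference, so instances
// can be built without an enclosing object and cannot accidentally retain it.
public class SharelibCli {
    static class CopyTaskConfiguration {
        final String source;
        final String target;

        CopyTaskConfiguration(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    static CopyTaskConfiguration describe(String source, String target) {
        // No SharelibCli instance is needed to construct the configuration.
        return new CopyTaskConfiguration(source, target);
    }
}
```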
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462247#comment-16462247 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462237#comment-16462237 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791
Cleaning local git workspace
{color:red}-1{color} Patch failed to apply to head of branch
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462224#comment-16462224 ] Hadoop QA commented on OOZIE-2791:
--
Testing JIRA OOZIE-2791
Cleaning local git workspace
{color:red}-1{color} Patch failed to apply to head of branch
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462215#comment-16462215 ] Hadoop QA commented on OOZIE-2791:
--
PreCommit-OOZIE-Build started
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456167#comment-16456167 ] Hadoop QA commented on OOZIE-2791: -- Testing JIRA OOZIE-2791 Cleaning local git workspace {color:red}-1{color} Patch failed to apply to head of branch
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456164#comment-16456164 ] Hadoop QA commented on OOZIE-2791: -- PreCommit-OOZIE-Build started
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886699#comment-15886699 ] Robert Kanter commented on OOZIE-2791: -- While I agree that Oozie should retry to cover for other issues (bad network, etc.), it still seems funny to me that a perfectly healthy HDFS can get write failures. Anyway, for choosing a default concurrency level, you could try running the thing on, say, 3 different sizes of clusters using 1, 2, 3, ... concurrency and timing it. Then see which one is the fastest. I imagine it eventually peaks at some point? Alternatively, we could just pick a reasonable but arbitrary value like 10.
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862388#comment-15862388 ] Hadoop QA commented on OOZIE-2791: -- Testing JIRA OOZIE-2791 Cleaning local git workspace {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:red}-1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 132 .{color:red}-1{color} the patch does not add/modify any testcase {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:red}-1{color} There are [2] new bugs found below threshold in total that must be fixed. .{color:green}+1{color} There are no new bugs found in [server]. .{color:green}+1{color} There are no new bugs found in [client]. .{color:green}+1{color} There are no new bugs found in [core]. .{color:green}+1{color} There are no new bugs found in [docs]. .{color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2]. .{color:red}-1{color} There are [2] new bugs found below threshold in [tools] that must be fixed. 
.You can find the FindBugs diff here (look for the red and orange ones): tools/findbugs-new.html .The most important FindBugs errors are: .At OozieSharelibCLI.java:[line 185]: Boxing/unboxing to parse a primitive org.apache.oozie.tools.OozieSharelibCLI.run(String[]) .At OozieSharelibCLI.java:[lines 340-367]: Should org.apache.oozie.tools.OozieSharelibCLI$CopyTaskConfiguration be a _static_ inner class? .{color:green}+1{color} There are no new bugs found in [examples]. .{color:green}+1{color} There are no new bugs found in [sharelib/streaming]. .{color:green}+1{color} There are no new bugs found in [sharelib/sqoop]. .{color:green}+1{color} There are no new bugs found in [sharelib/distcp]. .{color:green}+1{color} There are no new bugs found in [sharelib/oozie]. .{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog]. .{color:green}+1{color} There are no new bugs found in [sharelib/hive]. .{color:green}+1{color} There are no new bugs found in [sharelib/hive2]. .{color:green}+1{color} There are no new bugs found in [sharelib/pig]. .{color:green}+1{color} There are no new bugs found in [sharelib/spark]. {color:green}+1 BACKWARDS_COMPATIBILITY{color} .{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations .{color:green}+1{color} the patch does not modify JPA files {color:red}-1 TESTS{color} .Tests run: 1873 .Tests failed: 5 .Tests errors: 3 .The patch failed the following testcases: . testMain(org.apache.oozie.action.hadoop.TestHiveMain) . testPigScript(org.apache.oozie.action.hadoop.TestPigMain) . testEmbeddedPigWithinPython(org.apache.oozie.action.hadoop.TestPigMain) . testPig_withNullExternalID(org.apache.oozie.action.hadoop.TestPigMain) . testPigScript(org.apache.oozie.action.hadoop.TestPigMainWithOldAPI) .Tests failing with errors: . testAddXIncludeFromStream(org.apache.oozie.util.TestXConfiguration) . testAddXIncludeFromReader(org.apache.oozie.util.TestXConfiguration) . 
testLoadDump(org.apache.oozie.tools.TestDBLoadDump) {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:red}*-1 Overall result, please check the reported -1(s)*{color} The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/3642/
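The two FindBugs warnings reported in [tools] are common patterns and can be illustrated with a minimal, self-contained sketch (the class below is a hypothetical stand-in, not the actual OozieSharelibCLI source):

```java
public class FindBugsExamples {
    // SIC_INNER_SHOULD_BE_STATIC: a non-static inner class carries a hidden
    // reference to its enclosing instance. If it never touches enclosing
    // state (as FindBugs suggests for CopyTaskConfiguration), declare it
    // static so each instance is smaller and cannot leak the outer object.
    static class CopyTaskConfiguration {
        final String src;
        CopyTaskConfiguration(String src) { this.src = src; }
    }

    public static void main(String[] args) {
        // DM_BOXED_PRIMITIVE_FOR_PARSING ("Boxing/unboxing to parse a
        // primitive"): valueOf allocates an Integer that is immediately
        // auto-unboxed into the int variable; parseInt returns the
        // primitive directly with no intermediate box.
        int boxed = Integer.valueOf("150");   // flagged: boxes, then unboxes
        int parsed = Integer.parseInt("150"); // parses straight to a primitive
        System.out.println(boxed == parsed);  // prints true
    }
}
```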
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862360#comment-15862360 ] Attila Sasvari commented on OOZIE-2791: --- Thanks for the review [~gezapeti]. You are right. I attached a new patch that addresses it. I am not sure if adding this implementation detail to the documentation is really needed. Setting/recommending a "proper" concurrency number is also tricky, but I will think about it.
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861142#comment-15861142 ] Peter Cseh commented on OOZIE-2791: --- Thanks for the patch [~asasvari]! I like the overall approach of collecting the failures and retrying at the end. The {{Set}} is not used as a real set, because the class {{CopyTaskConfiguration}} does not implement {{equals}} and {{hashCode}}. Could you update the CLI documentation to explain this behavior?
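The {{equals}}/{{hashCode}} point can be shown with a minimal sketch ({{CopyTask}} and its fields are hypothetical stand-ins for {{CopyTaskConfiguration}}, not the actual Oozie code): without these overrides, {{HashSet}} falls back to identity comparison, so two entries describing the same failed copy are both retained.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in: two fields identifying one copy task.
final class CopyTask {
    final String srcFile;
    final String dstDir;

    CopyTask(String srcFile, String dstDir) {
        this.srcFile = srcFile;
        this.dstDir = dstDir;
    }

    // With these two overrides, tasks for the same file/destination are
    // treated as equal and the HashSet deduplicates them. Remove them and
    // the set falls back to Object identity, keeping both entries.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CopyTask)) return false;
        CopyTask other = (CopyTask) o;
        return srcFile.equals(other.srcFile) && dstDir.equals(other.dstDir);
    }

    @Override
    public int hashCode() {
        return 31 * srcFile.hashCode() + dstDir.hashCode();
    }
}

public class SetSemanticsDemo {
    public static void main(String[] args) {
        Set<CopyTask> failed = new HashSet<>();
        failed.add(new CopyTask("hadoop-distcp-2.4.0.jar", "share/lib/distcp"));
        failed.add(new CopyTask("hadoop-distcp-2.4.0.jar", "share/lib/distcp"));
        System.out.println(failed.size()); // prints 1
    }
}
```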
[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860016#comment-15860016 ] Hadoop QA commented on OOZIE-2791: -- Testing JIRA OOZIE-2791 Cleaning local git workspace {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:red}-1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 132 .{color:red}-1{color} the patch does not add/modify any testcase {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:red}-1{color} There are [2] new bugs found below threshold in total that must be fixed. .{color:green}+1{color} There are no new bugs found in [server]. .{color:green}+1{color} There are no new bugs found in [client]. .{color:green}+1{color} There are no new bugs found in [core]. .{color:green}+1{color} There are no new bugs found in [docs]. .{color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2]. .{color:red}-1{color} There are [2] new bugs found below threshold in [tools] that must be fixed. 
.You can find the FindBugs diff here (look for the red and orange ones): tools/findbugs-new.html .The most important FindBugs errors are: .At OozieSharelibCLI.java:[line 185]: Boxing/unboxing to parse a primitive org.apache.oozie.tools.OozieSharelibCLI.run(String[]) .At OozieSharelibCLI.java:[lines 221-226]: Should org.apache.oozie.tools.OozieSharelibCLI$CopyTaskConfiguration be a _static_ inner class? .{color:green}+1{color} There are no new bugs found in [examples]. .{color:green}+1{color} There are no new bugs found in [sharelib/streaming]. .{color:green}+1{color} There are no new bugs found in [sharelib/sqoop]. .{color:green}+1{color} There are no new bugs found in [sharelib/distcp]. .{color:green}+1{color} There are no new bugs found in [sharelib/oozie]. .{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog]. .{color:green}+1{color} There are no new bugs found in [sharelib/hive]. .{color:green}+1{color} There are no new bugs found in [sharelib/hive2]. .{color:green}+1{color} There are no new bugs found in [sharelib/pig]. .{color:green}+1{color} There are no new bugs found in [sharelib/spark]. {color:green}+1 BACKWARDS_COMPATIBILITY{color} .{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations .{color:green}+1{color} the patch does not modify JPA files {color:green}+1 TESTS{color} .Tests run: 1872 {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:red}*-1 Overall result, please check the reported -1(s)*{color} The full output of the test-patch run is available at . 
https://builds.apache.org/job/oozie-trunk-precommit-build/3635/
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859789#comment-15859789 ] Attila Sasvari commented on OOZIE-2791: --- [~abhishekbafna] thanks for the additional info. We ran into this issue on a 4-node cluster with Hadoop 2.6 (multiple components were talking to HDFS). When I tried to reproduce the problem with {{-concurrency 150}} on my single-node pseudo-distributed Hadoop, I noticed that the sharelib was only partially installed (copy task failures were logged as exceptions), and at the end of the execution there were multiple 0-byte files in HDFS. I now have a working solution, tested on a Mac with pseudo-distributed Hadoop 2.6.0. Briefly: each failed copy task is recorded in a concurrent hash set, and the missed files are re-uploaded on a single thread (with copyFromLocalFile). Before re-uploading we wait 1000 ms; after each further failure the delay is doubled and the retry count is decremented (it is currently hardcoded to 5 attempts).
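The retry scheme described in the comment above (a concurrent hash set of failed tasks, single-threaded re-upload, doubling delay, bounded attempt count) can be sketched in plain Java. The class and method names below are illustrative, and the Hadoop copy is simulated with a lambda, so this is not the actual patch code:

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of the retry scheme described above: failed copy tasks are
 * collected in a concurrent hash set and retried on a single thread,
 * doubling the wait between attempts. Names are illustrative only.
 */
public class BackoffRetrySketch {

    /** Retries {@code task} up to {@code maxAttempts} times, doubling the delay after each failure. */
    public static <T> T retryWithBackoff(Callable<T> task, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;          // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Failed "copy tasks" are remembered in a concurrent set, as in the comment above.
        Set<String> failed = ConcurrentHashMap.newKeySet();
        failed.add("share/lib/distcp/hadoop-distcp.jar");

        // Re-upload each missed file on a single thread; here the simulated "upload"
        // succeeds on the third attempt to exercise the backoff path.
        for (String file : failed) {
            final int[] calls = {0};
            String result = retryWithBackoff(() -> {
                if (++calls[0] < 3) {
                    throw new java.io.IOException("simulated busy cluster");
                }
                return file + " uploaded";
            }, 5, 10L);
            System.out.println(result + " after " + calls[0] + " attempts");
        }
    }
}
```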
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858227#comment-15858227 ] Abhishek Bafna commented on OOZIE-2791: --- I tried the Oozie sharelib installation with the {{-concurrency}} option and different numbers of parallel threads, and it installed the sharelib successfully every time. Cluster information: a 3-node cluster built with VirtualBox on a Mac. The load on the cluster was light; a handful of MR jobs were running. Thread counts tried: 50, 150, 250, 350, 450. Thanks.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856900#comment-15856900 ] Andrew Wang commented on OOZIE-2791: I took a peek at this class; it looks like that's a Java API rather than an HDFS one. My guess, though, is that {{temp.deleteOnExit}} doesn't work because the directory isn't empty (the same goes for the {{temp.delete()}} before it), so if you always want this temp dir to be cleaned up, it should be handled with a try/finally. There's already a call to {{FileUtils.deleteDirectory}} at the bottom that does a recursive delete; it could be moved into a {{finally}} block.
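The cleanup pattern suggested in the comment above can be sketched with the JDK alone. Here {{deleteRecursively}} stands in for the {{FileUtils.deleteDirectory}} call mentioned in the comment, and the file names are made up:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Sketch of the suggested cleanup: File.delete()/deleteOnExit() silently
 * fail on a non-empty directory, so the recursive delete belongs in a
 * finally block that always runs, even when a copy task throws.
 */
public class TempDirCleanupSketch {

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Delete children before parents (deepest paths first).
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempDirectory("oozie-sharelib-sketch");
        try {
            // Simulated extracted sharelib content left behind by a failed install.
            Files.createFile(temp.resolve("hadoop-distcp.jar"));
            // delete() on a non-empty directory is a silent no-op:
            System.out.println("plain delete worked: " + temp.toFile().delete());
        } finally {
            deleteRecursively(temp);   // always cleans up, failure or not
        }
        System.out.println("temp dir still exists: " + Files.exists(temp));
    }
}
```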
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856848#comment-15856848 ] Attila Sasvari commented on OOZIE-2791: --- [~andrew.wang] I did not know about that. Thanks. I've just noticed another thing: if there is an exception during the sharelib installation (e.g. a copy task fails), files in the temporary directory (https://github.com/apache/oozie/blob/master/tools/src/main/java/org/apache/oozie/tools/OozieSharelibCLI.java#L137) are not deleted at the end. {code} Error: Copy task for "/var/folders/9q/f8p_r6gj0wbck49_dc092q_mgp/T/oozie5744006317396681919.dir/share/lib/hive2/hadoop-yarn-common-2.4.0.jar" failed with exception ... $ ls /var/folders/9q/f8p_r6gj0wbck49_dc092q_mgp/T/oozie5744006317396681919.dir/share/lib/hive2/hadoop-yarn-common-2.4.0.jar /var/folders/9q/f8p_r6gj0wbck49_dc092q_mgp/T/oozie5744006317396681919.dir/share/lib/hive2/hadoop-yarn-common-2.4.0.jar {code}
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856798#comment-15856798 ] Andrew Wang commented on OOZIE-2791: I think you can do it without a new FileSystem object, since there's a create method that takes a blockSize parameter (FileSystem is also a bit heavyweight): https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L1048
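Passing a per-file block size to a create call requires a value HDFS will accept. Below is a minimal sketch of choosing one, assuming the common defaults of 512-byte checksum chunks ({{io.bytes.per.checksum}}) and a 1 MB minimum block size ({{dfs.namenode.fs-limits.min-block-size}}); in real code both values would be read from the cluster configuration, and the result would be handed to the FileSystem create overload that takes a blockSize parameter:

```java
/**
 * Sketch of computing a per-file HDFS block size. The constants mirror
 * common HDFS defaults but are assumptions here, not values read from
 * the cluster configuration.
 */
public class BlockSizeSketch {

    static final long BYTES_PER_CHECKSUM = 512;       // assumed io.bytes.per.checksum default
    static final long MIN_BLOCK_SIZE = 1024 * 1024;   // assumed dfs.namenode.fs-limits.min-block-size default

    /** Smallest acceptable block size that still holds the whole file in one block. */
    static long blockSizeFor(long fileLength) {
        long size = Math.max(fileLength, MIN_BLOCK_SIZE);
        // HDFS rejects block sizes that are not a multiple of the checksum chunk,
        // so round up to the next multiple.
        long remainder = size % BYTES_PER_CHECKSUM;
        return remainder == 0 ? size : size + (BYTES_PER_CHECKSUM - remainder);
    }

    public static void main(String[] args) {
        System.out.println(blockSizeFor(3_000_000L));  // rounded up to a 512-byte multiple
        System.out.println(blockSizeFor(10_000L));     // small jars get the minimum block size
    }
}
```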
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856785#comment-15856785 ] Attila Sasvari commented on OOZIE-2791: --- [~andrew.wang] thanks for the idea. I've just created a POC based on your comment and I am now evaluating it (some refactoring is needed in OozieSharelibCLI; [copyFolderRecursively|https://github.com/apache/oozie/blob/master/tools/src/main/java/org/apache/oozie/tools/OozieSharelibCLI.java#L255] could be replaced with a new method that creates a FileSystem object with the proper dfs.block.size before submitting the copy job).
[ https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856642#comment-15856642 ] Andrew Wang commented on OOZIE-2791: One idea I actually really like is to set {{dfs.block.size}} to the size of the file being uploaded. This should prevent HDFS from reserving any excess capacity during the write.
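The issue description also raises throttling HDFS requests as an option. A minimal sketch of that idea uses a Semaphore to cap in-flight copy tasks regardless of the executor's thread count; the task body below only simulates an upload, and all names are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of throttling concurrent uploads: a Semaphore caps how many copy
 * tasks talk to HDFS at once, even when the pool has many more threads.
 */
public class ThrottledCopySketch {

    /** Runs {@code tasks} simulated copies on {@code threads} threads, at most {@code permits} at a time. */
    public static int maxObservedConcurrency(int tasks, int threads, int permits)
            throws InterruptedException {
        Semaphore throttle = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    throttle.acquire();               // blocks while the quota is used up
                    try {
                        int now = inFlight.incrementAndGet();
                        maxSeen.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);              // stand-in for the HDFS copy
                    } finally {
                        inFlight.decrementAndGet();
                        throttle.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return maxSeen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 150 threads submit 464 "copy tasks" (as in the report), but at most
        // 10 run against HDFS at any moment.
        System.out.println("max in-flight: " + maxObservedConcurrency(464, 150, 10));
    }
}
```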