[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-04, Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532572#comment-16532572
 ] 

Peter Bacsko commented on OOZIE-2791:
--------------------------------------

+1

> ShareLib installation may fail on busy Hadoop clusters
> ------------------------------------------------------
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-007.patch, 
> OOZIE-2791-008.patch, OOZIE-2791-009.patch, OOZIE-2791-01.patch, 
> OOZIE-2791-010.patch, OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install the
> Oozie ShareLib.
> Example: a ShareLib installation on a Hadoop 2.4.0 pseudo cluster with the
> concurrency set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the
> changes. We should also consider throttling HDFS requests.
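
For illustration, the retry/rollback/throttling idea could look roughly like the
sketch below. This is a minimal, hypothetical example on top of the public
Hadoop FileSystem/FileUtil API, not Oozie's actual implementation; the class
name and the RETRY_LIMIT, backoff, and semaphore values are made-up
illustration parameters:

{code}
import java.io.IOException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch: copy one ShareLib file with a bounded number of
 * concurrent HDFS requests, retry with backoff on failure, and roll back
 * the destination instead of leaving a zero-byte jar behind.
 */
public class ThrottledShareLibCopy {
    private static final int RETRY_LIMIT = 3;           // assumed value
    private static final long BASE_BACKOFF_MS = 1000L;  // assumed value

    // Shared by all copy tasks: bounds the number of in-flight HDFS writes
    // regardless of the -concurrency setting.
    private static final Semaphore HDFS_SLOTS = new Semaphore(10);

    public static void copyWithRetry(Path src, Path dst, Configuration conf)
            throws IOException, InterruptedException {
        FileSystem fs = FileSystem.get(conf);
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= RETRY_LIMIT; attempt++) {
            HDFS_SLOTS.acquire();
            try {
                // deleteSource=false, overwrite=true: a retry replaces any
                // zero-byte leftover from a previous failed attempt.
                FileUtil.copy(fs, src, fs, dst, false, true, conf);
                if (fs.getFileStatus(dst).getLen() > 0) {
                    return;  // copy succeeded and is non-empty
                }
                lastFailure = new IOException("zero-length copy: " + dst);
            } catch (IOException e) {
                lastFailure = e;
            } finally {
                HDFS_SLOTS.release();
            }
            if (attempt < RETRY_LIMIT) {
                // Exponential backoff before the next attempt.
                TimeUnit.MILLISECONDS.sleep(BASE_BACKOFF_MS << (attempt - 1));
            }
        }
        // Rollback: remove the partial target rather than leave an empty jar.
        fs.delete(dst, false);
        throw lastFailure;
    }
}
{code}

The throttling here is purely client-side; a simpler variant of the same idea
is to cap the effective -concurrency value before sizing the thread pool.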



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-04, Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532547#comment-16532547
 ] 

Hadoop QA commented on OOZIE-2791:
--------------------------------------


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 5 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
.{color:red}WARNING{color}: the current HEAD has 100 Javadoc warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2908
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}

{color:red}. There is at least one warning, please check{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/656/




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-04, Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532418#comment-16532418
 ] 

Hadoop QA commented on OOZIE-2791:
--------------------------------------

PreCommit-OOZIE-Build started




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-04, Andras Piros (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532404#comment-16532404
 ] 

Andras Piros commented on OOZIE-2791:
--------------------------------------

[~kmarton], I've kicked off another Jenkins build.



[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-03, Julia Kinga Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531828#comment-16531828
 ] 

Julia Kinga Marton commented on OOZIE-2791:
--------------------------------------

The test errors seem unrelated. [~andras.piros] or [~pbacsko], could you please
start another pre-commit build?



[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-03, Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531699#comment-16531699
 ] 

Hadoop QA commented on OOZIE-2791:
--------------------------------------


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 5 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 2908
.Tests failed: 0
.Tests errors: 488

.The patch failed the following testcases:



.Tests failing with errors:
testShellScriptHadoopConfDir(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testShellScriptError(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testSetupMethods(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testShellScript(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testEnvVar(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testShellScriptHadoopConfDirWithNoL4J(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testPerlScript(org.apache.oozie.action.hadoop.TestShellActionExecutor)
testUpdateSLA(org.apache.oozie.sla.TestSLAService)
testEndMissDBConfirm(org.apache.oozie.sla.TestSLAService)
testSLAOperations(org.apache.oozie.sla.TestSLAService)
testBasicService(org.apache.oozie.sla.TestSLAService)
testJobs(org.apache.oozie.servlet.TestV1JobsServlet)
testSubmit(org.apache.oozie.servlet.TestV1JobsServlet)
testDefaultConfigurationInActionConf(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testParseJobXmlAndConfiguration(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testCannotKillActionWhenACLSpecified(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testOutputSubmitOK(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testSubmitLauncherConfigurationOverridesLauncherMapperProperties(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testJobXmlWithOozieLauncher(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testCredentialsSkip(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testACLDefaults_noFalseChange(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testJobSubmissionWithoutYarnKill(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testFilesystemScheme(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testExceptionSubmitException(org.apache.oozie.action.hadoop.TestJavaActionExecutor)
testKill(org.apache.oozie.action.hadoop.TestJavaActionExecutor)

[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-03, Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531568#comment-16531568
 ] 

Hadoop QA commented on OOZIE-2791:
--------------------------------------

PreCommit-OOZIE-Build started




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-03, Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531175#comment-16531175
 ] 

Hadoop QA commented on OOZIE-2791:
--------------------------------------


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 5 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2908
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/652/




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-03, Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531035#comment-16531035
 ] 

Hadoop QA commented on OOZIE-2791:
--------------------------------------

PreCommit-OOZIE-Build started




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-07-03, Julia Kinga Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530991#comment-16530991
 ] 

Julia Kinga Marton commented on OOZIE-2791:
--------------------------------------

[~pbacsko], I have made the suggested changes. Can you please check the new 
patch?



[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-21, Julia Kinga Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519046#comment-16519046
 ] 

Julia Kinga Marton commented on OOZIE-2791:
--------------------------------------

Thank you [~andras.piros] for the +1. [~pbacsko], I have uploaded a new patch
that addresses your comments as well. Could you please take a look at the
review?



[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-18, Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515612#comment-16515612
 ] 

Hadoop QA commented on OOZIE-2791:
--------------------------------------


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 5 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [client].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2158
.{color:orange}Tests failed at first run:{color}
TestCoordActionsKillXCommand#testActionKillCommandActionNumbers
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/627/




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-18, Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515557#comment-16515557
 ] 

Hadoop QA commented on OOZIE-2791:
--------------------------------------

PreCommit-OOZIE-Build started


> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-007.patch, 
> OOZIE-2791-008.patch, OOZIE-2791-01.patch, OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created, but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the 
> changes. We should also consider throttling HDFS requests.
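
A minimal sketch of the retry-and-throttle idea proposed above. All names here 
(ThrottledCopyTask, the semaphore wiring, the retry constants) are invented 
for illustration and are not the actual OozieSharelibCLI implementation: each 
copy task retries with exponential backoff, and a shared semaphore caps the 
number of concurrent HDFS writes.

{code}
import java.io.IOException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: retry one ShareLib copy task with backoff, while a
// shared semaphore throttles the number of concurrent HDFS requests.
public class ThrottledCopyTask implements Runnable {
    private static final int MAX_RETRIES = 5;
    private static final long INITIAL_BACKOFF_MS = 1000L;

    private final Semaphore hdfsPermits; // shared by all copy tasks
    private final FileSystem fs;
    private final Path src;
    private final Path dst;

    public ThrottledCopyTask(Semaphore hdfsPermits, FileSystem fs, Path src, Path dst) {
        this.hdfsPermits = hdfsPermits;
        this.fs = fs;
        this.src = src;
        this.dst = dst;
    }

    @Override
    public void run() {
        long backoffMs = INITIAL_BACKOFF_MS;
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                hdfsPermits.acquire(); // throttle: bounded number of concurrent writers
                try {
                    // overwrite=true replaces a partial or zero-length file
                    fs.copyFromLocalFile(false, true, src, dst);
                    return; // success
                } finally {
                    hdfsPermits.release();
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            } catch (IOException ioe) {
                if (attempt == MAX_RETRIES) {
                    throw new RuntimeException("Copy of " + src + " failed after "
                            + MAX_RETRIES + " attempts", ioe);
                }
                try {
                    TimeUnit.MILLISECONDS.sleep(backoffMs); // back off before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
                backoffMs *= 2; // exponential backoff
            }
        }
    }
}
{code}

A single shared Semaphore, created once (for example as new 
Semaphore(maxConcurrentHdfsRequests), a hypothetical parameter) and passed to 
every task submitted to the copy thread pool, would keep the number of 
simultaneous NameNode addBlock calls bounded.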



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503350#comment-16503350
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 line(s) longer than 132 
characters
.{color:green}+1{color} the patch adds/modifies 5 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Column/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2151
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/612/



> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-007.patch, 
> OOZIE-2791-01.patch, OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   

[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503230#comment-16503230
 ] 

Hadoop QA commented on OOZIE-2791:
--

PreCommit-OOZIE-Build started


> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-007.patch, 
> OOZIE-2791-01.patch, OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created, but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the 
> changes. We should also consider throttling HDFS requests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-05 Thread Andras Piros (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501752#comment-16501752
 ] 

Andras Piros commented on OOZIE-2791:
-

Thanks for the contribution [~kmarton]! Left some more comments over on RB.

> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-01.patch, 
> OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created, but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the 
> changes. We should also consider throttling HDFS requests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501699#comment-16501699
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Column/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2159
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good! No -1s*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/606/



> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-01.patch, 
> OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> 

[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501600#comment-16501600
 ] 

Hadoop QA commented on OOZIE-2791:
--

PreCommit-OOZIE-Build started


> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-01.patch, 
> OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created, but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the 
> changes. We should also consider throttling HDFS requests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501566#comment-16501566
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Column/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 2159
.Tests failed: 2
.Tests errors: 6

.The patch failed the following testcases:

testOozieSharelibCLICreate(org.apache.oozie.tools.TestConcurrentCopyFromLocal)
testOozieSharelibCLICreateConcurrent(org.apache.oozie.tools.TestConcurrentCopyFromLocal)

.Tests failing with errors:
testConcurrentCopyFromLocal(org.apache.oozie.tools.TestConcurrentCopyFromLocal)
testImportTablesOverflowBatchSize(org.apache.oozie.tools.TestDBLoadDump)
testImportToNonExistingTablesSucceedsOnHsqldb(org.apache.oozie.tools.TestDBLoadDump)
testImportInvalidDataLeavesTablesEmpty(org.apache.oozie.tools.TestDBLoadDump)
testImportToNonEmptyTablesCausesPrematureExit(org.apache.oozie.tools.TestDBLoadDump)
testImportedDBIsExportedCorrectly(org.apache.oozie.tools.TestDBLoadDump)

.{color:orange}Tests failed at first run:{color}
TestCoordActionsKillXCommand#testActionKillCommandActionNumbers
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/605/



> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-01.patch, 
> OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> 

[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501485#comment-16501485
 ] 

Hadoop QA commented on OOZIE-2791:
--

PreCommit-OOZIE-Build started


> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-006.patch, OOZIE-2791-01.patch, 
> OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created, but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the 
> changes. We should also consider throttling HDFS requests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500373#comment-16500373
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [1] new bugs found below threshold in total that 
must be fixed.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:red}-1{color} There are [1] new bugs found below threshold in [tools] 
that must be fixed.
. You can find the FindBugs diff here (look for the red and orange ones): 
tools/findbugs-new.html
. The most important FindBugs errors are:
. Dereferenced at OozieSharelibCLI.java:[line 388]: Possible null pointer 
dereference in 
org.apache.oozie.tools.OozieSharelibCLI$ConcurrentCopyFromLocal.copyFolderRecursively(OozieSharelibCLI$CopyTaskConfiguration)
 due to return value of called method
. Known null at OozieSharelibCLI.java:[line 388] (a hedged sketch of the 
usual fix for this pattern follows this report)
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Column/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2159
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/604/
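
The -1 in the report above flags a possible null pointer dereference on the 
return value of a called method. Without claiming this is the actual 
OozieSharelibCLI.copyFolderRecursively code, the usual shape of that FindBugs 
pattern and its fix can be sketched as follows: java.io.File.listFiles() 
returns null, not an empty array, when the path cannot be listed, so the 
result must be null-checked before use.

{code}
import java.io.File;
import java.io.IOException;

// Hedged illustration of the FindBugs "possible null pointer dereference
// due to return value of called method" pattern; an invented example, not
// the actual OozieSharelibCLI code.
final class ListFilesNullCheck {

    private ListFilesNullCheck() {
    }

    static int countEntries(File folder) throws IOException {
        File[] entries = folder.listFiles(); // may return null, not []
        if (entries == null) {
            // surface the failure instead of dereferencing a known null
            throw new IOException("Could not list directory " + folder);
        }
        return entries.length;
    }
}
{code}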



> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-01.patch, OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be 

[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-06-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500254#comment-16500254
 ] 

Hadoop QA commented on OOZIE-2791:
--

PreCommit-OOZIE-Build started


> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-005.patch, OOZIE-2791-01.patch, OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created, but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the 
> changes. We should also consider throttling HDFS requests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-05-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470173#comment-16470173
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 
characters
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [1] new bugs found below threshold in total that 
must be fixed.
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:red}-1{color} There are [1] new bugs found below threshold in [tools] 
that must be fixed.
. You can find the FindBugs diff here (look for the red and orange ones): 
tools/findbugs-new.html
. The most important FindBugs errors are:
. Dereferenced at OozieSharelibCLI.java:[line 368]: Possible null pointer 
dereference in 
org.apache.oozie.tools.OozieSharelibCLI$ConcurrentCopyFromLocal.copyFolderRecursively(OozieSharelibCLI$CopyTaskConfiguration)
 due to return value of called method
. Known null at OozieSharelibCLI.java:[line 368]
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Column/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 2132
.Tests failed: 1
.Tests errors: 0

.The patch failed the following testcases:

testHiveAction(org.apache.oozie.action.hadoop.TestHiveActionExecutor)

.Tests failing with errors:


.{color:orange}Tests failed at first run:{color}
TestJavaActionExecutor#testCredentialsSkip
TestCoordActionsKillXCommand#testActionKillCommandActionNumbers
TestOozieDBCLI#testOozieDBCLI
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/522/



> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-01.patch, OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for 

[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-05-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470040#comment-16470040
 ] 

Hadoop QA commented on OOZIE-2791:
--

PreCommit-OOZIE-Build started


> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: OOZIE-2791-003.patch, OOZIE-2791-004.patch, 
> OOZIE-2791-01.patch, OOZIE-2791-02.patch
>
>
> On a busy Hadoop cluster it can happen that users cannot properly install 
> the Oozie ShareLib.
> Example of a ShareLib installation on a Hadoop 2.4.0 pseudo-cluster with the 
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created, but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry the copy or roll back the 
> changes. We should also consider throttling HDFS requests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-05-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462318#comment-16462318
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [3] new bugs found below threshold in total that 
must be fixed.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:red}-1{color} There are [3] new bugs found below threshold in [tools] 
that must be fixed.
. You can find the FindBugs diff here (look for the red and orange ones): 
tools/findbugs-new.html
. The most important FindBugs errors are:
. At OozieSharelibCLI.java:[line 193]: Boxing/unboxing to parse a primitive 
org.apache.oozie.tools.OozieSharelibCLI.run(String[])
. Possible null pointer dereference in 
org.apache.oozie.tools.OozieSharelibCLI.copyFolderRecursively(OozieSharelibCLI$CopyTaskConfiguration)
 due to return value of called method: Another occurrence at 
OozieSharelibCLI.java:[line 194]
. Known null at OozieSharelibCLI.java:[line 239]: Dereferenced at 
OozieSharelibCLI.java:[line 239]
. At OozieSharelibCLI.java:[lines 361-387]: Should 
org.apache.oozie.tools.OozieSharelibCLI$CopyTaskConfiguration be a _static_ 
inner class?
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 2112
.Tests failed: 1
.Tests errors: 2

.The patch failed the following testcases:

testCoordActionRecoveryServiceForWaitingRegisterPartition(org.apache.oozie.service.TestRecoveryService)

.Tests failing with errors:
testConnectionRetryExceptionListener(org.apache.oozie.service.TestJMSAccessorService)
testConnectionRetry(org.apache.oozie.service.TestJMSAccessorService)

.{color:orange}Tests failed at first run:{color}
TestJavaActionExecutor#testCredentialsSkip
TestOozieDBCLI#testOozieDBCLI
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/505/
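
For readers unfamiliar with the two fixable FindBugs patterns named above, a hedged sketch of the usual remedies follows; the names are placeholders and this is not the actual Oozie code:

{code}
public class FindBugsFixes {

    // "Boxing/unboxing to parse a primitive": Integer.valueOf(String) creates
    // an Integer object that is immediately unboxed back to int;
    // Integer.parseInt(String) parses straight to the primitive.
    static int parseConcurrencyFlagged(String arg) {
        return Integer.valueOf(arg);   // flagged by FindBugs
    }

    static int parseConcurrencyFixed(String arg) {
        return Integer.parseInt(arg);  // the usual fix
    }

    // "Should ... be a _static_ inner class?": a nested class that never
    // touches its enclosing instance should be declared static, so its
    // instances do not keep a hidden reference to the outer object alive.
    static class CopyTaskConfiguration {
        // fields describing one copy task would go here
    }
}
{code}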




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-05-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462247#comment-16462247
 ] 

Hadoop QA commented on OOZIE-2791:
--

PreCommit-OOZIE-Build started




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-05-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462237#comment-16462237
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch






[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-05-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462224#comment-16462224
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch






[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-05-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462215#comment-16462215
 ] 

Hadoop QA commented on OOZIE-2791:
--

PreCommit-OOZIE-Build started




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456167#comment-16456167
 ] 

Hadoop QA commented on OOZIE-2791:
--


Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch






[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2018-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456164#comment-16456164
 ] 

Hadoop QA commented on OOZIE-2791:
--

PreCommit-OOZIE-Build started




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-27 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886699#comment-15886699
 ] 

Robert Kanter commented on OOZIE-2791:
--

While I agree that Oozie should retry to cover for other issues (bad 
network, etc.), it still seems funny to me that a perfectly healthy HDFS can 
get write failures.

Anyway, for choosing a default concurrency level, you could try running the 
installation on, say, 3 different cluster sizes with concurrency 1, 2, 3, ... 
and timing it. Then see which one is fastest; I imagine throughput eventually 
peaks at some point. Alternatively, we could just pick a reasonable but 
arbitrary value like 10. A rough harness is sketched below.
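
If someone wants to run that experiment, a timing harness could look like this sketch. The NameNode URL and the tarball name are placeholders (note that {{ProcessBuilder}} does not expand shell globs, so a concrete file name is used):

{code}
import java.util.Arrays;

// Rough timing harness: run the sharelib installation at several
// concurrency levels and print how long each run takes.
public class ConcurrencyBenchmark {
    public static void main(String[] args) throws Exception {
        for (int c : Arrays.asList(1, 2, 3, 5, 10, 20)) {
            long start = System.nanoTime();
            Process p = new ProcessBuilder(
                    "oozie-setup.sh", "sharelib", "create",
                    "-fs", "hdfs://localhost:9000",
                    "-locallib", "oozie-sharelib.tar.gz",  // placeholder name
                    "-concurrency", String.valueOf(c))
                    .inheritIO()
                    .start();
            p.waitFor();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("concurrency=%d took %d ms%n", c, elapsedMs);
        }
    }
}
{code}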


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862388#comment-15862388
 ] 

Hadoop QA commented on OOZIE-2791:
--

Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [2] new bugs found below threshold in total that 
must be fixed.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:red}-1{color} There are [2] new bugs found below threshold in 
[tools] that must be fixed.
.You can find the FindBugs diff here (look for the red and orange ones): 
tools/findbugs-new.html
.The most important FindBugs errors are:
.At OozieSharelibCLI.java:[line 185]: Boxing/unboxing to parse a primitive 
org.apache.oozie.tools.OozieSharelibCLI.run(String[])
.At OozieSharelibCLI.java:[lines 340-367]: Should 
org.apache.oozie.tools.OozieSharelibCLI$CopyTaskConfiguration be a _static_ 
inner class?
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1873
.Tests failed: 5
.Tests errors: 3

.The patch failed the following testcases:

.  testMain(org.apache.oozie.action.hadoop.TestHiveMain)
.  testPigScript(org.apache.oozie.action.hadoop.TestPigMain)
.  testEmbeddedPigWithinPython(org.apache.oozie.action.hadoop.TestPigMain)
.  testPig_withNullExternalID(org.apache.oozie.action.hadoop.TestPigMain)
.  testPigScript(org.apache.oozie.action.hadoop.TestPigMainWithOldAPI)

.Tests failing with errors:
.  testAddXIncludeFromStream(org.apache.oozie.util.TestXConfiguration)
.  testAddXIncludeFromReader(org.apache.oozie.util.TestXConfiguration)
.  testLoadDump(org.apache.oozie.tools.TestDBLoadDump)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3642/


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-11 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862360#comment-15862360
 ] 

Attila Sasvari commented on OOZIE-2791:
---

Thanks for the review [~gezapeti]. You are right. I attached a new patch that 
addresses it. I am not sure that adding this implementation detail to the 
documentation is really needed. Setting/recommending a "proper" concurrency 
number is also tricky, but I will think about it.



[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-10 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861142#comment-15861142
 ] 

Peter Cseh commented on OOZIE-2791:
---

Thanks for the patch [~asasvari]!

I like the overall approach of collecting the failures and retrying at the end.
The {{Set}} is not used as a real set, because the class 
{{CopyTaskConfiguration}} does not implement {{equals()}} and {{hashCode()}} 
(see the sketch below).
Could you update the CLI documentation to explain this behavior?
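
For reference, a minimal sketch of what implementing them could look like, so that the {{Set}} actually deduplicates tasks. The fields {{src}} and {{dst}} are placeholders; the real members of {{CopyTaskConfiguration}} may differ:

{code}
import java.util.Objects;
import org.apache.hadoop.fs.Path;

// Placeholder fields; the real CopyTaskConfiguration members may differ.
final class CopyTaskConfiguration {
    private final Path src;
    private final Path dst;

    CopyTaskConfiguration(Path src, Path dst) {
        this.src = src;
        this.dst = dst;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CopyTaskConfiguration)) {
            return false;
        }
        CopyTaskConfiguration other = (CopyTaskConfiguration) o;
        return Objects.equals(src, other.src) && Objects.equals(dst, other.dst);
    }

    @Override
    public int hashCode() {
        return Objects.hash(src, dst);
    }
}
{code}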





[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860016#comment-15860016
 ] 

Hadoop QA commented on OOZIE-2791:
--

Testing JIRA OOZIE-2791

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [2] new bugs found below threshold in total that 
must be fixed.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:red}-1{color} There are [2] new bugs found below threshold in 
[tools] that must be fixed.
.You can find the FindBugs diff here (look for the red and orange ones): 
tools/findbugs-new.html
.The most important FindBugs errors are:
.At OozieSharelibCLI.java:[line 185]: Boxing/unboxing to parse a primitive 
org.apache.oozie.tools.OozieSharelibCLI.run(String[])
.At OozieSharelibCLI.java:[lines 221-226]: Should 
org.apache.oozie.tools.OozieSharelibCLI$CopyTaskConfiguration be a _static_ 
inner class?
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1872
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3635/


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-09 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859789#comment-15859789
 ] 

Attila Sasvari commented on OOZIE-2791:
---

[~abhishekbafna] thanks for the additional info. We ran into this issue on a 
4-node cluster with Hadoop 2.6 (multiple components were talking to HDFS).

When I tried to reproduce the problem with {{-concurrency 150}} on my 
single-node pseudo Hadoop, I noticed that the sharelib was only partially 
installed (exceptions from failed copy tasks were logged). At the end of the 
execution, multiple zero-byte files had been created in HDFS.

Now I have a working solution that I tested on a Mac with pseudo Hadoop 2.6.0. 
Briefly, I add information about each failed copy task to a concurrent hash 
set, and retry uploading the missed files on a single thread (with 
{{copyFromLocalFile}}). Before re-uploading a file, we wait 1000 ms. If the 
copy fails again, we double the delay and decrement the retry count (currently 
hardcoded to 5 attempts).
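
A minimal sketch of that retry behaviour, assuming a hypothetical {{CopyTask}} holder for the source and target paths (the actual patch uses {{CopyTaskConfiguration}}, whose members may differ):

{code}
import java.util.Set;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative retry loop: re-upload each failed copy on a single thread,
// doubling the wait between attempts and giving up after 5 tries.
public class FailedCopyRetrier {

    // Hypothetical task type; stands in for the patch's CopyTaskConfiguration.
    public interface CopyTask {
        Path getSrc();
        Path getDst();
    }

    public static void retryFailedCopies(FileSystem fs, Set<CopyTask> failed)
            throws InterruptedException {
        for (CopyTask task : failed) {
            long delayMs = 1000L;  // initial wait before the first retry
            int retriesLeft = 5;   // hardcoded retry budget, as described
            while (retriesLeft > 0) {
                Thread.sleep(delayMs);
                try {
                    fs.copyFromLocalFile(task.getSrc(), task.getDst());
                    break;         // success: move on to the next task
                } catch (Exception e) {
                    delayMs *= 2;  // exponential backoff
                    retriesLeft--;
                }
            }
        }
    }
}
{code}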


[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-08 Thread Abhishek Bafna (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858227#comment-15858227
 ] 

Abhishek Bafna commented on OOZIE-2791:
---

I tried the Oozie sharelib installation with the {{-concurrency}} option and 
different numbers of parallel threads, and it installed the Oozie sharelib 
successfully.

Cluster information: 3-node cluster, built using VirtualBox on a Mac.
The load on the cluster was light; a bunch of MR jobs were running.
Values tried for the number of threads: 50, 150, 250, 350, 450.

Thanks.

> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>
> On a busy Hadoop cluster it can happen that users cannot properly install the
> Oozie ShareLib.
> Example of a sharelib installation on a Hadoop 2.4.0 pseudo cluster with the
> concurrency number set high (to simulate a busy cluster):
> {code}
> oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib 
> oozie-sharelib-*.tar.gz -concurrency 150
> {code}
> You can see a lot of errors (failed copy tasks) in the output:
> {code}
> Running 464 copy tasks on 150 threads
> Error: Copy task failed with exception
> Stack trace for the error was (for debug purposes):
> --
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/asasvari/share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar 
> could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> --
> ...
> {code}
> You can see the file is created but its size is 0.
> {code}
> -rw-r--r--   3 asasvari supergroup  0 2017-02-07 10:59 
> share/lib/lib_20170207105926/distcp/hadoop-distcp-2.4.0.jar
> {code}
> This behaviour is clearly wrong.
> In case of such an exception, we should retry copying or roll back the
> changes. We should also consider throttling HDFS requests.





[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856900#comment-15856900
 ] 

Andrew Wang commented on OOZIE-2791:


I took a peek at this class; it looks like that's a Java API rather than an 
HDFS one. My guess, though, is that {{temp.deleteOnExit}} doesn't work because 
the directory isn't empty (same for the {{temp.delete()}} before it), so we 
should handle this with a try/finally if you always want this temp dir to be 
cleaned up. There's already a call to {{FileUtils.deleteDirectory}} at the 
bottom that does a recursive delete; it could be moved into a {{finally}} 
block.
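
A minimal sketch of that cleanup pattern, assuming the temp directory is 
created roughly the way OozieSharelibCLI does it today (the method name is 
illustrative):

{code}
import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;

public class TempDirCleanupSketch {
    static void installFromTempDir() throws IOException {
        File temp = File.createTempFile("oozie", ".dir");
        temp.delete();  // File.delete() only removes the placeholder file
        temp.mkdirs();  // recreate it as a directory
        try {
            // ... extract the sharelib tarball and run the copy tasks ...
        } finally {
            // a recursive delete succeeds even when the directory is
            // non-empty, unlike delete()/deleteOnExit()
            FileUtils.deleteDirectory(temp);
        }
    }
}
{code}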

> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>




[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-07 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856848#comment-15856848
 ] 

Attila Sasvari commented on OOZIE-2791:
---

[~andrew.wang] I did not know about that. Thanks. 

I've just noticed another thing: if there is an exception during the sharelib 
installation (e.g. a copy task fails), files in the temporary directory 
(https://github.com/apache/oozie/blob/master/tools/src/main/java/org/apache/oozie/tools/OozieSharelibCLI.java#L137)
 are not deleted at the end.

{code}
Error: Copy task for 
"/var/folders/9q/f8p_r6gj0wbck49_dc092q_mgp/T/oozie5744006317396681919.dir/share/lib/hive2/hadoop-yarn-common-2.4.0.jar"
 failed with exception
...

$ ls 
/var/folders/9q/f8p_r6gj0wbck49_dc092q_mgp/T/oozie5744006317396681919.dir/share/lib/hive2/hadoop-yarn-common-2.4.0.jar
 
/var/folders/9q/f8p_r6gj0wbck49_dc092q_mgp/T/oozie5744006317396681919.dir/share/lib/hive2/hadoop-yarn-common-2.4.0.jar
{code}


> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>

[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856798#comment-15856798
 ] 

Andrew Wang commented on OOZIE-2791:


I think you can do it without a new FileSystem object since there's a create 
method that takes a blockSize parameter (FileSystem is also a bit heavyweight):

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L1048
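
A hedged sketch of using that overload (rounding the block size up to a whole 
MB is my assumption here, to stay above the NameNode's minimum block size and 
keep the value checksum-aligned; the class and method names are illustrative):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class TightBlockSizeUpload {
    // Upload one local file with a per-file block size instead of changing
    // the FileSystem-wide default.
    static void upload(FileSystem fs, java.io.File src, Path dst)
            throws IOException {
        final long mb = 1024L * 1024L;
        // round the file length up to the next whole MB
        long blockSize = Math.max(mb, ((src.length() + mb - 1) / mb) * mb);
        try (InputStream in = Files.newInputStream(src.toPath());
             FSDataOutputStream out = fs.create(dst, true /* overwrite */,
                     4096 /* io buffer */, fs.getDefaultReplication(dst),
                     blockSize)) {
            IOUtils.copyBytes(in, out, 4096);
        }
    }
}
{code}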

> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>





[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-07 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856785#comment-15856785
 ] 

Attila Sasvari commented on OOZIE-2791:
---

[~andrew.wang] thanks for the idea. I've just created a POC based on your 
comment, and I am now evaluating it (some refactoring is needed in 
OozieSharelibCLI: [copyFolderRecursively | 
https://github.com/apache/oozie/blob/master/tools/src/main/java/org/apache/oozie/tools/OozieSharelibCLI.java#L255]
 could be replaced with a new method that creates a new FileSystem object with 
the proper dfs.block.size before submitting the copy job).
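
Roughly, the POC idea looks like this (the configuration key and all names 
here are assumptions on my side, not the final patch):

{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeFileSystemSketch {
    // Copy one file through a dedicated FileSystem whose default block size
    // matches the file being uploaded.
    static void copyWithBlockSize(URI nameNode, Path src, Path dst,
            long blockSize) throws IOException {
        Configuration conf = new Configuration();
        conf.setLong("dfs.blocksize", blockSize); // "dfs.block.size" in older Hadoop
        FileSystem fs = FileSystem.newInstance(nameNode, conf);
        try {
            fs.copyFromLocalFile(src, dst);
        } finally {
            fs.close(); // newInstance() bypasses the FS cache, so close it
        }
    }
}
{code}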

> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>





[jira] [Commented] (OOZIE-2791) ShareLib installation may fail on busy Hadoop clusters

2017-02-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856642#comment-15856642
 ] 

Andrew Wang commented on OOZIE-2791:


One idea I actually really like is to set the dfs.block.size to the size of the 
file being uploaded. This should prevent HDFS from reserving any excess 
capacity during the write.

> ShareLib installation may fail on busy Hadoop clusters
> --
>
> Key: OOZIE-2791
> URL: https://issues.apache.org/jira/browse/OOZIE-2791
> Project: Oozie
>  Issue Type: Bug
>Reporter: Attila Sasvari
>


