[ https://issues.apache.org/jira/browse/FLINK-34443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817293#comment-17817293 ]

Matthias Pohl commented on FLINK-34443:
---------------------------------------

This might be related to FLINK-34418.

> YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-34443
>                 URL: https://issues.apache.org/jira/browse/FLINK-34443
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / CI, Runtime / Coordination, Test Infrastructure
>    Affects Versions: 1.19.0, 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: github-actions, test-stability
>
> https://github.com/apache/flink/actions/runs/7895502206/job/21548246199#step:10:28804
> {code}
> Error: 03:04:05 03:04:05.066 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 68.10 s <<< FAILURE! -- in org.apache.flink.yarn.YARNFileReplicationITCase
> Error: 03:04:05 03:04:05.067 [ERROR] org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication -- Time elapsed: 1.982 s <<< ERROR!
> Feb 14 03:04:05 org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
> Feb 14 03:04:05       at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:566)
> Feb 14 03:04:05       at org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:109)
> Feb 14 03:04:05       at org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithCustomizedFileReplication$0(YARNFileReplicationITCase.java:73)
> Feb 14 03:04:05       at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
> Feb 14 03:04:05       at org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication(YARNFileReplicationITCase.java:73)
> Feb 14 03:04:05       at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 14 03:04:05       at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Feb 14 03:04:05       at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Feb 14 03:04:05       at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Feb 14 03:04:05       at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Feb 14 03:04:05       at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Feb 14 03:04:05 Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/.flink/application_1707879779446_0002/log4j-api-2.17.1.jar could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2260)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2813)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:908)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
> Feb 14 03:04:05       at java.security.AccessController.doPrivileged(Native Method)
> Feb 14 03:04:05       at javax.security.auth.Subject.doAs(Subject.java:422)
> Feb 14 03:04:05       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
> Feb 14 03:04:05 
> Feb 14 03:04:05       at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.Client.call(Client.java:1525)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.Client.call(Client.java:1422)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
> Feb 14 03:04:05       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> Feb 14 03:04:05       at com.sun.proxy.$Proxy113.addBlock(Unknown Source)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:520)
> Feb 14 03:04:05       at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 14 03:04:05       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> Feb 14 03:04:05       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> Feb 14 03:04:05       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> Feb 14 03:04:05       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> Feb 14 03:04:05       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> Feb 14 03:04:05       at com.sun.proxy.$Proxy116.addBlock(Unknown Source)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1082)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1898)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1700)
> Feb 14 03:04:05       at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
> {code}
> This could be a GitHub Actions (GHA) infrastructure issue.
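
A note on the failure mode: the nested RemoteException is raised by HDFS block placement, not by Flink code. The NameNode could not place the block for log4j-api-2.17.1.jar on any datanode because both of the two running datanodes were excluded from the write (typically after a failed write pipeline), so even the minReplication of 1 was unsatisfiable. The write happens while YarnClusterDescriptor.deployJobCluster uploads the job artifacts to HDFS, before any Flink process starts on YARN. Below is a minimal sketch of the configuration knob the test name suggests is being exercised, assuming it goes through Flink's yarn.file-replication option; YarnConfigOptions.FILE_REPLICATION is a real option, but the class name and the concrete value 4 are illustrative only, not the test's actual code:

{code}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.yarn.configuration.YarnConfigOptions;

// Hypothetical sketch, not the actual YARNFileReplicationITCase code.
public class FileReplicationSketch {
    public static void main(String[] args) {
        Configuration flinkConfig = new Configuration();

        // yarn.file-replication controls the HDFS replication factor used
        // when the client ships job artifacts (user jars, logging jars,
        // configuration) to HDFS during deployment. The default of -1 means
        // "fall back to the cluster's dfs.replication setting".
        flinkConfig.set(YarnConfigOptions.FILE_REPLICATION, 4); // 4 is illustrative

        // The failing write in the stack trace happens during exactly this
        // artifact-upload phase:
        // DataStreamer -> addBlock -> BlockManager.chooseTarget4NewBlock.
    }
}
{code}

If both datanodes of the test cluster land on the exclude list, e.g. after a transient pipeline failure on a loaded CI machine, any replication factor of 1 or more hits this error, which is consistent with the infrastructure-issue theory above.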


