[ https://issues.apache.org/jira/browse/FLINK-34443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818396#comment-17818396 ]
Matthias Pohl edited comment on FLINK-34443 at 2/19/24 9:45 AM:
----------------------------------------------------------------

* [https://github.com/apache/flink/actions/runs/7938595181/job/21677803913#step:10:28799]
* [https://github.com/apache/flink/actions/runs/7938595184/job/21677788845#step:10:27633]
* [https://github.com/apache/flink/actions/runs/7938595184/job/21677813511#step:10:28731]
* [https://github.com/apache/flink/actions/runs/7938595184/job/21677790189#step:10:27633]
* [https://github.com/apache/flink/actions/runs/7945888022/job/21693407019#step:10:28813]
* [https://github.com/apache/flink/actions/runs/7945888201/job/21693403892#step:10:28806]
* [https://github.com/apache/flink/actions/runs/7945888201/job/21693426922#step:10:27716]
* [https://github.com/apache/flink/actions/runs/7946115091/job/21693601123#step:10:27716]
* [https://github.com/apache/flink/actions/runs/7953599167/job/21710055146#step:10:27671]
* [https://github.com/apache/flink/actions/runs/7953599343/job/21710039735#step:10:27228]
* [https://github.com/apache/flink/actions/runs/7954185052/job/21711406254#step:10:27707]

was (Author: mapohl):
* [https://github.com/apache/flink/actions/runs/7938595181/job/21677803913#step:10:28799]
* [https://github.com/apache/flink/actions/runs/7938595184/job/21677788845#step:10:27633]
* [https://github.com/apache/flink/actions/runs/7938595184/job/21677813511#step:10:28731]
* [https://github.com/apache/flink/actions/runs/7938595184/job/21677790189#step:10:27633]

> YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-34443
>                 URL: https://issues.apache.org/jira/browse/FLINK-34443
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / CI, Runtime / Coordination, Test Infrastructure
>    Affects Versions: 1.19.0, 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: github-actions, test-stability
>
> https://github.com/apache/flink/actions/runs/7895502206/job/21548246199#step:10:28804
> {code}
> Error: 03:04:05 03:04:05.066 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 68.10 s <<< FAILURE! -- in org.apache.flink.yarn.YARNFileReplicationITCase
> Error: 03:04:05 03:04:05.067 [ERROR] org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication -- Time elapsed: 1.982 s <<< ERROR!
> Feb 14 03:04:05 org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
> Feb 14 03:04:05 	at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:566)
> Feb 14 03:04:05 	at org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:109)
> Feb 14 03:04:05 	at org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithCustomizedFileReplication$0(YARNFileReplicationITCase.java:73)
> Feb 14 03:04:05 	at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
> Feb 14 03:04:05 	at org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication(YARNFileReplicationITCase.java:73)
> Feb 14 03:04:05 	at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 14 03:04:05 	at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Feb 14 03:04:05 	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Feb 14 03:04:05 	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Feb 14 03:04:05 	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Feb 14 03:04:05 	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Feb 14 03:04:05 Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/.flink/application_1707879779446_0002/log4j-api-2.17.1.jar could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2260)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2813)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:908)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
> Feb 14 03:04:05 	at java.security.AccessController.doPrivileged(Native Method)
> Feb 14 03:04:05 	at javax.security.auth.Subject.doAs(Subject.java:422)
> Feb 14 03:04:05 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
> Feb 14 03:04:05 
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.Client.call(Client.java:1525)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.Client.call(Client.java:1422)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
> Feb 14 03:04:05 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> Feb 14 03:04:05 	at com.sun.proxy.$Proxy113.addBlock(Unknown Source)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:520)
> Feb 14 03:04:05 	at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 14 03:04:05 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> Feb 14 03:04:05 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> Feb 14 03:04:05 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> Feb 14 03:04:05 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> Feb 14 03:04:05 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> Feb 14 03:04:05 	at com.sun.proxy.$Proxy116.addBlock(Unknown Source)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1082)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1898)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1700)
> Feb 14 03:04:05 	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
> {code}
> This could be a GHA infrastructure issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
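Editor's note, not part of the original report: the root-cause line in the trace, "could only be written to 0 of the 1 minReplication nodes ... 2 node(s) are excluded", is HDFS saying the client gave up on every available DataNode while building the write pipeline, which on CI hosts is frequently a symptom of exhausted disk space or flaky networking rather than a Flink bug. The commands below are a generic diagnostic sketch against a running Hadoop cluster (they assume a working `hdfs` CLI and are illustrative only; the `/user/root/.flink` path is taken from the trace above):

```shell
# Summarize DataNode liveness, capacity, and remaining disk space;
# DataNodes low on disk are a common reason they end up "excluded".
hdfs dfsadmin -report

# Check the replication factor the client will request for new files.
hdfs getconf -confKey dfs.replication

# Inspect block health and placement under the Flink staging directory
# that the failed upload was writing to.
hdfs fsck /user/root/.flink -files -blocks -locations
```

If the report shows DataNodes alive but with near-zero remaining capacity, the "infrastructure issue" hypothesis at the end of the ticket is consistent with the evidence.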