[ 
https://issues.apache.org/jira/browse/FLINK-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967225#comment-16967225
 ] 

Zhu Zhu commented on FLINK-14572:
---------------------------------

From the root cause, the test fails when trying to get the jar file from the
blob server, but the file does not exist.
This does not seem to be related to the NG scheduler, since the scheduler has
not been created yet at that point.

From the logic of {{BlobsCleanupITCase#testBlobServerCleanup()}}, a job is
only submitted if the jar upload via {{BlobClient.uploadFiles}} succeeded.
So it is highly likely that the uploaded jar was removed unexpectedly before
the job was submitted.
But I have no idea why the file would be removed,
and I cannot reproduce the problem locally even with hundreds of re-runs.

[~trohrmann] [~gjy] Do you have any ideas about this?

Root cause:
07:47:47,787 ERROR org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Failed to submit job 15838a6ef77eb89697d3def42c1a58b0.
java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
       at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
       at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
       at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
       at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
       at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
       at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
       at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:152)
       at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)
       at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:381)
       at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
       ... 7 more
Caused by: java.lang.Exception: Cannot set up the user code libraries: /tmp/junit6489941706970935338/junit9210755917417967354/blobStore-1474738d-89f8-4ab0-88b2-9df867ba4cc1/incoming/temp-00000001
       at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:131)
       ... 10 more
Caused by: java.nio.file.NoSuchFileException: /tmp/junit6489941706970935338/junit9210755917417967354/blobStore-1474738d-89f8-4ab0-88b2-9df867ba4cc1/incoming/temp-00000001
       at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
       at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
       at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
       at java.nio.file.Files.move(Files.java:1395)
       at org.apache.flink.runtime.blob.BlobUtils.moveTempFileToStore(BlobUtils.java:410)
       at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:497)
       at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:444)
       at org.apache.flink.runtime.blob.BlobServer.getFile(BlobServer.java:417)
       at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:120)
       at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerJob(BlobLibraryCacheManager.java:91)
       at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:128)
       ... 10 more
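
For illustration only, the innermost cause in the trace ({{java.nio.file.NoSuchFileException}} thrown from {{Files.move}}) can be shown in isolation. The sketch below is not Flink code: the class name, directory layout and file names are made up, and the explicit delete merely stands in for whatever removed the temp file in the failing run, which is still unknown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical stand-alone demo, not part of Flink.
final class MissingTempFileDemo {

    /** Returns true if moving an already-deleted temp file fails with NoSuchFileException. */
    static boolean moveFailsAfterDelete() {
        try {
            // Made-up layout loosely mirroring <storeDir>/incoming/temp-*.
            Path storeDir = Files.createTempDirectory("blobStore-demo");
            Path incoming = Files.createDirectories(storeDir.resolve("incoming"));
            Path tempFile = Files.createTempFile(incoming, "temp-", null);
            Path target = storeDir.resolve("blob_demo");

            // Assumption: something removes the temp file before it is moved.
            Files.delete(tempFile);

            try {
                Files.move(tempFile, target);
                return false; // the move unexpectedly succeeded
            } catch (NoSuchFileException e) {
                return true; // same exception type as the innermost cause in the trace
            } finally {
                // Clean up whatever the demo left behind.
                Files.deleteIfExists(target);
                Files.deleteIfExists(incoming);
                Files.deleteIfExists(storeDir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("move failed with NoSuchFileException: " + moveFailsAfterDelete());
    }
}
```

This only confirms that a source file missing at move time is enough to produce the exception seen above. It says nothing about which component deleted the file.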

> BlobsCleanupITCase failed in Travis stage core - scheduler_ng
> -------------------------------------------------------------
>
>                 Key: FLINK-14572
>                 URL: https://issues.apache.org/jira/browse/FLINK-14572
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Tests
>    Affects Versions: 1.10.0
>            Reporter: Gary Yao
>            Priority: Critical
>              Labels: scheduler-ng, test-stability
>             Fix For: 1.10.0
>
>
> {noformat}
> java.lang.AssertionError: 
> Expected: is <true>
>      but: was <false>
>       at org.apache.flink.runtime.jobmanager.BlobsCleanupITCase.testBlobServerCleanup(BlobsCleanupITCase.java:220)
>       at org.apache.flink.runtime.jobmanager.BlobsCleanupITCase.testBlobServerCleanupFinishedJob(BlobsCleanupITCase.java:133)
> {noformat}
> https://api.travis-ci.com/v3/job/250445874/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
