[ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Metzger reopened FLINK-1556:
-----------------------------------
      Assignee: Till Rohrmann

The issue hasn't been implemented/tested properly. I'm reopening it.

I was submitting a job with a wrong directory configured. I got the following output:
{code}
./flink run -v -p 152 -c com.github.projectflink.avro.GenerateLineitems ../../testjob/flink-jobs/target/flink-jobs-0.1-SNAPSHOT.jar -p 144 -o hdfs:///user/robert/datasets/tpch100/
16:55:04,748 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found a yarn properties file (.yarn-properties) file, using "cloud-31.dima.tu-berlin.de:33806" to connect to the JobManager
Submission of job with ID 11bc00bb2a27105221a4137048e3a763 was unsuccessful, because File or directory already exists. Existing files and directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite existing files and directories..
{code}
But the CliFrontend didn't stop. (I waited for more than a minute).

jstack:
{code}
$ jstack 25475
2015-02-17 16:56:43
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00000000013c9000 nid=0x63dc waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Hashed wheel timer #1" daemon prio=10 tid=0x0000000001375000 nid=0x63a9 waiting on condition [0x00007f3582a96000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.jboss.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:483)
    at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:392)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at java.lang.Thread.run(Thread.java:745)

"New I/O server boss #6" daemon prio=10 tid=0x00007f3584123000 nid=0x63a8 runnable [0x00007f3582b97000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070ac52878> (a sun.nio.ch.Util$2)
    - locked <0x000000070ac52868> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070ac52750> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.select(NioServerBoss.java:163)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"New I/O worker #5" daemon prio=10 tid=0x00007f3584124800 nid=0x63a7 runnable [0x00007f3582c98000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070ac35e20> (a sun.nio.ch.Util$2)
    - locked <0x000000070ac35e10> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070ac35cf8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"New I/O worker #4" daemon prio=10 tid=0x00007f35843f0800 nid=0x63a6 runnable [0x00007f3582d99000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070ac25598> (a sun.nio.ch.Util$2)
    - locked <0x000000070ac25588> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070ac25470> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"New I/O boss #3" daemon prio=10 tid=0x00007f3584530000 nid=0x63a5 runnable [0x00007f3582e9a000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070abf0888> (a sun.nio.ch.Util$2)
    - locked <0x000000070abf0878> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070abf0760> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"New I/O worker #2" daemon prio=10 tid=0x00007f3584187000 nid=0x63a4 runnable [0x00007f3582f9b000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070ab400b8> (a sun.nio.ch.Util$2)
    - locked <0x000000070ab400a8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070ab3ff90> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"New I/O worker #1" daemon prio=10 tid=0x00000000013e1000 nid=0x63a3 runnable [0x00007f358309b000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070ab0d8d0> (a sun.nio.ch.Util$2)
    - locked <0x000000070ab0d868> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070ab0d738> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"flink-akka.remote.default-remote-dispatcher-6" daemon prio=10 tid=0x00007f358c6ef800 nid=0x63a2 waiting on condition [0x00007f358319d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000070a54f350> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
    at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-akka.remote.default-remote-dispatcher-5" daemon prio=10 tid=0x00007f3584184000 nid=0x63a1 waiting on condition [0x00007f358329e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000070a54f350> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
    at scala.concurrent.forkjoin.ForkJoinPool.idleAwaitWork(ForkJoinPool.java:2135)
    at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2067)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-akka.actor.default-dispatcher-4" daemon prio=10 tid=0x00000000011bd000 nid=0x63a0 waiting on condition [0x00007f358339f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000709ac8bd8> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
    at scala.concurrent.forkjoin.ForkJoinPool.idleAwaitWork(ForkJoinPool.java:2135)
    at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2067)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-akka.actor.default-dispatcher-3" daemon prio=10 tid=0x00007f358c6b9000 nid=0x639f waiting on condition [0x00007f35834a0000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000709ac8bd8> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
    at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-scheduler-1" daemon prio=10 tid=0x00007f358c61e800 nid=0x639d sleeping[0x00007f3588322000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at akka.actor.LightArrayRevolverScheduler.waitNanos(Scheduler.scala:226)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:405)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
    at java.lang.Thread.run(Thread.java:745)

"Service Thread" daemon prio=10 tid=0x00007f358c07d800 nid=0x6398 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f358c07b000 nid=0x6397 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f358c078000 nid=0x6396 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f358c075800 nid=0x6395 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f358c04c000 nid=0x6394 in Object.wait() [0x00007f35906eb000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000000703e84858> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x0000000703e84858> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" daemon prio=10 tid=0x00007f358c04a000 nid=0x6393 in Object.wait() [0x00007f35907ec000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000000703e84470> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x0000000703e84470> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000000fb5800 nid=0x6384 waiting on condition [0x00007f35b5f4f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000070b19afb0> (a scala.concurrent.impl.Promise$CompletionLatch)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.flink.runtime.client.JobClient$.submitJobAndWait(JobClient.scala:226)
    at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.scala)
    at org.apache.flink.client.program.Client.run(Client.java:342)
    at org.apache.flink.client.program.Client.run(Client.java:310)
    at org.apache.flink.client.program.Client.run(Client.java:304)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
    at com.github.projectflink.avro.GenerateLineitems.main(GenerateLineitems.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:254)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)

"VM Thread" prio=10 tid=0x00007f358c046000 nid=0x6392 runnable
{code}

> JobClient does not wait until a job failed completely if submission exception
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-1556
>                 URL: https://issues.apache.org/jira/browse/FLINK-1556
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> If an exception occurs during job submission, the {{JobClient}} receives a
> {{SubmissionFailure}}. Upon receiving this message, the {{JobClient}}
> terminates itself and returns the error to the {{Client}}. This indicates to
> the user that the job has failed completely, which is not necessarily true.
> If the user submits another job directly after such a failure, not all slots
> of the previously failed job may have been returned yet. This can lead to a
> {{NoRessourceAvailableException}}.
> We can solve this problem by waiting for the completion of the job failure in
> the {{JobClient}}.
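
To make the intended behavior concrete, here is a minimal Java sketch of a client that remembers a submission failure but only reports it once the job has failed completely, and that bounds its wait so it cannot block indefinitely if that final message never arrives. This is not Flink's code: {{SubmissionWaitSketch}}, {{SketchClient}}, the {{Message}} enum and {{describeOutcome}} are hypothetical names made up for illustration, and the real {{JobClient}} is an Akka actor that may handle this quite differently.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SubmissionWaitSketch {

    /** Hypothetical stand-ins for the messages a client might receive (not Flink classes). */
    enum Message { SUBMISSION_FAILURE, JOB_FAILED_COMPLETELY }

    /** Hypothetical client: reports a failure only after the job has failed completely. */
    static final class SketchClient {
        private final CompletableFuture<Void> result = new CompletableFuture<>();
        private volatile String submissionError;

        void onMessage(Message msg, String detail) {
            switch (msg) {
                case SUBMISSION_FAILURE:
                    // Remember the error, but do not return yet: slots may still be held.
                    submissionError = detail;
                    break;
                case JOB_FAILED_COMPLETELY:
                    // The job is fully cleaned up, so the caller may safely resubmit.
                    String reason = submissionError != null ? submissionError : detail;
                    result.completeExceptionally(new IllegalStateException(reason));
                    break;
            }
        }

        void awaitResult(long timeout, TimeUnit unit)
                throws InterruptedException, ExecutionException, TimeoutException {
            // Bounded wait: the caller gets control back even if no final message arrives.
            result.get(timeout, unit);
        }
    }

    /** Turns the outcome of a bounded wait into a short description. */
    static String describeOutcome(SketchClient client, long timeoutMillis) {
        try {
            client.awaitResult(timeoutMillis, TimeUnit.MILLISECONDS);
            return "finished";
        } catch (TimeoutException e) {
            return "timed out";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        SketchClient client = new SketchClient();
        client.onMessage(Message.SUBMISSION_FAILURE, "File or directory already exists.");
        // Only the submission failure has arrived, so the bounded wait gives up.
        System.out.println(describeOutcome(client, 100));

        client.onMessage(Message.JOB_FAILED_COMPLETELY, "cleanup finished");
        // The final message has arrived, so the original error is reported.
        System.out.println(describeOutcome(client, 100));
    }
}
```

In this sketch the first wait ends with "timed out" and the second with "failed: File or directory already exists.". The state in the jstack output above ({{main}} parked in {{Await.result}} inside {{JobClient.submitJobAndWait}}) corresponds to the first case: the client is still waiting for a final message.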