[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reopened FLINK-1556:
-----------------------------------
      Assignee: Till Rohrmann

The issue hasn't been implemented/tested properly.

I'm reopening it.

I was submitting a job with a wrong directory configured. I got the following 
output:
{code}
./flink run -v -p 152 -c com.github.projectflink.avro.GenerateLineitems 
../../testjob/flink-jobs/target/flink-jobs-0.1-SNAPSHOT.jar -p 144 -o 
hdfs:///user/robert/datasets/tpch100/
16:55:04,748 WARN  org.apache.hadoop.util.NativeCodeLoader                      
 - Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
Found a yarn properties file (.yarn-properties) file, using 
"cloud-31.dima.tu-berlin.de:33806" to connect to the JobManager
Submission of job with ID 11bc00bb2a27105221a4137048e3a763 was unsuccessful, 
because File or directory already exists. Existing files and directories are 
not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite existing 
files and directories..
{code}
But the CliFrontend didn't stop. (I waited for more than a minute).

jstack:

{code}
$ jstack 25475
2015-02-17 16:56:43
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00000000013c9000 nid=0x63dc waiting on 
condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Hashed wheel timer #1" daemon prio=10 tid=0x0000000001375000 nid=0x63a9 
waiting on condition [0x00007f3582a96000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at 
org.jboss.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:483)
        at 
org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:392)
        at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at java.lang.Thread.run(Thread.java:745)

"New I/O server boss #6" daemon prio=10 tid=0x00007f3584123000 nid=0x63a8 
runnable [0x00007f3582b97000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000070ac52878> (a sun.nio.ch.Util$2)
        - locked <0x000000070ac52868> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000070ac52750> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
        at 
org.jboss.netty.channel.socket.nio.NioServerBoss.select(NioServerBoss.java:163)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
        at 
org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"New I/O worker #5" daemon prio=10 tid=0x00007f3584124800 nid=0x63a7 runnable 
[0x00007f3582c98000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000070ac35e20> (a sun.nio.ch.Util$2)
        - locked <0x000000070ac35e10> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000070ac35cf8> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at 
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"New I/O worker #4" daemon prio=10 tid=0x00007f35843f0800 nid=0x63a6 runnable 
[0x00007f3582d99000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000070ac25598> (a sun.nio.ch.Util$2)
        - locked <0x000000070ac25588> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000070ac25470> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at 
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"New I/O boss #3" daemon prio=10 tid=0x00007f3584530000 nid=0x63a5 runnable 
[0x00007f3582e9a000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000070abf0888> (a sun.nio.ch.Util$2)
        - locked <0x000000070abf0878> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000070abf0760> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at 
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
        at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"New I/O worker #2" daemon prio=10 tid=0x00007f3584187000 nid=0x63a4 runnable 
[0x00007f3582f9b000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000070ab400b8> (a sun.nio.ch.Util$2)
        - locked <0x000000070ab400a8> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000070ab3ff90> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at 
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"New I/O worker #1" daemon prio=10 tid=0x00000000013e1000 nid=0x63a3 runnable 
[0x00007f358309b000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000070ab0d8d0> (a sun.nio.ch.Util$2)
        - locked <0x000000070ab0d868> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000070ab0d738> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at 
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
        at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"flink-akka.remote.default-remote-dispatcher-6" daemon prio=10 
tid=0x00007f358c6ef800 nid=0x63a2 waiting on condition [0x00007f358319d000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000070a54f350> (a 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
        at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-akka.remote.default-remote-dispatcher-5" daemon prio=10 
tid=0x00007f3584184000 nid=0x63a1 waiting on condition [0x00007f358329e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000070a54f350> (a 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
        at 
scala.concurrent.forkjoin.ForkJoinPool.idleAwaitWork(ForkJoinPool.java:2135)
        at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2067)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-akka.actor.default-dispatcher-4" daemon prio=10 tid=0x00000000011bd000 
nid=0x63a0 waiting on condition [0x00007f358339f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000709ac8bd8> (a 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
        at 
scala.concurrent.forkjoin.ForkJoinPool.idleAwaitWork(ForkJoinPool.java:2135)
        at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2067)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-akka.actor.default-dispatcher-3" daemon prio=10 tid=0x00007f358c6b9000 
nid=0x639f waiting on condition [0x00007f35834a0000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000709ac8bd8> (a 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
        at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-scheduler-1" daemon prio=10 tid=0x00007f358c61e800 nid=0x639d 
sleeping[0x00007f3588322000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at akka.actor.LightArrayRevolverScheduler.waitNanos(Scheduler.scala:226)
        at 
akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:405)
        at 
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
        at java.lang.Thread.run(Thread.java:745)

"Service Thread" daemon prio=10 tid=0x00007f358c07d800 nid=0x6398 runnable 
[0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f358c07b000 nid=0x6397 waiting 
on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f358c078000 nid=0x6396 waiting 
on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f358c075800 nid=0x6395 runnable 
[0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f358c04c000 nid=0x6394 in Object.wait() 
[0x00007f35906eb000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000703e84858> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x0000000703e84858> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" daemon prio=10 tid=0x00007f358c04a000 nid=0x6393 in 
Object.wait() [0x00007f35907ec000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000703e84470> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x0000000703e84470> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000000fb5800 nid=0x6384 waiting on condition 
[0x00007f35b5f4f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000070b19afb0> (a 
scala.concurrent.impl.Promise$CompletionLatch)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
        at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
        at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at 
org.apache.flink.runtime.client.JobClient$.submitJobAndWait(JobClient.scala:226)
        at 
org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.scala)
        at org.apache.flink.client.program.Client.run(Client.java:342)
        at org.apache.flink.client.program.Client.run(Client.java:310)
        at org.apache.flink.client.program.Client.run(Client.java:304)
        at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
        at 
com.github.projectflink.avro.GenerateLineitems.main(GenerateLineitems.java:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
        at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
        at org.apache.flink.client.program.Client.run(Client.java:254)
        at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
        at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)

"VM Thread" prio=10 tid=0x00007f358c046000 nid=0x6392 runnable 

{code}


> JobClient does not wait until a job failed completely if submission exception
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-1556
>                 URL: https://issues.apache.org/jira/browse/FLINK-1556
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> If an exception occurs during job submission the {{JobClient}} received a 
> {{SubmissionFailure}}. Upon receiving this message, the {{JobClient}} 
> terminates itself and returns the error to the {{Client}}. This indicates to 
> the user that the job has been completely failed which is not necessarily 
> true. 
> If the user directly after such a failure submits another job, then it might 
> be the case that not all slots of the formerly failed job are returned. This 
> can lead to a {{NoRessourceAvailableException}}.
> We can solve this problem by waiting for the completion of the job failure in 
> the {{JobClient}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to