[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328028#comment-14328028 ]

ASF GitHub Bot commented on FLINK-1556:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/422

> JobClient does not wait until a job failed completely if submission exception
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-1556
>                 URL: https://issues.apache.org/jira/browse/FLINK-1556
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> If an exception occurs during job submission the {{JobClient}} receives a
> {{SubmissionFailure}}. Upon receiving this message, the {{JobClient}}
> terminates itself and returns the error to the {{Client}}. This indicates to
> the user that the job has completely failed, which is not necessarily
> true.
> If the user submits another job directly after such a failure, then it might
> be the case that not all slots of the formerly failed job are returned. This
> can lead to a {{NoRessourceAvailableException}}.
> We can solve this problem by waiting for the completion of the job failure in
> the {{JobClient}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327812#comment-14327812 ]

ASF GitHub Bot commented on FLINK-1556:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/422#issuecomment-75102524

    Looks good, will merge this as well...

[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327632#comment-14327632 ]

ASF GitHub Bot commented on FLINK-1556:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/422

    [FLINK-1556] Corrects faulty JobClient behaviour in case of a submission failure

    Corrects the behaviour of the ```JobClient``` in case of a submission failure. The PR also contains test cases for the job submission.

    Additionally, reworked how exceptions are transmitted from the ```JobManager``` to the ```JobClient```. They are directly wrapped into an ```akka.actor.Status.Failure``` and sent to the ```JobClient```.

    This PR is based on #419.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixSubmissionExceptions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/422.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #422

commit 8cc604d61d75370972146333c5a016b5fcdddc77
Author: Till Rohrmann
Date:   2015-02-19T10:04:56Z

    [FLINK-1584] [runtime][tests] Fixes TaskManagerFailsITCase by replacing the TestingCluster with a ForkableFlinkMiniCluster

commit 8ecca959d2bf96fa8be1961b413f4a2c45cf50e1
Author: Till Rohrmann
Date:   2015-02-19T11:44:32Z

    [FLINK-1556] [runtime] Fails jobs properly in case of a job submission exception

    Conflicts:
        flink-runtime/src/test/scala/org/apache/flink/runtime/testingUtils/TestingUtils.scala
        flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/taskmanager/TaskManagerFailsITCase.scala

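Editorial illustration of the error transport described in pull request #422 above: the sending side wraps the original exception in a generic failure envelope instead of a dedicated message type, and the receiving side unwraps it. This is only a sketch in plain Java, not the code from the pull request. The nested `Failure` class is a stand-in for ```akka.actor.Status.Failure``` so that the example runs without Akka, and the class name `StatusFailureSketch`, the methods `submit` and `receive`, and the `"SubmissionSuccess"` reply are hypothetical names chosen for the example.

```java
/** Illustrative only: models wrapping an exception in a failure envelope. */
class StatusFailureSketch {

    /** Stand-in for akka.actor.Status.Failure: an envelope carrying the original exception. */
    static final class Failure {
        final Throwable cause;

        Failure(Throwable cause) {
            this.cause = cause;
        }
    }

    /**
     * Sender side (hypothetical): runs the submission step and, if it throws,
     * replies with the exception wrapped in a Failure envelope.
     */
    static Object submit(Runnable submission) {
        try {
            submission.run();
            return "SubmissionSuccess"; // hypothetical success reply
        } catch (RuntimeException e) {
            return new Failure(e);
        }
    }

    /**
     * Receiver side (hypothetical): returns the original exception if the
     * message is a Failure envelope, or null for any other message.
     */
    static Throwable receive(Object message) {
        if (message instanceof Failure) {
            return ((Failure) message).cause;
        }
        return null;
    }

    public static void main(String[] args) {
        Object reply = submit(() -> {
            throw new IllegalStateException("not enough slots");
        });
        Throwable error = receive(reply);
        // The receiver gets the original exception object, stack trace included.
        System.out.println("Received: " + error);
    }
}
```

Because the envelope carries the exception object itself, the receiver keeps the full stack trace, which is what an earlier comment on this issue asked to have shown.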
[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324364#comment-14324364 ]

Robert Metzger commented on FLINK-1556:
---------------------------------------

Also, it seems that these "failearily" jobs are not properly removed from the jobmanager?
http://imgur.com/PyuQEfm

[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324359#comment-14324359 ]

Robert Metzger commented on FLINK-1556:
---------------------------------------

I think showing the whole stacktrace of the exception is helpful to understand the deployment issue better.

[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324122#comment-14324122 ]

ASF GitHub Bot commented on FLINK-1556:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/406

[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323936#comment-14323936 ]

ASF GitHub Bot commented on FLINK-1556:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/406#issuecomment-74633679

    Ok, I'll merge it.

[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323219#comment-14323219 ]

ASF GitHub Bot commented on FLINK-1556:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/406#issuecomment-74565029

    Looks good to me!

[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

    [ https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322884#comment-14322884 ]

ASF GitHub Bot commented on FLINK-1556:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/406

    [FLINK-1556] Corrects faulty JobClient behaviour in case of a submission failure

    If an error occurred during job submission, a ```SubmissionFailure``` is sent to the ```JobClient```. As a reaction, the ```JobClient``` terminated itself and sent the failure to the ```Client```. However, this does not necessarily mean that the job has reached a terminal state, because the failing procedure is executed asynchronously.

    The ```JobClient``` now waits until it receives a ```JobResult``` message indicating that the job has completed and all resources are properly returned.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink minorFixes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #406

commit 2f32e9c6b87b8e295f792c04306d78fbb858f80d
Author: Till Rohrmann
Date:   2015-02-16T09:17:21Z

    [FLINK-1556] [runtime] Corrects faulty JobClient behaviour in case of a submission failure

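Editorial illustration of the behaviour described in pull request #406 above: on a submission failure, the client does not hand the error back immediately but first waits for the job to report a terminal state. This is a minimal sketch in plain Java, not the Akka-based code from the pull request. A `CompletableFuture` stands in for the ```JobResult``` message, and `JobClientSketch`, `TerminalState`, `onSubmissionFailure`, and the timeout parameter are hypothetical names introduced for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: a future stands in for the terminal job result message. */
class JobClientSketch {

    /** Hypothetical terminal states a job can report. */
    enum TerminalState { FINISHED, FAILED, CANCELED }

    /**
     * Handles a submission failure. Rather than returning the error right away,
     * blocks until the job has reported a terminal state (assumed here to imply
     * that its resources were returned), then hands the original cause back.
     */
    static Throwable onSubmissionFailure(
            Throwable cause,
            CompletableFuture<TerminalState> jobResult,
            long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        TerminalState state = jobResult.get(timeoutMillis, TimeUnit.MILLISECONDS);
        System.out.println("Job reached terminal state " + state);
        return cause;
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<TerminalState> jobResult = new CompletableFuture<>();

        // Simulate the asynchronous failing procedure finishing a little later.
        CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            jobResult.complete(TerminalState.FAILED);
        });

        Throwable error = onSubmissionFailure(
                new RuntimeException("submission failed"), jobResult, 5_000);
        System.out.println("Returned to caller: " + error.getMessage());
    }
}
```

The timeout is an addition of this sketch so that the example cannot block forever; the pull request text does not say whether the actual wait is bounded.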