[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/378 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/378#issuecomment-73908802 I added the UpdateTask message aggregation. I also had to rework the PartitionInfo creation to make it work with the concurrent task updates. This requires another review of the code before we can merge it.
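For readers following the thread, here is a minimal sketch of what aggregating update messages could look like. The thread does not describe the actual implementation, so everything here is an assumption: the class `UpdateBatcher`, the representation of an update as a `{taskManager, partitionInfo}` string pair, and the idea of grouping by target task manager are all hypothetical and are not Flink's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: collapses many single updates
// into one batch per target so that each target receives one message.
class UpdateBatcher {

    // Each update is a two-element array: {targetTaskManager, partitionInfo}.
    // Returns the partition infos grouped by target, preserving arrival order.
    static Map<String, List<String>> aggregate(List<String[]> updates) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String[] update : updates) {
            batches.computeIfAbsent(update[0], key -> new ArrayList<>())
                   .add(update[1]);
        }
        return batches;
    }
}
```

With three updates addressed to two targets, this sketch yields two batches, so two messages would be sent instead of three.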
[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/378#issuecomment-73911265 The job that was previously failing is fixed with this change. We should merge this change ASAP, because it's kinda impossible right now to seriously use Flink 0.9-SNAPSHOT without it.
[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/378#issuecomment-73854842

There is a problem: https://travis-ci.org/apache/flink/jobs/50215407

```
java.lang.IllegalStateException: Consumer state is FINISHED but was expected to be RUNNING.
    at org.apache.flink.runtime.deployment.PartialPartitionInfo.createPartitionInfo(PartialPartitionInfo.java:81)
    at org.apache.flink.runtime.executiongraph.Execution.sendPartitionInfos(Execution.java:581)
    at org.apache.flink.runtime.executiongraph.Execution.switchToRunning(Execution.java:654)
    at org.apache.flink.runtime.executiongraph.Execution.access$100(Execution.java:88)
    at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:336)
    at akka.dispatch.OnComplete.internal(Future.scala:247)
    at akka.dispatch.OnComplete.internal(Future.scala:244)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
```
[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/378#issuecomment-73851481 You're right. At the moment there is no aggregation of messages. I'll add it.
[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/378#discussion_r24415175

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java ---
@@ -416,13 +426,26 @@ boolean scheduleOrUpdateConsumers(List<List<ExecutionEdge>> consumers) throws Ex
     final ExecutionState consumerState = consumerVertex.getExecutionState();

     if (consumerState == CREATED) {
-        if (state == RUNNING) {
-            if (!consumerVertex.scheduleForExecution(consumerVertex.getExecutionGraph().getScheduler(), false)) {
-                success = false;
+        consumerVertex.cachePartitionInfo(PartialPartitionInfo.fromEdge(edge));
+
+        future(new Callable<Boolean>(){
+            @Override
+            public Boolean call() throws Exception {
+                try {
+                    consumerVertex.scheduleForExecution(
+                        consumerVertex.getExecutionGraph().getScheduler(), false);
+                } catch (Exception exception) {
+                    fail(new IllegalStateException("Could not schedule consumer " +
+                        "vertex " + consumerVertex, exception));
+                }
+
+                return true;
             }
-        }
-        else {
-            success = false;
+        }, AkkaUtils.globalExecutionContext());
+
+        // double check to resolve race conditions
+        if(consumerVertex.getExecutionState() == RUNNING){
+            consumerVertex.sendPartitionInfos();
--- End diff --

Just to verify: the double-check send relies on the fact that update messages at the task manager are idempotent, right?
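To make the idempotency question concrete, here is a small sketch of a receiver for which a duplicate send is harmless. It is an illustration only: `IdempotentReceiver` and its string partition IDs are hypothetical, and the thread does not confirm whether Flink's task manager handles updates this way.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical receiver, not Flink's TaskManager: applying the same
// partition update twice has the same effect as applying it once.
class IdempotentReceiver {
    private final Set<String> applied = ConcurrentHashMap.newKeySet();

    // Returns true only the first time a given partition ID is applied;
    // a repeated send is detected and ignored.
    boolean update(String partitionId) {
        return applied.add(partitionId);
    }

    int appliedCount() {
        return applied.size();
    }
}
```

Under this assumption, the double-check send can safely race with the send triggered by the state switch: whichever arrives second is a no-op.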
[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/378#issuecomment-73708514 Looks good to me. +1 We chatted about batching update task calls. Did you run into a problem with it, or can we open an improvement issue for it?
[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/378#issuecomment-73693561 Very nice. I will have a detailed look later. @zentol Can you also test it with the Python API? I think you initially noticed the problem.
[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/378

[FLINK-1489] Fixes blocking scheduleOrUpdateConsumers message calls

Replaces the blocking calls with futures that fail the respective task if an exception occurs. Furthermore, the PartitionInfos are buffered on the JobManager if some of the consumers are not yet scheduled. Once the consumers have switched to RUNNING, all buffered partition infos are sent to them.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixScheduleOrUpdateConsumers

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/378.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #378

commit d17f15ac966d59444aed86ed7d1c9cc1a2716152
Author: Till Rohrmann trohrm...@apache.org
Date: 2015-02-06T14:13:28Z

    [FLINK-1489] Replaces blocking scheduleOrUpdateConsumers message calls with asynchronous futures. Buffers PartitionInfos at the JobManager if the respective consumer has not been scheduled.

    Conflicts:
        flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
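The approach described in the pull request can be sketched roughly as follows. This is a simplified illustration and not the code from the pull request: `BufferingConsumer`, its string-valued partition infos, and the use of `CompletableFuture` (instead of the Akka futures used in Flink) are all assumptions made for the sake of a self-contained example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a consumer vertex; not one of Flink's classes.
class BufferingConsumer {
    private final Queue<String> buffered = new ConcurrentLinkedQueue<>();
    private final List<String> sent = new ArrayList<>();
    private volatile boolean running = false;
    private volatile Throwable failure = null;

    // Buffer the info; if the consumer is already running, send it right away.
    void cachePartitionInfo(String info) {
        buffered.add(info);
        if (running) {
            flush();
        }
    }

    // Non-blocking scheduling: the caller gets a future back immediately.
    // A deployment error is recorded as a failure instead of being thrown.
    CompletableFuture<Void> scheduleAsync(Runnable deploy) {
        return CompletableFuture.runAsync(deploy)
            .thenRun(this::switchToRunning)
            .exceptionally(error -> { failure = error; return null; });
    }

    // Once running, everything buffered so far is sent.
    void switchToRunning() {
        running = true;
        flush();
    }

    private synchronized void flush() {
        String info;
        while ((info = buffered.poll()) != null) {
            sent.add(info); // stands in for sending an update message
        }
    }

    synchronized List<String> sentInfos() {
        return new ArrayList<>(sent);
    }

    Throwable failure() {
        return failure;
    }
}
```

In this sketch, infos cached before the consumer runs stay in the buffer and are sent in order once scheduling completes. If the deployment step throws, nothing is sent and the failure is recorded, which mirrors the "let the respective task fail" behavior only loosely.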