[
https://issues.apache.org/jira/browse/FLINK-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380048#comment-16380048
]
ASF GitHub Bot commented on FLINK-8750:
---------------------------------------
Github user NicoK commented on a diff in the pull request:
https://github.com/apache/flink/pull/5588#discussion_r171183475
--- Diff:
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGate.java
---
@@ -553,6 +553,12 @@ public void requestPartitions() throws IOException,
InterruptedException {
channelsWithEndOfPartitionEvents.set(currentChannel.getChannelIndex());
if
(channelsWithEndOfPartitionEvents.cardinality() == numberOfInputChannels) {
+ // Because of race condition between:
+ // 1. releasing inputChannelsWithData
lock in this method and reaching this place
+ // 2. empty data notification that
re-enqueues a channel
+ // we can end up with moreAvailable
flag set to true, while we expect no more data.
+ checkState(!moreAvailable ||
!pollNextBufferOrEvent().isPresent());
+ moreAvailable = false;
--- End diff --
While this certainly fixes the
`checkState(!bufferOrEvent.moreAvailable());` in the `UnionInputGate`, it does
not improve the detection of additional data after the `EndOfPartitionEvent`
too much. How about also adding
`checkState(!pollNextBufferOrEvent().isPresent());` here:
```
private Optional<BufferOrEvent> getNextBufferOrEvent(boolean blocking)
throws IOException, InterruptedException {
if (hasReceivedAllEndOfPartitionEvents) {
checkState(!pollNextBufferOrEvent().isPresent());
return Optional.empty();
}
```
In that case, if we ever try to get more data (due to a data notification)
there should be no actual data left and only empty buffers.
> InputGate may contain data after an EndOfPartitionEvent
> -------------------------------------------------------
>
> Key: FLINK-8750
> URL: https://issues.apache.org/jira/browse/FLINK-8750
> Project: Flink
> Issue Type: Sub-task
> Components: Network
> Reporter: Nico Kruber
> Assignee: Piotr Nowojski
> Priority: Blocker
> Fix For: 1.5.0
>
>
> The travis run at https://travis-ci.org/apache/flink/jobs/344425772 indicates
> that there was still some data after an {{EndOfPartitionEvent}} or that
> {{BufferOrEvent#moreAvailable}} contained the wrong value:
> {code}
> testOutputWithoutPk(org.apache.flink.table.runtime.stream.table.JoinITCase)
> Time elapsed: 4.611 sec <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:891)
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
> at
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: null
> at
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
> at
> org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:173)
> at
> org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94)
> at
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:292)
> at
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:115)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:308)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
> at java.lang.Thread.run(Thread.java:748)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)