Github user NicoK commented on a diff in the pull request:
https://github.com/apache/flink/pull/4559#discussion_r157706995
--- Diff:
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedSubpartition.java
---
@@ -52,6 +54,10 @@
/** Flag indicating whether the subpartition has been released. */
private volatile boolean isReleased;
+ /** The number of non-event buffers currently in this subpartition */
+ @GuardedBy("buffers")
+ private volatile int buffersInBacklog;
--- End diff --
You're absolutely right about not counting events. Therefore, we cannot use
the queue's size as I suggested.
Yes, `BufferAndAvailability` would need to be extended as well.
The way the spillable/spilled subpartitions and their subpartition views are
split up, while both work on the same structures and require the same
synchronisation pattern, is imho really not nice and highly fragile.
@pnowojski and I are currently re-designing the synchronisation in these parts
of the code and are a bit sensitive to it now, so let's drag him into this
discussion as well. I would consider `PipelinedSubpartition` the hot path where
we need to optimise most; spillable subpartitions are used in batch mode and
have higher tolerances, especially when spilling to disk. If, however, you
returned the new backlog counter from
`SpillableSubpartition#decreaseBuffersInBacklog()` (retrieved under the
`synchronized (buffers)` section), then you would not need the `volatile`
either, since you are already under the lock.
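
To illustrate what I mean, here is a rough, simplified sketch (not the actual
Flink code). `BacklogSketch`, `Entry` and `pollAndGetBacklog()` are made-up
names for illustration only, and the `-1` return for an empty queue is an
assumption. The point is that the counter is only read and written while
holding the `buffers` lock, and the caller gets the value that was observed
under the lock:

```java
import java.util.ArrayDeque;

/** Hypothetical stand-in for a subpartition; not Flink's actual class. */
public class BacklogSketch {

    /** Hypothetical queue entry: either a data buffer or an event. */
    public static final class Entry {
        final boolean isBuffer;

        public Entry(boolean isBuffer) {
            this.isBuffer = isBuffer;
        }
    }

    private final ArrayDeque<Entry> buffers = new ArrayDeque<>();

    // Guarded by "buffers": a plain int is enough (no volatile) because
    // every read and write below happens while holding the lock.
    private int buffersInBacklog;

    public void add(Entry entry) {
        synchronized (buffers) {
            buffers.add(entry);
            if (entry.isBuffer) {
                // Events are not counted, only real buffers.
                buffersInBacklog++;
            }
        }
    }

    /**
     * Polls the next entry and returns the backlog as seen under the lock,
     * or -1 if the queue was empty (assumption made for this sketch).
     */
    public int pollAndGetBacklog() {
        synchronized (buffers) {
            Entry next = buffers.poll();
            if (next == null) {
                return -1;
            }
            return decreaseBuffersInBacklog(next);
        }
    }

    /** Must be called while holding the "buffers" lock. */
    private int decreaseBuffersInBacklog(Entry entry) {
        assert Thread.holdsLock(buffers);
        if (entry.isBuffer) {
            buffersInBacklog--;
        }
        // Return the new value so callers never read the field outside the lock.
        return buffersInBacklog;
    }
}
```

With this shape, the view would pass the returned value on (e.g. via the
extended `BufferAndAvailability`) instead of reading the field a second time
without the lock.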
---