Jungtaek Lim created SPARK-50046:
------------------------------------
Summary: [Possible bug] Incorrect watermark advancement if
watermark node is lost/pruned
Key: SPARK-50046
URL: https://issues.apache.org/jira/browse/SPARK-50046
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Jungtaek Lim
This does not happen with the current optimization rules, but that is mostly
luck, and we were silently dropping the CollectMetrics node, so it would be
ideal to address the issue in advance.
WatermarkTracker only looks at the physical plan when calculating the new
watermark value. It identifies each watermark node by index, so various issues
arise when a watermark node is lost during optimization.
1) The watermark advances even when one of the watermark nodes has been
dropped (the dropped node should be treated as having produced no data).
2) The watermark tracker incorrectly updates its in-memory map of the previous
watermark value per node (the index is not a stable key).
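
The two issues above can be illustrated with a small toy model. This is only a
sketch and not Spark's actual code: the function names, the (node id,
watermark) plan representation, and the "minimum across nodes" policy are all
assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Toy model only (hypothetical, not Spark classes): a plan is the list of
# surviving watermark nodes as (stable_node_id, observed_watermark_ms).
Plan = List[Tuple[str, int]]


def advance_by_index(prev: Dict[int, int], plan: Plan) -> int:
    """Track previous watermarks keyed by position in the plan."""
    for index, (_, watermark_ms) in enumerate(plan):
        prev[index] = max(prev.get(index, 0), watermark_ms)
    # Assumed policy: the global watermark is the minimum across nodes.
    return min(prev.values(), default=0)


def advance_by_id(prev: Dict[str, int], expected_ids: List[str], plan: Plan) -> int:
    """Track previous watermarks keyed by a stable node id."""
    observed = dict(plan)
    for node_id in expected_ids:
        # A node missing from the plan is treated as having produced no data,
        # so its previous value is kept (0 if it was never seen).
        prev[node_id] = max(prev.get(node_id, 0), observed.get(node_id, 0))
    return min((prev[node_id] for node_id in expected_ids), default=0)


# Issue 1: node "b" is pruned from the plan, yet the watermark advances.
print(advance_by_index({}, [("a", 300)]))           # 300
print(advance_by_id({}, ["a", "b"], [("a", 300)]))  # 0

# Issue 2: node "a" is pruned in the second batch, so "b" shifts to index 0
# and overwrites the slot that previously belonged to "a".
by_index: Dict[int, int] = {}
advance_by_index(by_index, [("a", 100), ("b", 50)])
advance_by_index(by_index, [("b", 200)])
print(by_index)  # {0: 200, 1: 50}
```

In this sketch, keying by a stable id keeps each node's history separate and
lets a missing node hold the watermark back, which is the behavior items 1)
and 2) ask for.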