[
https://issues.apache.org/jira/browse/SPARK-49699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-49699.
---------------------------------
> Disable PruneFilters for streaming workloads
> --------------------------------------------
>
> Key: SPARK-49699
> URL: https://issues.apache.org/jira/browse/SPARK-49699
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Nick Young
> Assignee: Nick Young
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.4, 4.0.0
>
>
> PruneFilters replaces a filter whose condition is {{null}} / {{false}} with
> an empty relation, which means the filter's subtree is lost as well. The
> optimization does not consider which operators are in the subtree, so
> important operators such as stateful operators, watermark nodes, and observe
> nodes can be lost.
> The filter can evaluate to {{null}} / {{false}} in some microbatches and not
> in others for various reasons (one simple example is a modification of the
> query during restart), which means a stateful operator might be absent in
> batch N but present in batch N + 1. In this case, the streaming query will
> fail because batch N + 1 cannot load the state from batch N, and this is not
> recoverable in most cases.
> We have to disable the rule for streaming workloads while taking backward
> compatibility into account: we should avoid breaking existing queries.
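
The failure mode described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `Node`, `prune_filters`, the `skip_streaming` flag, and the operator names are hypothetical stand-ins invented for illustration, not Spark's Catalyst classes or the actual patch. It shows how pruning a {{false}} filter drops a stateful operator in one batch but not in the next, and how skipping the rule for streaming plans keeps the plan shape stable.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Toy logical-plan node (hypothetical; not Spark's LogicalPlan)."""
    name: str
    children: Tuple["Node", ...] = ()
    condition: str = "expr"          # for Filter: "false", "null", or "expr"
    streaming_source: bool = False   # True only for a streaming leaf


def is_streaming(plan: Node) -> bool:
    """A plan is streaming if any leaf below it is a streaming source."""
    return plan.streaming_source or any(is_streaming(c) for c in plan.children)


def contains(plan: Node, name: str) -> bool:
    """Return True if an operator with this name appears anywhere in the plan."""
    return plan.name == name or any(contains(c, name) for c in plan.children)


def prune_filters(plan: Node, skip_streaming: bool) -> Node:
    """Replace a false/null Filter (and its whole subtree) with an empty relation.

    With skip_streaming=True (a hypothetical flag for this sketch), streaming
    filters are left untouched so their subtree survives.
    """
    if plan.name == "Filter" and plan.condition in ("false", "null"):
        if not (skip_streaming and is_streaming(plan)):
            return Node("EmptyRelation")
    children = tuple(prune_filters(c, skip_streaming) for c in plan.children)
    return Node(plan.name, children, plan.condition, plan.streaming_source)


def query(condition: str) -> Node:
    """Filter over a stateful aggregate over a streaming source."""
    source = Node("StreamingSource", streaming_source=True)
    agg = Node("StatefulAggregate", (source,))
    return Node("Filter", (agg,), condition=condition)


# Batch N: the filter folds to false, so the stateful operator is pruned away.
batch_n = prune_filters(query("false"), skip_streaming=False)
# Batch N + 1: the filter no longer folds, so the stateful operator is back.
batch_n_plus_1 = prune_filters(query("expr"), skip_streaming=False)
# With the rule skipped for streaming plans, batch N keeps the operator.
guarded = prune_filters(query("false"), skip_streaming=True)

print(contains(batch_n, "StatefulAggregate"))         # False
print(contains(batch_n_plus_1, "StatefulAggregate"))  # True
print(contains(guarded, "StatefulAggregate"))         # True
```

In this model the mismatch between the first two results corresponds to batch N + 1 expecting state that batch N never wrote. The guarded variant trades a small optimization for a plan shape that stays the same across microbatches.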