[ 
https://issues.apache.org/jira/browse/SPARK-49699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951018#comment-17951018
 ] 

Aparna Garg commented on SPARK-49699:
-------------------------------------

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/49983

> Disable PruneFilters for streaming workloads
> --------------------------------------------
>
>                 Key: SPARK-49699
>                 URL: https://issues.apache.org/jira/browse/SPARK-49699
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Nick Young
>            Assignee: Nick Young
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0, 3.5.4
>
>
> PruneFilters replaces a filter whose condition is {{null}} / {{false}} with 
> an empty relation, which means the subtree under the filter is also lost. 
> The optimization does not consider which operators are in that subtree, so 
> important operators such as stateful operators, watermark nodes, and observe 
> nodes can be lost.
> The filter can evaluate to {{null}} / {{false}} in some microbatches and not 
> in others for various reasons (one simple example is a modification of the 
> query during a restart), which means a stateful operator might be absent in 
> batch N and present in batch N + 1. In this case, the streaming query fails 
> because batch N + 1 cannot load the state from batch N, and this is not 
> recoverable in most cases.
> We have to disable the rule for streaming workloads, keeping backward 
> compatibility in mind: we should avoid breaking existing queries.
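
The failure mode described above can be sketched with a toy plan tree. This is a simplified model for illustration only, not Spark's Catalyst code: the node classes, `prune_filters`, `has_stateful_op`, `plan_for_batch`, and the `rule_enabled` flag are all hypothetical names, and the filter condition is assumed to be an already-folded constant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    """Leaf node standing in for a streaming source."""
    name: str


@dataclass(frozen=True)
class StatefulOp:
    """Stands in for a stateful operator (e.g. a streaming aggregation)."""
    child: object


@dataclass(frozen=True)
class Filter:
    """condition is a constant here: True, False, or None (modeling null)."""
    condition: Optional[bool]
    child: object


@dataclass(frozen=True)
class EmptyRelation:
    """Leaf node with no rows and, crucially, no children."""


def prune_filters(plan, rule_enabled):
    """Toy version of the rule: a null/false filter becomes an empty relation.

    With rule_enabled=False (the assumed behavior for streaming plans),
    the plan is returned untouched.
    """
    if not rule_enabled:
        return plan
    if isinstance(plan, Filter):
        if plan.condition is None or plan.condition is False:
            # The whole subtree under the filter is dropped here.
            return EmptyRelation()
        return Filter(plan.condition, prune_filters(plan.child, rule_enabled))
    if isinstance(plan, StatefulOp):
        return StatefulOp(prune_filters(plan.child, rule_enabled))
    return plan


def has_stateful_op(plan):
    """Walk the single-child chain and report whether a StatefulOp survives."""
    if isinstance(plan, StatefulOp):
        return True
    child = getattr(plan, "child", None)
    return child is not None and has_stateful_op(child)


def plan_for_batch(condition):
    """Same query shape for every batch; only the constant condition varies."""
    return Filter(condition, StatefulOp(Source("events")))


# Batch N: the condition folds to false, so the stateful operator vanishes.
batch_n = prune_filters(plan_for_batch(False), rule_enabled=True)
# Batch N + 1: the condition is true, so the stateful operator is present
# and would expect state that batch N never wrote.
batch_n_plus_1 = prune_filters(plan_for_batch(True), rule_enabled=True)

print(has_stateful_op(batch_n), has_stateful_op(batch_n_plus_1))
```

In this model, calling `prune_filters(..., rule_enabled=False)` keeps the stateful operator in the plan for every batch, which is the property the issue asks for in streaming workloads.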



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
