[ https://issues.apache.org/jira/browse/SPARK-49699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951018#comment-17951018 ]
Aparna Garg commented on SPARK-49699: ------------------------------------- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/49983 > Disable PruneFilters for streaming workloads > -------------------------------------------- > > Key: SPARK-49699 > URL: https://issues.apache.org/jira/browse/SPARK-49699 > Project: Spark > Issue Type: Bug > Components: Optimizer, Structured Streaming > Affects Versions: 4.0.0 > Reporter: Nick Young > Assignee: Nick Young > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.4 > > > PruneFilters replaces the {{null}} / {{false}} filter with an empty relation, > which means the subtree of the filter is also lost. The optimization does not > care about whichever operator is in the subtree, hence some important > operators like stateful operator, watermark node, observe node could be lost. > The filter could be evaluated to {{null}} / {{false}} selectively among > microbatches in various reasons (one simple example is the modification of > the query during restart), which means stateful operator might not be > available for batch N and be available for batch N + 1. For this case, > streaming query will fail as batch N + 1 cannot load the state from batch N, > and it's not recoverable in most cases. > We have to disable the rule for streaming workloads, with the consideration > of backward compatibility - we should avoid breaking existing query. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org