[ https://issues.apache.org/jira/browse/SPARK-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maryann Xue updated SPARK-25650:
--------------------------------
    Description:
Rules like {{HandleNullInputsForUDF}} (https://issues.apache.org/jira/browse/SPARK-24891) do not stabilize (they can keep applying new changes to a plan indefinitely) and can cause problems such as SQL cache mismatches.

Ideally, all rules, whether in a once-policy batch or a fixed-point-policy batch, should stabilize after the number of runs specified. Once-policy should be considered a performance improvement: an assumption that the rule can stabilize after just one run, rather than an assumption that the rule won't be applied more than once. Once-policy rules should therefore also run fine in a fixed-point-policy batch.

We already have a check for fixed-point batches that throws an exception if the maximum number of runs is reached and the plan is still changing. This PR adds a similar check for once-policy batches, which throws an exception if the plan changes between the first run and the second run of a once-policy rule.

To reproduce this issue, go to [https://github.com/apache/spark/pull/22060] and apply the changes.

  was:
Rules like {{HandleNullInputsForUDF}} (https://issues.apache.org/jira/browse/SPARK-24891) do not stabilize (they can keep applying new changes to a plan indefinitely) and can cause problems such as SQL cache mismatches.

Ideally, all rules, whether in a once-policy batch or a fixed-point-policy batch, should stabilize after the number of runs specified. Once-policy should be considered a performance improvement: an assumption that the rule can stabilize after just one run, rather than an assumption that the rule won't be applied more than once. Once-policy rules should therefore also run fine in a fixed-point-policy batch.

We already have a check for fixed-point batches that throws an exception if the maximum number of runs is reached and the plan is still changing.
This PR adds a similar check for once-policy batches, which throws an exception if the plan changes between the first run and the second run of a once-policy rule.


> Make analyzer rules used in once-policy idempotent
> --------------------------------------------------
>
>                 Key: SPARK-25650
>                 URL: https://issues.apache.org/jira/browse/SPARK-25650
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Maryann Xue
>            Priority: Major
>
> Rules like {{HandleNullInputsForUDF}}
> (https://issues.apache.org/jira/browse/SPARK-24891) do not stabilize (they
> can keep applying new changes to a plan indefinitely) and can cause problems
> such as SQL cache mismatches.
> Ideally, all rules, whether in a once-policy batch or a fixed-point-policy
> batch, should stabilize after the number of runs specified. Once-policy
> should be considered a performance improvement: an assumption that the rule
> can stabilize after just one run, rather than an assumption that the rule
> won't be applied more than once. Once-policy rules should therefore also run
> fine in a fixed-point-policy batch.
> We already have a check for fixed-point batches that throws an exception if
> the maximum number of runs is reached and the plan is still changing. This
> PR adds a similar check for once-policy batches, which throws an exception
> if the plan changes between the first run and the second run of a
> once-policy rule.
> To reproduce this issue, go to [https://github.com/apache/spark/pull/22060]
> and apply the changes.
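
For illustration, the once-policy check described in this issue can be sketched roughly as follows. This is a toy model in Python, not Spark's Catalyst code: plans are plain strings rather than logical plan trees, and the names `run_once_batch`, `NonIdempotentBatchError`, `add_null_check` and `add_null_check_always` are hypothetical, made up for this sketch. The two example rules are only loosely modelled on the null-check wrapping problem mentioned above.

```python
from typing import Callable, List

# Toy stand-ins (assumptions for this sketch): a "plan" is a string and a
# "rule" is a function from plan to plan. Spark's real rules operate on
# Catalyst logical plan trees, not strings.
Plan = str
Rule = Callable[[Plan], Plan]


class NonIdempotentBatchError(Exception):
    """Raised when a once-policy batch changes the plan on a second run."""


def run_once_batch(name: str, rules: List[Rule], plan: Plan) -> Plan:
    """Run a once-policy batch, then re-run it to verify it has stabilized."""

    def apply_all(p: Plan) -> Plan:
        for rule in rules:
            p = rule(p)
        return p

    first = apply_all(plan)
    # Second run: an idempotent batch must leave the plan unchanged.
    second = apply_all(first)
    if second != first:
        raise NonIdempotentBatchError(
            f"Once-policy batch {name!r} is not idempotent: "
            f"{first!r} became {second!r} on the second run"
        )
    return first


NULL_CHECK_PREFIX = "if(isnull(x), null, "


def add_null_check_always(plan: Plan) -> Plan:
    # Non-idempotent: wraps the expression again on every application.
    return f"{NULL_CHECK_PREFIX}{plan})"


def add_null_check(plan: Plan) -> Plan:
    # Idempotent: skips expressions that already carry the null check.
    if plan.startswith(NULL_CHECK_PREFIX):
        return plan
    return f"{NULL_CHECK_PREFIX}{plan})"


if __name__ == "__main__":
    print(run_once_batch("UDF", [add_null_check], "udf(x)"))
    try:
        run_once_batch("UDF", [add_null_check_always], "udf(x)")
    except NonIdempotentBatchError as err:
        print(err)
```

In this sketch the second run exists only to verify stability, so the batch still behaves as "once" for any rule that is idempotent. A rule that keeps rewriting its own output is reported immediately instead of silently producing a plan that differs each time it is analyzed.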