GitHub user maryannxue opened a pull request:
https://github.com/apache/spark/pull/22060
[DO NOT MERGE][TEST ONLY] Add once-policy rule check
## What changes were proposed in this pull request?
Rules like `HandleNullInputsForUDF`
(https://issues.apache.org/jira/browse/SPARK-24891) do not stabilize (they can
keep applying new changes to a plan indefinitely) and can cause problems such
as SQL cache mismatches.
Ideally, every rule, whether in a once-policy batch or a fixed-point-policy
batch, should stabilize within the specified number of runs. The once policy
should be treated as a performance optimization, that is, an assumption that
the rule stabilizes after just one run, rather than an assumption that the rule
will never be applied more than once. Once-policy rules should therefore also
run correctly under a fixed-point policy.
We already have a check for the fixed-point policy that throws an exception if
the maximum number of runs is reached while the plan is still changing. This PR
adds a similar check for the once policy: it throws an exception if the plan
changes between the first and the second run of a once-policy rule.
From the test results, we can find out which analysis rules break this check
so that we can fix them later.
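To make the two checks concrete, here is a minimal Scala sketch of a toy rule executor, assuming a plan is just a `List[String]` and a rule is a plain function. `OncePolicySketch`, `Rule`, `Batch`, `Once`, `FixedPoint`, and `execute` are hypothetical stand-ins, not Spark's actual `RuleExecutor` API. For a once-policy batch the sketch runs the rules a second time and throws if the plan changes; for a fixed-point batch it throws if the plan is still changing after `maxIterations` runs.

```scala
// Hypothetical toy model of a rule executor; NOT Spark's RuleExecutor API.
object OncePolicySketch {
  // Stand-in for a logical plan: just a list of expression strings.
  type Plan = List[String]

  // A named plan-to-plan transformation.
  final case class Rule(name: String, transform: Plan => Plan)

  sealed trait Strategy { def maxIterations: Int }
  case object Once extends Strategy { val maxIterations = 1 }
  final case class FixedPoint(maxIterations: Int) extends Strategy

  final case class Batch(name: String, strategy: Strategy, rules: Seq[Rule])

  // Applies every rule of the batch once, in order.
  private def runBatchOnce(batch: Batch, plan: Plan): Plan =
    batch.rules.foldLeft(plan)((p, rule) => rule.transform(p))

  def execute(batches: Seq[Batch], input: Plan): Plan =
    batches.foldLeft(input) { (plan, batch) =>
      batch.strategy match {
        case Once =>
          val first = runBatchOnce(batch, plan)
          // Once-policy check: re-running the batch must not change the plan.
          val second = runBatchOnce(batch, first)
          if (second != first)
            throw new IllegalStateException(
              s"Once-policy batch '${batch.name}' is not idempotent")
          first
        case FixedPoint(max) =>
          var current = plan
          var iteration = 1
          var converged = false
          while (!converged) {
            val next = runBatchOnce(batch, current)
            if (next == current) converged = true
            // Fixed-point check: the plan still changed on the max-th run.
            else if (iteration >= max)
              throw new IllegalStateException(
                s"Batch '${batch.name}' did not converge within $max runs")
            else { current = next; iteration += 1 }
          }
          current
      }
    }
}
```

Under this sketch, a rule that wraps an expression in a (hypothetical) `nullCheck(...)` only when it is not already wrapped passes the once-policy check and also converges under `FixedPoint`, while a naive rule that wraps on every run fails both checks. This mirrors, in toy form, the non-stabilizing behavior described for `HandleNullInputsForUDF`.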
## How was this patch tested?
N/A
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maryannxue/spark once_policy
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22060.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22060
commit 323656872799b8dd636061220f3ed139379c9c79
Author: maryannxue
Date: 2018-08-09T05:20:32Z
Add once-policy batch check