Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Reynold Xin
I just looked at the PR. I think there is some follow-up work that needs to be done, e.g. we shouldn't create a top-level package org.apache.spark.sql.dynamicpruning. On Wed, Oct 02, 2019 at 1:52 PM, Maryann Xue < maryann@databricks.com > wrote: > > There is no internal write up, but I

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Maryann Xue
There is no internal write up, but I think we should at least give some up-to-date description on that JIRA entry. On Wed, Oct 2, 2019 at 3:13 PM Reynold Xin wrote: > No there is no separate write up internally. > > On Wed, Oct 2, 2019 at 12:29 PM Ryan Blue wrote: > >> Thanks for the pointers,

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Reynold Xin
No, there is no separate write-up internally. On Wed, Oct 2, 2019 at 12:29 PM Ryan Blue wrote: > Thanks for the pointers, but what I'm looking for is information about the > design of this implementation, like what requires this to be in spark-sql > instead of spark-catalyst. > > Even a

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Maryann Xue
It's in spark-sql simply because HadoopFsRelation, which the rule tries to match, is in spark-sql. We should probably update the high-level description in the JIRA. I'll work on that shortly. On Wed, Oct 2, 2019 at 2:29 PM Ryan Blue wrote: > Thanks for the pointers, but what

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Ryan Blue
Thanks for the pointers, but what I'm looking for is information about the design of this implementation, like what requires this to be in spark-sql instead of spark-catalyst. Even a high-level description, like what the optimizer rules are and what they do would be great. Was there one written

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Maryann Xue
> It lists 3 cases for how a filter is built, but nothing about the overall approach or design that helps when trying to find out where it should be placed in the optimizer rules. The overall idea/design of DPP can be simply put as using the result of one side of the join to prune partitions of a
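The idea described above (use the result of one side of the join to prune partitions on the other side) can be sketched outside Spark as a toy model. This is only an illustration, not the Spark implementation: it assumes the pruned side is a table partitioned on the join key, and every name in it (fact_partitions, dim_rows, join_with_pruning) is hypothetical.

```python
# Toy model of dynamic partition pruning (DPP). This is NOT Spark code;
# all names and data here are hypothetical and exist only for illustration.

# "Fact" table, stored as partitions keyed by the join column `day`.
# Each row is (day, amount).
fact_partitions = {
    "2019-09-28": [("2019-09-28", 10), ("2019-09-28", 20)],
    "2019-09-29": [("2019-09-29", 30)],
    "2019-09-30": [("2019-09-30", 40)],
}

# Small "dimension" table. Each row is (day, is_weekend).
dim_rows = [
    ("2019-09-28", True),
    ("2019-09-29", True),
    ("2019-09-30", False),
]


def join_with_pruning(fact_partitions, dim_rows, dim_predicate):
    """Join fact and dimension rows on `day`, scanning only the fact
    partitions whose key survives the filter on the dimension side."""
    # 1. Evaluate the filtered side of the join first.
    filtered_dim = [row for row in dim_rows if dim_predicate(row)]

    # 2. Collect the join keys that side actually produced.
    surviving_keys = {row[0] for row in filtered_dim}

    # 3. Scan only the partitions whose key is in that set; the other
    #    partitions are never read.
    scanned = [key for key in fact_partitions if key in surviving_keys]

    # 4. Perform the join over the pruned scan.
    dim_by_key = {row[0]: row for row in filtered_dim}
    joined = [
        (day, amount, dim_by_key[day][1])
        for key in scanned
        for (day, amount) in fact_partitions[key]
    ]
    return scanned, joined


# Keep only weekend days on the dimension side.
scanned, joined = join_with_pruning(fact_partitions, dim_rows, lambda row: row[1])
```

In this sketch the partition for 2019-09-30 is never scanned, because the filter on the dimension side removes its key before the fact side is read.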

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Reynold Xin
Whoever created the JIRA years ago didn't describe dpp correctly, but the linked jira in Hive was correct (which unfortunately is much more terse than any of the patches we have in Spark https://issues.apache.org/jira/browse/HIVE-9152 ). Henry R's description was also correct. On Wed, Oct 02,

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Ryan Blue
Where can I find a design doc for dynamic partition pruning that explains how it works? The JIRA issue, SPARK-11150, doesn't seem to describe dynamic partition pruning (as pointed out by Henry R.) and doesn't have any comments about the implementation's approach. And the PR description also

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Wenchen Fan
The dynamic partition pruning rule generates "hidden" filters that are converted to real predicates at runtime, so it doesn't matter where we run the rule. For PruneFileSourcePartitions, I'm not quite sure. It seems to me it's better to run it before join reorder. On Sun, Sep 29, 2019 at 5:51 AM

[DISCUSS] Out of order optimizer rules?

2019-09-28 Thread Ryan Blue
Hi everyone, I have been working on a PR that moves filter and projection pushdown into the optimizer for DSv2, instead of when converting to physical plan. This will make DSv2 work with optimizer rules that depend on stats, like join reordering. While adding the optimizer rule, I found that