[ https://issues.apache.org/jira/browse/SPARK-19915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-19915:
------------------------------------

    Assignee:     (was: Apache Spark)

> Improve join reorder: simplify cost evaluation, postpone column pruning, exclude cartesian product
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19915
>                 URL: https://issues.apache.org/jira/browse/SPARK-19915
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Zhenhua Wang
>
> 1. Cardinality is usually more important than size, so we can simplify cost
> evaluation by using only cardinality. This also means we do not have to
> care about column pruning during reordering; otherwise, projects would
> influence the output size of intermediate joins.
> 2. Doing column pruning during reordering is troublesome. Given the first
> change, we can do it right after reordering, and the logic for adding
> projects on intermediate joins can then be removed. This makes the code
> simpler and more reliable.
> 3. Exclude cartesian products from the "memo". This significantly reduces the
> search space and the memory overhead of the memo; otherwise every combination
> of items would exist in the memo. We can find the unjoinable items after
> reordering is finished and put them at the end.
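The three points above can be illustrated with a small, self-contained sketch. This is not Spark's implementation: the function name `reorder_joins`, the fixed `selectivity` factor, and the cost model (sum of intermediate row counts) are all assumptions made for illustration. It only shows the shape of the idea: cost from cardinality alone, a memo that never stores cartesian products, and unjoinable items appended once reordering is done.

```python
from itertools import combinations


def reorder_joins(cardinalities, join_edges, selectivity=0.1):
    """Toy dynamic-programming join reorder (illustrative, not Spark's code).

    cardinalities: dict mapping item name -> estimated row count
    join_edges:    iterable of (a, b) pairs that share a join condition
    selectivity:   assumed fixed join selectivity (hypothetical simplification)
    """
    edges = {frozenset(e) for e in join_edges}

    def joinable(left, right):
        # Two item sets are joinable if any join condition connects them.
        return any(frozenset((a, b)) in edges for a in left for b in right)

    # memo: set of items -> (plan, output rows, cost). Cost uses rows only.
    memo = {frozenset([name]): (name, rows, 0.0)
            for name, rows in cardinalities.items()}

    for size in range(2, len(cardinalities) + 1):
        found = {}
        for left, right in combinations(list(memo), 2):
            if len(left) + len(right) != size or left & right:
                continue
            if not joinable(left, right):
                continue  # point 3: cartesian products never enter the memo
            l_plan, l_rows, l_cost = memo[left]
            r_plan, r_rows, r_cost = memo[right]
            rows = l_rows * r_rows * selectivity
            cost = l_cost + r_cost + rows  # point 1: cardinality-only cost
            key = left | right
            if key not in found or cost < found[key][2]:
                found[key] = ((l_plan, r_plan), rows, cost)
        memo.update(found)

    # After reordering, take the largest joinable groups first and put
    # whatever could not be joined at the end.
    remaining = set(cardinalities)
    parts = []
    while remaining:
        candidates = [k for k in memo if k <= remaining]
        best = max(candidates, key=lambda k: (len(k), -memo[k][2]))
        parts.append(memo[best][0])
        remaining -= best

    plan = parts[0]
    for part in parts[1:]:
        plan = (plan, part)
    return plan


example = reorder_joins(
    {"a": 1000, "b": 10, "c": 100, "d": 5},
    [("a", "b"), ("b", "c")],  # "d" has no join condition with anything
)
print(example)  # (('a', ('b', 'c')), 'd')
```

In this sketch column pruning is not modelled at all, which mirrors point 2: because the cost ignores row width, pruning can be applied as a separate step on the final plan.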