Henry Robinson created SPARK-24254:
--------------------------------------

             Summary: Eagerly evaluate some subqueries over LocalRelation
                 Key: SPARK-24254
                 URL: https://issues.apache.org/jira/browse/SPARK-24254
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Henry Robinson


Some queries would benefit from eagerly evaluating subqueries over a 
{{LocalRelation}}. For example:

{code}
SELECT t1.part_col FROM t1 JOIN (SELECT max(part_col) m FROM t2) foo WHERE 
t1.part_col = foo.m
{code}

If {{max(part_col)}} could be evaluated during planning, there's an opportunity 
to prune all but at most one partition from the scan of {{t1}}. 

Similarly, a near-identical query with a non-scalar subquery in the {{WHERE}} 
clause:

{code}
SELECT * FROM t1 WHERE part_col IN (SELECT part_col FROM t2)
{code}

could be partially evaluated to eliminate some partitions, and remove the join 
from the plan. 
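
To make the intended effect concrete, here is a toy model in plain Python. It 
is not Spark code and uses no Spark API. It assumes a partitioned table can be 
represented as a dict keyed by non-null integer {{part_col}} values, and that 
the values of {{t2.part_col}} are already available at planning time. The 
names {{prune_by_max}} and {{prune_by_in}} are made up for this illustration.

```python
from typing import Dict, List

# Toy stand-in for a partitioned table: partition value -> rows in that partition.
Partitions = Dict[int, List[str]]


def prune_by_max(t1: Partitions, t2_part_values: List[int]) -> Partitions:
    """Model of the first query: compute max(part_col) over t2 up front,
    then keep only the t1 partition that equals it (zero or one partition)."""
    if not t2_part_values:
        # max() over no rows is NULL in SQL, so the equality matches nothing.
        return {}
    m = max(t2_part_values)
    return {value: rows for value, rows in t1.items() if value == m}


def prune_by_in(t1: Partitions, t2_part_values: List[int]) -> Partitions:
    """Model of the second query: turn the IN subquery into a literal set,
    so the scan of t1 is filtered directly and no join is needed."""
    wanted = set(t2_part_values)
    return {value: rows for value, rows in t1.items() if value in wanted}


t1 = {1: ["a"], 2: ["b"], 3: ["c"]}
t2_values = [1, 3, 3]

print(sorted(prune_by_max(t1, t2_values)))  # partitions kept by the first query
print(sorted(prune_by_in(t1, t2_values)))   # partitions kept by the second query
```

In both cases the partitions to scan are known before execution starts, which 
is the benefit described above.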

Obviously not every subquery over a local relation can be evaluated during 
planning, but certain whitelisted aggregates could be if the input cardinality 
isn't too high. 
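
A minimal sketch of what such an eligibility check might look like, again as 
plain Python and not Spark code. The issue names neither the aggregates nor a 
row limit, so {{ELIGIBLE_AGGREGATES}}, {{MAX_INPUT_ROWS}}, {{SubqueryInfo}} 
and {{can_evaluate_eagerly}} are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical values: the issue specifies neither the aggregates nor the limit.
ELIGIBLE_AGGREGATES = frozenset({"min", "max"})
MAX_INPUT_ROWS = 10_000


@dataclass(frozen=True)
class SubqueryInfo:
    aggregate: str        # name of the aggregate function, e.g. "max"
    input_is_local: bool  # True when the subquery reads only a local relation
    input_rows: int       # number of rows in that local relation


def can_evaluate_eagerly(sq: SubqueryInfo) -> bool:
    """Return True only if all three conditions hold: local input,
    an allowed aggregate, and an input small enough to evaluate while planning."""
    return (
        sq.input_is_local
        and sq.aggregate.lower() in ELIGIBLE_AGGREGATES
        and sq.input_rows <= MAX_INPUT_ROWS
    )
```

A subquery that fails the check would simply be left in the plan and executed 
as it is today.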




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
