[jira] [Resolved] (SPARK-20939) Do not duplicate user-defined functions while optimizing logical query plans

Hyukjin Kwon (JIRA) Mon, 20 May 2019 21:46:01 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hyukjin Kwon resolved SPARK-20939.
----------------------------------
    Resolution: Incomplete

> Do not duplicate user-defined functions while optimizing logical query plans
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-20939
>                 URL: https://issues.apache.org/jira/browse/SPARK-20939
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Lovasoa
>            Priority: Minor
>              Labels: bulk-closed, logical_plan, optimizer
>
> Currently, while optimizing a query plan, spark pushes filters down the query 
> plan tree, so that 
> {code:title=LogicalPlan}
> Join Inner, (a = b)
>     +- Filter UDF(a)
>         +- Relation A
>     +- Relation B
> {code}
> becomes 
> {code:title=Optimized LogicalPlan}
>  Join Inner, (a = b)
>      +- Filter UDF(a)
>          +- Relation A
>      +- Filter UDF(b)
>          +- Relation B
> {code}
> In general, it is a good thing to push down filters as it reduces the number 
> of records that will go through the join.
> However, in the case where the filter is an user-defined function (UDF), we 
> cannot know if the cost of executing the function twice will be higher than 
> the eventual cost of joining more elements or not.
> So I think that the optimizer shouldn't move the user-defined function in the 
> query plan tree. The user will still be able to duplicate the function if he 
> wants to.
> See this question on stackoverflow: 
> https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Resolved] (SPARK-20939) Do not duplicate user-defined functions while optimizing logical query plans

Reply via email to