[jira] [Updated] (SPARK-20939) Do not duplicate user-defined functions while optimizing logical query plans

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-20939:
-
Labels: bulk-closed logical_plan optimizer  (was: logical_plan optimizer)

> Do not duplicate user-defined functions while optimizing logical query plans
> 
>
> Key: SPARK-20939
> URL: https://issues.apache.org/jira/browse/SPARK-20939
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0
>Reporter: Lovasoa
>Priority: Minor
>  Labels: bulk-closed, logical_plan, optimizer
>
> Currently, while optimizing a query plan, Spark pushes filters down the query 
> plan tree, so that 
> {code:title=LogicalPlan}
> Join Inner, (a = b)
> :- Filter UDF(a)
> :  +- Relation A
> +- Relation B
> {code}
> becomes 
> {code:title=Optimized LogicalPlan}
> Join Inner, (a = b)
> :- Filter UDF(a)
> :  +- Relation A
> +- Filter UDF(b)
>    +- Relation B
> {code}
> In general, pushing filters down is a good thing, as it reduces the number 
> of records that go through the join.
> However, when the filter is a user-defined function (UDF), we cannot know 
> whether the cost of executing the function twice will be higher than the cost 
> of joining more records.
> So I think the optimizer shouldn't move or duplicate the user-defined function 
> in the query plan tree. The user can still duplicate the function manually if 
> they want to.
> See this question on Stack Overflow: 
> https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark
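
For illustration, here is a minimal sketch that should reproduce the behaviour described above. It is not part of the original report; the data, column names and UDF body are made up. Calling explain is expected to show the UDF filter on both sides of the join after optimization.

{code:title=Reproduction sketch (made-up names)}
// Hypothetical reproduction of the reported behaviour; all names are illustrative.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").appName("SPARK-20939-repro").getOrCreate()
import spark.implicits._

// A deliberately expensive predicate wrapped in a UDF.
val expensive = udf((x: Long) => { Thread.sleep(1); x % 2 == 0 })

val relA = spark.range(1000).toDF("a")
val relB = spark.range(1000).toDF("b")

// Filter only the left side with the UDF, then join on a = b.
val joined = relA.filter(expensive($"a")).join(relB, $"a" === $"b")

// The optimized logical plan is expected to contain Filter UDF(a) over relA
// and an inferred Filter UDF(b) over relB, i.e. the duplication described above.
joined.explain(true)
{code}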



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20939) Do not duplicate user-defined functions while optimizing logical query plans

2017-06-02 Thread Takeshi Yamamuro (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-20939:
-
Issue Type: Improvement  (was: Bug)

> Do not duplicate user-defined functions while optimizing logical query plans
> 
>
> Key: SPARK-20939
> URL: https://issues.apache.org/jira/browse/SPARK-20939
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0
>Reporter: Lovasoa
>Priority: Minor
>  Labels: logical_plan, optimizer
>
> Currently, while optimizing a query plan, Spark pushes filters down the query 
> plan tree, so that 
> {code:title=LogicalPlan}
> Join Inner, (a = b)
> :- Filter UDF(a)
> :  +- Relation A
> +- Relation B
> {code}
> becomes 
> {code:title=Optimized LogicalPlan}
> Join Inner, (a = b)
> :- Filter UDF(a)
> :  +- Relation A
> +- Filter UDF(b)
>    +- Relation B
> {code}
> In general, pushing filters down is a good thing, as it reduces the number 
> of records that go through the join.
> However, when the filter is a user-defined function (UDF), we cannot know 
> whether the cost of executing the function twice will be higher than the cost 
> of joining more records.
> So I think the optimizer shouldn't move or duplicate the user-defined function 
> in the query plan tree. The user can still duplicate the function manually if 
> they want to.
> See this question on Stack Overflow: 
> https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark
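
The last sentence of the description says the user can still duplicate the function manually when running it twice is acceptable. A hypothetical sketch of that explicit, user-controlled duplication (all names are made up for illustration):

{code:title=Manual, user-controlled duplication (sketch)}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val expensive = udf((x: Long) => x % 2 == 0)
val relA = spark.range(1000).toDF("a")
val relB = spark.range(1000).toDF("b")

// If running the UDF twice is acceptable, the user can write the duplication
// explicitly on both sides instead of relying on the optimizer to infer it.
val joined = relA.filter(expensive($"a"))
  .join(relB.filter(expensive($"b")), $"a" === $"b")
joined.explain(true)
{code}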






[jira] [Updated] (SPARK-20939) Do not duplicate user-defined functions while optimizing logical query plans

2017-05-31 Thread Lovasoa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lovasoa updated SPARK-20939:

Description: 
Currently, while optimizing a query plan, Spark pushes filters down the query 
plan tree, so that 

{code:title=LogicalPlan}
Join Inner, (a = b)
:- Filter UDF(a)
:  +- Relation A
+- Relation B
{code}

becomes 

{code:title=Optimized LogicalPlan}
Join Inner, (a = b)
:- Filter UDF(a)
:  +- Relation A
+- Filter UDF(b)
   +- Relation B
{code}

In general, pushing filters down is a good thing, as it reduces the number of 
records that go through the join.

However, when the filter is a user-defined function (UDF), we cannot know 
whether the cost of executing the function twice will be higher than the cost 
of joining more records.

So I think the optimizer shouldn't move or duplicate the user-defined function 
in the query plan tree. The user can still duplicate the function manually if 
they want to.

See this question on Stack Overflow: 
https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark

  was:
Currently, while optimizing a query plan, Spark pushes filters down the query 
plan tree, so that 

{code:title=LogicalPlan}
Filter UDF(a)
  +- Join Inner, (a = b)
   +- Relation
   +- Relation
{code}

becomes 

{code:title=Optimized LogicalPlan}
 Join Inner, (a = b)
 +- Filter UDF(a)
 +- Relation
 +- Filter UDF(b)
 +- Relation
{code}

In general, pushing filters down is a good thing, as it reduces the number of 
records that go through the join.

However, when the filter is a user-defined function (UDF), we cannot know 
whether the cost of executing the function twice will be higher than the cost 
of joining more records.

So I think the optimizer shouldn't move or duplicate the user-defined function 
in the query plan tree. The user can still duplicate the function manually if 
they want to.

See this question on Stack Overflow: 
https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark


> Do not duplicate user-defined functions while optimizing logical query plans
> 
>
> Key: SPARK-20939
> URL: https://issues.apache.org/jira/browse/SPARK-20939
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0
>Reporter: Lovasoa
>Priority: Minor
>  Labels: logical_plan, optimizer
>
> Currently, while optimizing a query plan, Spark pushes filters down the query 
> plan tree, so that 
> {code:title=LogicalPlan}
> Join Inner, (a = b)
> :- Filter UDF(a)
> :  +- Relation A
> +- Relation B
> {code}
> becomes 
> {code:title=Optimized LogicalPlan}
> Join Inner, (a = b)
> :- Filter UDF(a)
> :  +- Relation A
> +- Filter UDF(b)
>    +- Relation B
> {code}
> In general, pushing filters down is a good thing, as it reduces the number 
> of records that go through the join.
> However, when the filter is a user-defined function (UDF), we cannot know 
> whether the cost of executing the function twice will be higher than the cost 
> of joining more records.
> So I think the optimizer shouldn't move or duplicate the user-defined function 
> in the query plan tree. The user can still duplicate the function manually if 
> they want to.
> See this question on Stack Overflow: 
> https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark
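
One possible workaround, not mentioned in this ticket and only available from Spark 2.3 onwards (so it does not help the affected 2.1.0 version): marking the UDF as non-deterministic, which should keep the optimizer from moving or duplicating it. A sketch with made-up names:

{code:title=Workaround sketch, assuming Spark 2.3+}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val relA = spark.range(1000).toDF("a")
val relB = spark.range(1000).toDF("b")

// asNondeterministic() exists on UserDefinedFunction from Spark 2.3 onwards; a
// non-deterministic predicate should not be inferred onto the other side of the join.
val expensive = udf((x: Long) => x % 2 == 0).asNondeterministic()

relA.filter(expensive($"a")).join(relB, $"a" === $"b").explain(true)
{code}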






[jira] [Updated] (SPARK-20939) Do not duplicate user-defined functions while optimizing logical query plans

2017-05-31 Thread Lovasoa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lovasoa updated SPARK-20939:

Description: 
Currently, while optimizing a query plan, Spark pushes filters down the query 
plan tree, so that 

{code:title=LogicalPlan}
Filter UDF(a)
  +- Join Inner, (a = b)
   +- Relation
   +- Relation
{code}

becomes 

{code:title=Optimized LogicalPlan}
 Join Inner, (a = b)
 +- Filter UDF(a)
 +- Relation
 +- Filter UDF(b)
 +- Relation
{code}

In general, pushing filters down is a good thing, as it reduces the number of 
records that go through the join.

However, when the filter is a user-defined function (UDF), we cannot know 
whether the cost of executing the function twice will be higher than the cost 
of joining more records.

So I think the optimizer shouldn't move or duplicate the user-defined function 
in the query plan tree. The user can still duplicate the function manually if 
they want to.

See this question on Stack Overflow: 
https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark

  was:
Currently, while optimizing a query plan, Spark pushes filters down the query 
plan tree, so that 

{{
Filter UDF(a)
  +- Join Inner, (a = b)
   +- Relation
   +- Relation
}}

becomes 

{{
 Join Inner, (a = b)
 +- Filter UDF(a)
 +- Relation
 +- Filter UDF(b)
 +- Relation
}}

In general, pushing filters down is a good thing, as it reduces the number of 
records that go through the join.

However, when the filter is a user-defined function (UDF), we cannot know 
whether the cost of executing the function twice will be higher than the cost 
of joining more records.

So I think the optimizer shouldn't move or duplicate the user-defined function 
in the query plan tree. The user can still duplicate the function manually if 
they want to.

See this question on Stack Overflow: 
https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark


> Do not duplicate user-defined functions while optimizing logical query plans
> 
>
> Key: SPARK-20939
> URL: https://issues.apache.org/jira/browse/SPARK-20939
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0
>Reporter: Lovasoa
>Priority: Minor
>  Labels: logical_plan, optimizer
>
> Currently, while optimizing a query plan, Spark pushes filters down the query 
> plan tree, so that 
> {code:title=LogicalPlan}
> Filter UDF(a)
>   +- Join Inner, (a = b)
>+- Relation
>+- Relation
> {code}
> becomes 
> {code:title=Optimized LogicalPlan}
>  Join Inner, (a = b)
>  +- Filter UDF(a)
>  +- Relation
>  +- Filter UDF(b)
>  +- Relation
> {code}
> In general, pushing filters down is a good thing, as it reduces the number 
> of records that go through the join.
> However, when the filter is a user-defined function (UDF), we cannot know 
> whether the cost of executing the function twice will be higher than the cost 
> of joining more records.
> So I think the optimizer shouldn't move or duplicate the user-defined function 
> in the query plan tree. The user can still duplicate the function manually if 
> they want to.
> See this question on Stack Overflow: 
> https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark
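
The linked Stack Overflow question asks how to turn the optimization off entirely. For completeness, and only as an assumption about later releases (neither knob exists in the affected 2.1.0 version): Spark 2.2 added spark.sql.constraintPropagation.enabled and Spark 2.4 added spark.sql.optimizer.excludedRules, either of which is believed to stop the inferred duplicate filter, at the price of a larger join input. A sketch:

{code:title=Configuration sketch, assuming Spark 2.4+}
// Given an existing SparkSession named `spark` (Spark 2.4+ assumed for excludedRules).
// Exclude the rule believed to infer the duplicated UDF filter across the equi-join.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints")

// Alternative assumed to exist from Spark 2.2 onwards: disable constraint propagation.
// spark.conf.set("spark.sql.constraintPropagation.enabled", "false")
{code}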






[jira] [Updated] (SPARK-20939) Do not duplicate user-defined functions while optimizing logical query plans

2017-05-31 Thread Lovasoa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lovasoa updated SPARK-20939:

Description: 
Currently, while optimizing a query plan, Spark pushes filters down the query 
plan tree, so that 

{{
Filter UDF(a)
  +- Join Inner, (a = b)
   +- Relation
   +- Relation
}}

becomes 

{{
 Join Inner, (a = b)
 +- Filter UDF(a)
 +- Relation
 +- Filter UDF(b)
 +- Relation
}}

In general, pushing filters down is a good thing, as it reduces the number of 
records that go through the join.

However, when the filter is a user-defined function (UDF), we cannot know 
whether the cost of executing the function twice will be higher than the cost 
of joining more records.

So I think the optimizer shouldn't move or duplicate the user-defined function 
in the query plan tree. The user can still duplicate the function manually if 
they want to.

See this question on Stack Overflow: 
https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark

  was:
Currently, while optimizing a query plan, Spark pushes filters down the query 
plan tree, so that 

Filter UDF(a)
  +- Join Inner, (a = b)
   +- Relation
   +- Relation

becomes 

 Join Inner, (a = b)
 +- Filter UDF(a)
 +- Relation
 +- Filter UDF(b)
 +- Relation

In general, pushing filters down is a good thing, as it reduces the number of 
records that go through the join.

However, when the filter is a user-defined function (UDF), we cannot know 
whether the cost of executing the function twice will be higher than the cost 
of joining more records.

So I think the optimizer shouldn't move or duplicate the user-defined function 
in the query plan tree. The user can still duplicate the function manually if 
they want to.

See this question on Stack Overflow: 
https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark


> Do not duplicate user-defined functions while optimizing logical query plans
> 
>
> Key: SPARK-20939
> URL: https://issues.apache.org/jira/browse/SPARK-20939
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0
>Reporter: Lovasoa
>Priority: Minor
>  Labels: logical_plan, optimizer
>
> Currently, while optimizing a query plan, Spark pushes filters down the query 
> plan tree, so that 
> {{
> Filter UDF(a)
>   +- Join Inner, (a = b)
>+- Relation
>+- Relation
> }}
> becomes 
> {{
>  Join Inner, (a = b)
>  +- Filter UDF(a)
>  +- Relation
>  +- Filter UDF(b)
>  +- Relation
> }}
> In general, pushing filters down is a good thing, as it reduces the number 
> of records that go through the join.
> However, when the filter is a user-defined function (UDF), we cannot know 
> whether the cost of executing the function twice will be higher than the cost 
> of joining more records.
> So I think the optimizer shouldn't move or duplicate the user-defined function 
> in the query plan tree. The user can still duplicate the function manually if 
> they want to.
> See this question on Stack Overflow: 
> https://stackoverflow.com/questions/44291078/how-to-tune-the-query-planner-and-turn-off-an-optimization-in-spark
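
To check whether the duplication actually happens for a given query, the optimized logical plan can be inspected directly. A small self-contained sketch (names are illustrative; the UDF-counting heuristic simply looks for ScalaUDF expression nodes):

{code:title=Inspecting the optimized plan (sketch)}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val expensive = udf((x: Long) => x % 2 == 0)
val joined = spark.range(1000).toDF("a").filter(expensive($"a"))
  .join(spark.range(1000).toDF("b"), $"a" === $"b")

// queryExecution.optimizedPlan exposes the logical plan after the optimizer has run.
val optimized = joined.queryExecution.optimizedPlan
println(optimized.numberedTreeString)

// Count UDF expression nodes across the whole plan to see whether the filter was duplicated.
val udfCount = optimized.collect {
  case node => node.expressions.flatMap(_.collect {
    case e if e.getClass.getSimpleName == "ScalaUDF" => e
  }).size
}.sum
println(s"UDF occurrences in the optimized plan: $udfCount")
{code}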


