[jira] [Commented] (SPARK-15453) Improve join planning for bucketed / sorted tables

2016-08-29 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446283#comment-15446283 ]

Apache Spark commented on SPARK-15453:
--

User 'tejasapatil' has created a pull request for this issue:
https://github.com/apache/spark/pull/14864

> Improve join planning for bucketed / sorted tables
> --
>
> Key: SPARK-15453
> URL: https://issues.apache.org/jira/browse/SPARK-15453
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Tejas Patil
> Priority: Minor
>
> Datasource allows creation of bucketed and sorted tables, but joins on such
> tables still do not use this metadata to produce an optimal query plan.
> As shown below, the `Exchange` and `Sort` can be avoided if the tables are
> known to be hashed and sorted on the relevant columns.
> {noformat}
> == Physical Plan ==
> WholeStageCodegen
> :  +- SortMergeJoin [j#20,k#21,i#22], [j#23,k#24,i#25], Inner, None
> :     :- INPUT
> :     +- INPUT
> :- WholeStageCodegen
> :  :  +- Sort [j#20 ASC,k#21 ASC,i#22 ASC], false, 0
> :  :     +- INPUT
> :  +- Exchange hashpartitioning(j#20, k#21, i#22, 200), None
> :     +- WholeStageCodegen
> :        :  +- Project [j#20,k#21,i#22]
> :        :     +- Filter (isnotnull(k#21) && isnotnull(j#20))
> :        :        +- Scan orc default.table7[j#20,k#21,i#22] Format: ORC, InputPaths: file:/XXX/table7, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct
> +- WholeStageCodegen
>    :  +- Sort [j#23 ASC,k#24 ASC,i#25 ASC], false, 0
>    :     +- INPUT
>    +- Exchange hashpartitioning(j#23, k#24, i#25, 200), None
>       +- WholeStageCodegen
>          :  +- Project [j#23,k#24,i#25]
>          :     +- Filter (isnotnull(k#24) && isnotnull(j#23))
>          :        +- Scan orc default.table8[j#23,k#24,i#25] Format: ORC, InputPaths: file:/XXX/table8, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15453) Improve join planning for bucketed / sorted tables

2016-05-20 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294165#comment-15294165 ]

Apache Spark commented on SPARK-15453:
--

User 'tejasapatil' has created a pull request for this issue:
https://github.com/apache/spark/pull/13231







[jira] [Commented] (SPARK-15453) Improve join planning for bucketed / sorted tables

2016-05-20 Thread Reynold Xin (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294053#comment-15294053 ]

Reynold Xin commented on SPARK-15453:
-

[~tejasp] there are multiple issues here, right? The ticket is actually not
about SMJ (sort-merge join) itself, but rather about avoiding exchanges if the
inputs are already co-partitioned, and also avoiding sorts if the inputs are
already sorted?
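
The two concerns raised here (skipping the exchange when both inputs are co-partitioned, and skipping the sort when an input is already ordered) can be treated as independent checks. The following is a minimal toy sketch in plain Python, not Spark's planner code. `TableLayout`, `needs_exchange`, `needs_sort` and `extra_operators` are hypothetical names introduced only for illustration. It assumes both tables were bucketed with the same hash function, and it conservatively requires the bucket columns to match the join keys exactly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Cols = Tuple[str, ...]


@dataclass(frozen=True)
class TableLayout:
    """Hypothetical summary of a table's on-disk layout (not a Spark class)."""
    bucket_cols: Cols = ()              # columns the rows were hashed on
    num_buckets: Optional[int] = None   # None means the table is not bucketed
    sort_cols: Cols = ()                # ascending sort order within each bucket


def needs_exchange(left: TableLayout, right: TableLayout,
                   left_keys: Cols, right_keys: Cols) -> bool:
    """Return True unless both sides are already co-partitioned on the join keys."""
    if left.num_buckets is None or right.num_buckets is None:
        return True
    co_partitioned = (left.bucket_cols == left_keys
                      and right.bucket_cols == right_keys
                      and left.num_buckets == right.num_buckets)
    return not co_partitioned


def needs_sort(layout: TableLayout, keys: Cols) -> bool:
    """Return True unless the existing sort order starts with the join keys."""
    return layout.sort_cols[:len(keys)] != keys


def extra_operators(left: TableLayout, right: TableLayout,
                    left_keys: Cols, right_keys: Cols) -> List[str]:
    """List the operators this toy model would add below a sort-merge join."""
    shuffle = needs_exchange(left, right, left_keys, right_keys)
    ops: List[str] = []
    if shuffle:
        # Simplification: this toy model reshuffles both sides.
        ops += ["Exchange(left)", "Exchange(right)"]
    for side, layout, keys in (("left", left, left_keys),
                               ("right", right, right_keys)):
        # A shuffle destroys any existing order, so a sort must follow it.
        if shuffle or needs_sort(layout, keys):
            ops.append(f"Sort({side})")
    return ops


# Both tables bucketed and sorted on the join keys (bucket count of 8 is assumed).
keys = ("j", "k", "i")
table7 = TableLayout(bucket_cols=keys, num_buckets=8, sort_cols=keys)
table8 = TableLayout(bucket_cols=keys, num_buckets=8, sort_cols=keys)
print(extra_operators(table7, table8, keys, keys))  # → []
```

In this model the exchange check looks at both inputs together, while the sort check is per input, so a table that is bucketed but unsorted would still need only a `Sort` on that side.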




