[jira] [Commented] (SPARK-15453) Improve join planning for bucketed / sorted tables
[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446283#comment-15446283 ]

Apache Spark commented on SPARK-15453:
--------------------------------------

User 'tejasapatil' has created a pull request for this issue:
https://github.com/apache/spark/pull/14864

> Improve join planning for bucketed / sorted tables
> --------------------------------------------------
>
>                 Key: SPARK-15453
>                 URL: https://issues.apache.org/jira/browse/SPARK-15453
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Tejas Patil
>            Priority: Minor
>
> Datasource allows creation of bucketed and sorted tables but performing joins
> on such tables still does not utilize this metadata to produce optimal query
> plan.
> As below, the `Exchange` and `Sort` can be avoided if the tables are known to
> be hashed + sorted on relevant columns.
> {noformat}
> == Physical Plan ==
> WholeStageCodegen
> :  +- SortMergeJoin [j#20,k#21,i#22], [j#23,k#24,i#25], Inner, None
> :     :- INPUT
> :     +- INPUT
> :- WholeStageCodegen
> :  :  +- Sort [j#20 ASC,k#21 ASC,i#22 ASC], false, 0
> :  :     +- INPUT
> :  +- Exchange hashpartitioning(j#20, k#21, i#22, 200), None
> :     +- WholeStageCodegen
> :        :  +- Project [j#20,k#21,i#22]
> :        :     +- Filter (isnotnull(k#21) && isnotnull(j#20))
> :        :        +- Scan orc default.table7[j#20,k#21,i#22] Format: ORC, InputPaths: file:/XXX/table7, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct
> +- WholeStageCodegen
>    :  +- Sort [j#23 ASC,k#24 ASC,i#25 ASC], false, 0
>    :     +- INPUT
>    +- Exchange hashpartitioning(j#23, k#24, i#25, 200), None
>       +- WholeStageCodegen
>          :  +- Project [j#23,k#24,i#25]
>          :     +- Filter (isnotnull(k#24) && isnotnull(j#23))
>          :        +- Scan orc default.table8[j#23,k#24,i#25] Format: ORC, InputPaths: file:/XXX/table8, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
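
The issue description claims that `Exchange` and `Sort` are avoidable when both tables are already hashed and sorted on the join columns. The following is a minimal Python sketch of why that holds; it is not Spark code. It assumes both tables were written with the same bucket count and the same hash function. `NUM_BUCKETS`, `bucket_id`, `write_bucketed`, `merge_join` and `bucketed_join` are hypothetical names introduced for illustration, and `zlib.crc32` merely stands in for whatever hash the engine really uses.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 8  # assumed bucket count; both tables must use the same value


def bucket_id(key, num_buckets=NUM_BUCKETS):
    """Hypothetical stand-in for the engine's bucketing hash."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_buckets


def write_bucketed(rows, key_of):
    """Group rows into buckets by join key and sort each bucket on that key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(key_of(row))].append(row)
    return {b: sorted(rs, key=key_of) for b, rs in buckets.items()}


def merge_join(left_rows, right_rows, key_of):
    """Inner merge join of two lists that are already sorted on the join key."""
    out = []
    i = j = 0
    while i < len(left_rows) and j < len(right_rows):
        lk, rk = key_of(left_rows[i]), key_of(right_rows[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right_rows) and key_of(right_rows[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left_rows) and key_of(left_rows[i]) == lk:
                out.extend((left_rows[i], r) for r in right_rows[j:j_end])
                i += 1
            j = j_end
    return out


def bucketed_join(left, right, key_of):
    """Join bucket b of `left` only with bucket b of `right`.

    Equal keys hash to the same bucket on both sides, so no reshuffle
    (Exchange) is needed; each bucket is pre-sorted, so no Sort is needed.
    """
    out = []
    for b, left_rows in left.items():
        out.extend(merge_join(left_rows, right.get(b, []), key_of))
    return out
```

With rows shaped like `(j, k, i, payload)` and `key_of = lambda row: row[:3]`, `bucketed_join(write_bucketed(t7, key_of), write_bucketed(t8, key_of), key_of)` should return the same pairs as a naive nested-loop join over the unbucketed rows, though possibly in a different order.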
[jira] [Commented] (SPARK-15453) Improve join planning for bucketed / sorted tables
[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294165#comment-15294165 ]

Apache Spark commented on SPARK-15453:
--------------------------------------

User 'tejasapatil' has created a pull request for this issue:
https://github.com/apache/spark/pull/13231
[jira] [Commented] (SPARK-15453) Improve join planning for bucketed / sorted tables
[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294053#comment-15294053 ]

Reynold Xin commented on SPARK-15453:
-------------------------------------

[~tejasp] there are multiple issues here, right? The ticket is actually not about SMJ (sort-merge join), but rather about avoiding exchanges if the inputs are already co-partitioned, and also avoiding sorts if the inputs are already sorted?
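
The comment above separates two independent questions: whether each join input is already co-partitioned, and whether it is already sorted. A rough sketch of that two-part check follows, in Python rather than Spark's own code. `ChildProps`, `needs_exchange`, `needs_sort` and `plan_side` are hypothetical names, and the matching rules are deliberately stricter than a real planner's (exact column match and equal partition count are assumed to be required).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ChildProps:
    """What is known about one join input (hypothetical model, not Spark classes)."""
    hash_columns: Optional[Tuple[str, ...]]  # hash-partitioning columns, or None if unknown
    num_partitions: int
    sort_columns: Tuple[str, ...]  # ascending order within each partition


def needs_exchange(child, join_keys, num_partitions):
    """A shuffle is needed unless the input is already hash-partitioned on
    exactly the join keys into the expected number of partitions."""
    return (child.hash_columns != tuple(join_keys)
            or child.num_partitions != num_partitions)


def needs_sort(child, join_keys):
    """A sort is needed unless the join keys are a prefix of the existing order."""
    keys = tuple(join_keys)
    return child.sort_columns[:len(keys)] != keys


def plan_side(child, join_keys, num_partitions):
    """Return the extra operators one side of a sort-merge join would need."""
    ops = []
    shuffle = needs_exchange(child, join_keys, num_partitions)
    if shuffle:
        ops.append("Exchange")
    # A shuffle discards any existing per-partition order, so a sort must follow it.
    if shuffle or needs_sort(child, join_keys):
        ops.append("Sort")
    return ops
```

Under these assumptions, an input bucketed and sorted on `("j", "k", "i")` with 200 partitions needs no extra operators for a join on those keys, an input that is bucketed but unsorted needs only a `Sort`, and an unbucketed input needs both.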