[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12734 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215226799 I fixed the title while merging to master.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215225764 Merged build finished. Test PASSed.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215225766 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57161/ Test PASSed.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215225477
**[Test build #57161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57161/consoleFull)** for PR 12734 at commit [`442265e`](https://github.com/apache/spark/commit/442265e59e6d441f42c1f22374b9ca47b337a9fd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215198462 **[Test build #57161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57161/consoleFull)** for PR 12734 at commit [`442265e`](https://github.com/apache/spark/commit/442265e59e6d441f42c1f22374b9ca47b337a9fd).
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215198428 Changes look good to me.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215198362 @liancheng The last commit adds a new test.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12734#discussion_r61318397
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -264,9 +265,16 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         } else {
           SaveMode.ErrorIfExists
         }
-        CreateTableUsingAsSelect(table, provider, temp, Array.empty, None, mode, options, query)
+
+        val partitionColumnNames =
+          Option(ctx.partitionColumnNames)
+            .map(visitIdentifierList(_).toArray)
+            .getOrElse(Array.empty[String])
+
+        CreateTableUsingAsSelect(
+          table, provider, temp, partitionColumnNames, bucketSpec, mode, options, query)
       } else {
-        val struct = Option(ctx.colTypeList).map(createStructType)
+        val struct = Option(ctx.colTypeList()).map(createStructType)
--- End diff --
oh, sorry. PARTITIONED BY and CLUSTERED BY are both associated with the CREATE TABLE USING AS SELECT rule. So, for CREATE TABLE USING, if PARTITIONED BY or CLUSTERED BY is provided, we already throw an exception.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215191068 oh, I cannot change it. @liancheng will change the title after he gets up :)
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215189992 Could you change the title to `[SPARK-14954] (current title)`?
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215188645 Yea. https://issues.apache.org/jira/browse/SPARK-14954 is the jira.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215186982 Does this PR fix a ticket? In that case, it would be useful to change the title to include the [SPARK-] prefix so that the JIRA status gets updated.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12734#discussion_r61305260
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -264,9 +265,16 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         } else {
           SaveMode.ErrorIfExists
         }
-        CreateTableUsingAsSelect(table, provider, temp, Array.empty, None, mode, options, query)
+
+        val partitionColumnNames =
+          Option(ctx.partitionColumnNames)
+            .map(visitIdentifierList(_).toArray)
+            .getOrElse(Array.empty[String])
+
+        CreateTableUsingAsSelect(
+          table, provider, temp, partitionColumnNames, bucketSpec, mode, options, query)
       } else {
-        val struct = Option(ctx.colTypeList).map(createStructType)
+        val struct = Option(ctx.colTypeList()).map(createStructType)
--- End diff --
I am going to add the check for this else branch and add some tests.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12734#discussion_r61303166
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -264,9 +265,16 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         } else {
           SaveMode.ErrorIfExists
         }
-        CreateTableUsingAsSelect(table, provider, temp, Array.empty, None, mode, options, query)
+
+        val partitionColumnNames =
+          Option(ctx.partitionColumnNames)
+            .map(visitIdentifierList(_).toArray)
+            .getOrElse(Array.empty[String])
+
+        CreateTableUsingAsSelect(
+          table, provider, temp, partitionColumnNames, bucketSpec, mode, options, query)
       } else {
-        val struct = Option(ctx.colTypeList).map(createStructType)
+        val struct = Option(ctx.colTypeList()).map(createStructType)
--- End diff --
One thing that is not closely related to this PR: I always find the keyword `CLUSTERED BY` very confusing, because there is also a `CLUSTER BY` keyword (which is `DISTRIBUTE BY` + `SORT BY`). But we do not need to change it right now.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12734#discussion_r61302579
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -264,9 +265,16 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         } else {
           SaveMode.ErrorIfExists
         }
-        CreateTableUsingAsSelect(table, provider, temp, Array.empty, None, mode, options, query)
+
+        val partitionColumnNames =
+          Option(ctx.partitionColumnNames)
+            .map(visitIdentifierList(_).toArray)
+            .getOrElse(Array.empty[String])
+
+        CreateTableUsingAsSelect(
+          table, provider, temp, partitionColumnNames, bucketSpec, mode, options, query)
       } else {
-        val struct = Option(ctx.colTypeList).map(createStructType)
+        val struct = Option(ctx.colTypeList()).map(createStructType)
--- End diff --
If the command is not a CTAS statement, it seems we should throw an exception if users define any of the `PARTITIONED BY`, `SORTED BY`, or `BUCKETED BY` clauses?
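For readers following the thread, the kind of check suggested in this comment could be sketched roughly as below. This is only an illustration in plain Scala, not the code from the PR: `CreateTableCtx`, `PartitionClauseSketch`, `partitionColumns`, and `validate` are hypothetical names standing in for the ANTLR parse context and the builder logic, and the exception type and message are assumptions. The null-safe extraction follows the same `Option(...).map(...).getOrElse(...)` shape that appears in the diff.

```scala
// Hypothetical stand-in for the parser context (NOT Spark's real class).
// partitionColumnNames is null when no PARTITIONED BY clause was written,
// mimicking how an absent optional grammar element surfaces as null.
final case class CreateTableCtx(
    partitionColumnNames: Seq[String],
    hasQuery: Boolean) // true for CREATE TABLE ... USING ... AS SELECT

object PartitionClauseSketch {
  // Null-safe extraction: an absent clause becomes an empty array.
  def partitionColumns(ctx: CreateTableCtx): Array[String] =
    Option(ctx.partitionColumnNames)
      .map(_.toArray)
      .getOrElse(Array.empty[String])

  // Reject a partition clause on a statement that is not a CTAS.
  // The exception type and message are illustrative assumptions.
  def validate(ctx: CreateTableCtx): Array[String] = {
    val cols = partitionColumns(ctx)
    if (!ctx.hasQuery && cols.nonEmpty) {
      throw new IllegalArgumentException(
        "PARTITIONED BY is only allowed together with AS SELECT in this sketch")
    }
    cols
  }
}
```

In this sketch, a CTAS context with `Seq("year", "month")` yields those two column names, a context with a null clause yields an empty array, and a non-CTAS context with any partition columns is rejected.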
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215164079 For `DataFrameWriter`, can we do `sortBy` without using `bucketBy`?
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215131297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57129/ Test PASSed.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215131294 Merged build finished. Test PASSed.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215130925
**[Test build #57129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57129/consoleFull)** for PR 12734 at commit [`a193faf`](https://github.com/apache/spark/commit/a193faf3f82be52de4369f8c2b529801ab2a9da5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215107260 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57127/ Test FAILed.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215107256 Merged build finished. Test FAILed.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215107049
**[Test build #57127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57127/consoleFull)** for PR 12734 at commit [`af973d6`](https://github.com/apache/spark/commit/af973d64cf3e1079e6c8a185d826e2e43cb06532).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215099565 **[Test build #57129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57129/consoleFull)** for PR 12734 at commit [`a193faf`](https://github.com/apache/spark/commit/a193faf3f82be52de4369f8c2b529801ab2a9da5).