[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12734


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215226799
  
I fixed the title while merging to master.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215225764
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215225766
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57161/
Test PASSed.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215225477
  
**[Test build #57161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57161/consoleFull)** for PR 12734 at commit [`442265e`](https://github.com/apache/spark/commit/442265e59e6d441f42c1f22374b9ca47b337a9fd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215198462
  
**[Test build #57161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57161/consoleFull)** for PR 12734 at commit [`442265e`](https://github.com/apache/spark/commit/442265e59e6d441f42c1f22374b9ca47b337a9fd).





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215198428
  
Changes look good to me.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215198362
  
@liancheng The last commit adds a new test.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12734#discussion_r61318397
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -264,9 +265,16 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
       } else {
         SaveMode.ErrorIfExists
       }
-      CreateTableUsingAsSelect(table, provider, temp, Array.empty, None, mode, options, query)
+
+      val partitionColumnNames =
+        Option(ctx.partitionColumnNames)
+          .map(visitIdentifierList(_).toArray)
+          .getOrElse(Array.empty[String])
+
+      CreateTableUsingAsSelect(
+        table, provider, temp, partitionColumnNames, bucketSpec, mode, options, query)
     } else {
-      val struct = Option(ctx.colTypeList).map(createStructType)
+      val struct = Option(ctx.colTypeList()).map(createStructType)
--- End diff --

oh, sorry. PARTITIONED BY and CLUSTERED BY are both associated with the CREATE 
TABLE USING AS SELECT rule. So, for CREATE TABLE USING, if PARTITIONED BY or 
CLUSTERED BY is provided, we already throw an exception.
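
For readers following the thread, here is a minimal sketch of the rule described in this comment: layout clauses are accepted only when the statement carries an AS SELECT query. Every name below (`BucketSpecSketch`, `CreateTableSketch`, `CreateTableCheck.validate`) is hypothetical and is not Spark's parser code. The comment says the real parser throws an exception, whereas this sketch returns an `Either` so the outcome is easy to inspect.

```scala
// Hypothetical, simplified model of a parsed CREATE TABLE ... USING statement.
// These are illustration-only names, not classes from Spark.
final case class BucketSpecSketch(
    numBuckets: Int,
    bucketColumns: Seq[String],
    sortColumns: Seq[String])

final case class CreateTableSketch(
    table: String,
    provider: String,
    partitionColumns: Seq[String],        // from PARTITIONED BY
    bucketSpec: Option[BucketSpecSketch], // from CLUSTERED BY ... INTO n BUCKETS
    query: Option[String])                // defined only for CTAS

object CreateTableCheck {
  // Right(stmt) when the clause combination is allowed, Left(message) otherwise.
  def validate(stmt: CreateTableSketch): Either[String, CreateTableSketch] = {
    val hasLayoutClause =
      stmt.partitionColumns.nonEmpty || stmt.bucketSpec.isDefined
    if (stmt.query.isEmpty && hasLayoutClause)
      Left(s"PARTITIONED BY / CLUSTERED BY require AS SELECT (table: ${stmt.table})")
    else
      Right(stmt)
  }
}
```

Under these assumptions, a statement with a query passes through unchanged, and the same statement without a query is rejected unless both layout clauses are absent.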





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215191068
  
oh, I cannot change it. @liancheng will change the title after he gets up :)





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215189992
  
Could you change the title to `[SPARK-14954] (current title)`?





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215188645
  
Yea. https://issues.apache.org/jira/browse/SPARK-14954 is the jira. 





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215186982
  
Does this PR fix a ticket? If so, it would be useful to change the 
title to include the [SPARK-] prefix so that the JIRA status gets updated.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12734#discussion_r61305260
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -264,9 +265,16 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
       } else {
         SaveMode.ErrorIfExists
       }
-      CreateTableUsingAsSelect(table, provider, temp, Array.empty, None, mode, options, query)
+
+      val partitionColumnNames =
+        Option(ctx.partitionColumnNames)
+          .map(visitIdentifierList(_).toArray)
+          .getOrElse(Array.empty[String])
+
+      CreateTableUsingAsSelect(
+        table, provider, temp, partitionColumnNames, bucketSpec, mode, options, query)
     } else {
-      val struct = Option(ctx.colTypeList).map(createStructType)
+      val struct = Option(ctx.colTypeList()).map(createStructType)
--- End diff --

I am going to add the check for this else branch and add some tests.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12734#discussion_r61303166
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -264,9 +265,16 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
       } else {
         SaveMode.ErrorIfExists
       }
-      CreateTableUsingAsSelect(table, provider, temp, Array.empty, None, mode, options, query)
+
+      val partitionColumnNames =
+        Option(ctx.partitionColumnNames)
+          .map(visitIdentifierList(_).toArray)
+          .getOrElse(Array.empty[String])
+
+      CreateTableUsingAsSelect(
+        table, provider, temp, partitionColumnNames, bucketSpec, mode, options, query)
     } else {
-      val struct = Option(ctx.colTypeList).map(createStructType)
+      val struct = Option(ctx.colTypeList()).map(createStructType)
--- End diff --

One thing that is not really related to this PR: I always find the 
keyword `CLUSTERED BY` very confusing, because there is also a `CLUSTER BY` 
keyword (which is `DISTRIBUTE BY` + `SORT BY`). But we do not need to change 
it right now.
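
To make the `CLUSTER BY` = `DISTRIBUTE BY` + `SORT BY` equivalence mentioned here concrete, the following plain-Scala sketch models it on an in-memory sequence. It is only an illustration of the semantics, not Spark code: `distributeBy`, `sortWithinPartitions`, and `clusterBy` are hypothetical helpers, and hashing the key modulo the partition count is an assumed stand-in for however the engine actually assigns rows to partitions.

```scala
object ClusterBySketch {
  // Stand-in for DISTRIBUTE BY: rows with the same key land in the same partition.
  def distributeBy[K, V](rows: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    rows.groupBy { case (key, _) => Math.floorMod(key.hashCode, numPartitions) }

  // Stand-in for SORT BY: order rows inside each partition; no global ordering.
  def sortWithinPartitions[K: Ordering, V](
      parts: Map[Int, Seq[(K, V)]]): Map[Int, Seq[(K, V)]] =
    parts.map { case (partition, rows) => partition -> rows.sortBy(_._1) }

  // Stand-in for CLUSTER BY: distribute by the key, then sort by the same key.
  def clusterBy[K: Ordering, V](
      rows: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    sortWithinPartitions(distributeBy(rows, numPartitions))
}
```

By contrast, `CLUSTERED BY` in a table definition describes how the stored table is bucketed rather than how a query's output is shuffled, which is presumably why the near-identical spelling is easy to misread.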





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12734#discussion_r61302579
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -264,9 +265,16 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
       } else {
         SaveMode.ErrorIfExists
       }
-      CreateTableUsingAsSelect(table, provider, temp, Array.empty, None, mode, options, query)
+
+      val partitionColumnNames =
+        Option(ctx.partitionColumnNames)
+          .map(visitIdentifierList(_).toArray)
+          .getOrElse(Array.empty[String])
+
+      CreateTableUsingAsSelect(
+        table, provider, temp, partitionColumnNames, bucketSpec, mode, options, query)
     } else {
-      val struct = Option(ctx.colTypeList).map(createStructType)
+      val struct = Option(ctx.colTypeList()).map(createStructType)
--- End diff --

If the command is not a CTAS statement, it seems we should throw an exception 
if the user specifies any of the `PARTITIONED BY`, `SORTED BY`, or 
`BUCKETED BY` clauses?





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215164079
  
For `DataFrameWriter`, can we do `sortBy` without using `bucketBy`?





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215131297
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57129/
Test PASSed.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215131294
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215130925
  
**[Test build #57129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57129/consoleFull)** for PR 12734 at commit [`a193faf`](https://github.com/apache/spark/commit/a193faf3f82be52de4369f8c2b529801ab2a9da5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215107260
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57127/
Test FAILed.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215107256
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215107049
  
**[Test build #57127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57127/consoleFull)** for PR 12734 at commit [`af973d6`](https://github.com/apache/spark/commit/af973d64cf3e1079e6c8a185d826e2e43cb06532).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215099565
  
**[Test build #57129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57129/consoleFull)** for PR 12734 at commit [`a193faf`](https://github.com/apache/spark/commit/a193faf3f82be52de4369f8c2b529801ab2a9da5).

