[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE

2016-04-13 Thread yhuai
Github user yhuai closed the pull request at:

https://github.com/apache/spark/pull/12363


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE

2016-04-13 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12363#issuecomment-209574950
  
OK, let's close this PR now.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12271





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209572487
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209572492
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55727/





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209572170
  
**[Test build #55727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55727/consoleFull)** for PR 12271 at commit [`55957bd`](https://github.com/apache/spark/commit/55957bdcd1dea06da750c5cf7cfa587da5250574).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209572290
  
OK. Merging to master.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59593820
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -675,34 +680,38 @@ private[hive] class HiveClientImpl(

   private def toHiveTable(table: CatalogTable): HiveTable = {
     val hiveTable = new HiveTable(table.database, table.identifier.table)
-    // For EXTERNAL_TABLE/MANAGED_TABLE, we also need to set EXTERNAL field in
-    // the table properties accodringly. Otherwise, if EXTERNAL_TABLE is the table type
-    // but EXTERNAL field is not set, Hive metastore will change the type to
-    // MANAGED_TABLE (see
-    // metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
+    // For EXTERNAL_TABLE, we also need to set EXTERNAL field in the table properties.
+    // Otherwise, Hive metastore will change the table to a MANAGED_TABLE.
+    // (metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
     hiveTable.setTableType(table.tableType match {
       case CatalogTableType.EXTERNAL_TABLE =>
         hiveTable.setProperty("EXTERNAL", "TRUE")
         HiveTableType.EXTERNAL_TABLE
       case CatalogTableType.MANAGED_TABLE =>
-        hiveTable.setProperty("EXTERNAL", "FALSE")
         HiveTableType.MANAGED_TABLE
       case CatalogTableType.INDEX_TABLE => HiveTableType.INDEX_TABLE
       case CatalogTableType.VIRTUAL_VIEW => HiveTableType.VIRTUAL_VIEW
     })
-    hiveTable.setFields(table.schema.map(toHiveColumn).asJava)
-    hiveTable.setPartCols(table.partitionColumns.map(toHiveColumn).asJava)
+    // Note: In Hive the schema and partition columns must be disjoint sets
+    val (partCols, schema) = table.schema.map(toHiveColumn).partition { c =>
+      table.partitionColumnNames.contains(c.getName)
+    }
+    hiveTable.setFields(schema.asJava)
+    hiveTable.setPartCols(partCols.asJava)
     // TODO: set sort columns here too
+    hiveTable.setBucketCols(table.bucketColumnNames.asJava)
     hiveTable.setOwner(conf.getUser)
     hiveTable.setNumBuckets(table.numBuckets)
     hiveTable.setCreateTime((table.createTime / 1000).toInt)
     hiveTable.setLastAccessTime((table.lastAccessTime / 1000).toInt)
     table.storage.locationUri.foreach { loc => shim.setDataLocation(hiveTable, loc) }
     table.storage.inputFormat.map(toInputFormat).foreach(hiveTable.setInputFormatClass)
     table.storage.outputFormat.map(toOutputFormat).foreach(hiveTable.setOutputFormatClass)
-    table.storage.serde.foreach(hiveTable.setSerializationLib)
+    hiveTable.setSerializationLib(
+      table.storage.serde.getOrElse("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
--- End diff --

Yeah, agreed, if you are talking about my case. :)
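The disjoint split of data columns and partition columns that the quoted hunk introduces can be sketched in isolation. This is a minimal sketch: `Column` and the column names here are hypothetical stand-ins for Hive's `FieldSchema` and a real table schema, not code from the PR.

```scala
// Hive requires a table's data schema and its partition columns to be
// disjoint, so the full schema is split before calling setFields/setPartCols.
case class Column(name: String, dataType: String)

val allColumns = Seq(
  Column("id", "bigint"),
  Column("event_date", "string"), // declared as a partition column below
  Column("payload", "string")
)
val partitionColumnNames = Set("event_date")

// Mirrors `table.schema.map(toHiveColumn).partition { ... }` in the diff:
// partition() keeps original order and returns (matching, non-matching).
val (partCols, schema) =
  allColumns.partition(c => partitionColumnNames.contains(c.name))
```

With this split, `schema` keeps `id` and `payload` while `partCols` holds only `event_date`, so `setFields` and `setPartCols` receive disjoint column sets.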





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59593113
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
(quoting the same `toHiveTable` hunk shown earlier in this thread)
--- End diff --

I feel it is better to have more checks in SessionCatalog to make sure a request is valid (that would help your case).





[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE

2016-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12363#issuecomment-209558774
  
**[Test build #55724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55724/consoleFull)** for PR 12363 at commit [`ab70cb7`](https://github.com/apache/spark/commit/ab70cb751cce8ca2e0757b9fa523534207864328).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12363#issuecomment-209559349
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12363#issuecomment-209559352
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55724/





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59589025
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
(quoting the same `toHiveTable` hunk shown earlier in this thread)
--- End diff --

Now, after a few bad experiences, I think we need to add test cases to ensure the Hive APIs work as we expect. See another issue I hit in `alter table ... drop partition`:
https://github.com/apache/spark/pull/12220#discussion-diff-59555113





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209536900
  
**[Test build #55727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55727/consoleFull)** for PR 12271 at commit [`55957bd`](https://github.com/apache/spark/commit/55957bdcd1dea06da750c5cf7cfa587da5250574).





[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE

2016-04-13 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12363#issuecomment-209536036
  
OK, I cherry-picked your changes and updated the comment.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209536129
  
OK, I cherry-picked your changes and updated the comment.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59580227
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
(quoting the same `toHiveTable` hunk shown earlier in this thread)
--- End diff --

Why are there all these undocumented implicit behaviors in Hive? :(





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59577033
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
(quoting the same `toHiveTable` hunk shown earlier in this thread)
--- End diff --

I submitted a PR to try it out (https://github.com/apache/spark/pull/12363). You can find my change in the last commit (https://github.com/apache/spark/pull/12363/commits/ab70cb751cce8ca2e0757b9fa523534207864328).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59574261
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
(quoting the same `toHiveTable` hunk shown earlier in this thread)
--- End diff --

For now, if the schema does not have any field, can we set it to the 
following one?
```
[# col_name             data_type               comment             ]
[]
[col                    array                   from deserializer   ]
```

So, we try to preserve the existing (and weird) behavior. 
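The fallback being proposed can be sketched as follows. This is a hedged illustration only: `HiveColumn`, the helper name `fieldsOrPlaceholder`, and the exact `array` type string are guesses for illustration, not the PR's actual implementation.

```scala
case class HiveColumn(name: String, dataType: String, comment: String)

// If the data schema is empty (e.g. the real schema is stored in table
// properties, as with data-source tables), emit Hive's legacy placeholder
// row instead of an empty field list, preserving the old describe output
// quoted above.
def fieldsOrPlaceholder(schema: Seq[HiveColumn]): Seq[HiveColumn] =
  if (schema.nonEmpty) schema
  else Seq(HiveColumn("col", "array", "from deserializer"))
```

A non-empty schema passes through untouched; only the degenerate empty case gets the placeholder.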





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59573943
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -675,34 +680,38 @@ private[hive] class HiveClientImpl(
 
   private def toHiveTable(table: CatalogTable): HiveTable = {
 val hiveTable = new HiveTable(table.database, table.identifier.table)
-// For EXTERNAL_TABLE/MANAGED_TABLE, we also need to set EXTERNAL 
field in
-// the table properties accodringly. Otherwise, if EXTERNAL_TABLE is 
the table type
-// but EXTERNAL field is not set, Hive metastore will change the type 
to
-// MANAGED_TABLE (see
-// 
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
+// For EXTERNAL_TABLE, we also need to set EXTERNAL field in the table 
properties.
+// Otherwise, Hive metastore will change the table to a MANAGED_TABLE.
+// 
(metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
 hiveTable.setTableType(table.tableType match {
   case CatalogTableType.EXTERNAL_TABLE =>
 hiveTable.setProperty("EXTERNAL", "TRUE")
 HiveTableType.EXTERNAL_TABLE
   case CatalogTableType.MANAGED_TABLE =>
-hiveTable.setProperty("EXTERNAL", "FALSE")
 HiveTableType.MANAGED_TABLE
   case CatalogTableType.INDEX_TABLE => HiveTableType.INDEX_TABLE
   case CatalogTableType.VIRTUAL_VIEW => HiveTableType.VIRTUAL_VIEW
 })
-hiveTable.setFields(table.schema.map(toHiveColumn).asJava)
-hiveTable.setPartCols(table.partitionColumns.map(toHiveColumn).asJava)
+// Note: In Hive the schema and partition columns must be disjoint sets
+val (partCols, schema) = table.schema.map(toHiveColumn).partition { c 
=>
+  table.partitionColumnNames.contains(c.getName)
+}
+hiveTable.setFields(schema.asJava)
+hiveTable.setPartCols(partCols.asJava)
 // TODO: set sort columns here too
+hiveTable.setBucketCols(table.bucketColumnNames.asJava)
 hiveTable.setOwner(conf.getUser)
 hiveTable.setNumBuckets(table.numBuckets)
 hiveTable.setCreateTime((table.createTime / 1000).toInt)
 hiveTable.setLastAccessTime((table.lastAccessTime / 1000).toInt)
 table.storage.locationUri.foreach { loc => shim.setDataLocation(hiveTable, loc) }
 
table.storage.inputFormat.map(toInputFormat).foreach(hiveTable.setInputFormatClass)
 
table.storage.outputFormat.map(toOutputFormat).foreach(hiveTable.setOutputFormatClass)
-table.storage.serde.foreach(hiveTable.setSerializationLib)
+hiveTable.setSerializationLib(
+  table.storage.serde.getOrElse("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
--- End diff --

hmm... Without this PR, the storage-related fields of the table defined in 
`org.apache.spark.sql.hive.MetastoreDataSourcesSuite.persistent JSON table` are:
```
[SerDe Library:  org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe]
[InputFormat:    org.apache.hadoop.mapred.SequenceFileInputFormat]
[OutputFormat:   org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat]
```

Looks like the change here somehow breaks the test (without the change, we 
somehow trigger a weird code path in Hive and end up with at least one column 
called `col`).
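To make the serde difference concrete, here is a minimal plain-Scala sketch (the `Storage` and `HiveTableStub` names are hypothetical stand-ins, not Spark classes): with the old `foreach`, a table whose serde is `None` keeps whatever default the metastore assigns, which is how `MetadataTypedColumnsetSerDe` can show up; the new `getOrElse` always forces `LazySimpleSerDe`.

```scala
case class Storage(serde: Option[String])

class HiveTableStub {
  // What the metastore defaults to when nothing is set explicitly.
  var serializationLib: String =
    "org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe"
  def setSerializationLib(lib: String): Unit = serializationLib = lib
}

// Old behaviour: only set the serde when one was specified.
def applyOld(t: HiveTableStub, s: Storage): Unit =
  s.serde.foreach(t.setSerializationLib)

// New behaviour (this PR): always set a serde, defaulting to LazySimpleSerDe.
def applyNew(t: HiveTableStub, s: Storage): Unit =
  t.setSerializationLib(
    s.serde.getOrElse("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))

val a = new HiveTableStub
applyOld(a, Storage(None)) // no-op: metastore default survives

val b = new HiveTableStub
applyNew(b, Storage(None)) // forced to LazySimpleSerDe
```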


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209284615
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209284620
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55696/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209283962
  
**[Test build #55696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55696/consoleFull)** for PR 12271 at commit [`02738fe`](https://github.com/apache/spark/commit/02738fed54ba0eebc2c8d887430f0bef34213c68).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209283010
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55693/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209283002
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209282220
  
**[Test build #55693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55693/consoleFull)** for PR 12271 at commit [`a60e66a`](https://github.com/apache/spark/commit/a60e66a71dec96973043ec62e2b6d4213c5add2c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209272015
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209272026
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55692/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209270685
  
**[Test build #55692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55692/consoleFull)** for PR 12271 at commit [`59edce3`](https://github.com/apache/spark/commit/59edce332f87b07bdfb07e2e385431b2b123e1b0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59501698
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala 
---
@@ -435,13 +480,15 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   /**
 * Create a sequence of [[CatalogColumn]]s from a column list
 */
-  private def visitCatalogColumns(
-  ctx: ColTypeListContext,
-  formatter: String => String = identity): Seq[CatalogColumn] = withOrigin(ctx) {
+  private def visitCatalogColumns(ctx: ColTypeListContext): Seq[CatalogColumn] = withOrigin(ctx) {
 ctx.colType.asScala.map { col =>
   CatalogColumn(
-formatter(col.identifier.getText),
-col.dataType.getText.toLowerCase, // TODO validate this?
+col.identifier.getText.toLowerCase,
+// Note: for types like "STRUCT" we can't
+// just convert the whole type string to lower case, otherwise the struct field names
+// will no longer be case sensitive. Instead, we rely on our parser to get the proper
+// case before passing it to Hive.
+CatalystSqlParser.parseDataType(col.dataType.getText).simpleString,
--- End diff --

MINOR/NIT: The DataType parsing is done in the `AstBuilder`, so we really 
don't need to parse this string again. You could use (magical/evil) 
`typedVisit[DataType](col.dataType)` here.
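A rough illustration of this point, using hypothetical stand-ins rather than Spark's actual `AstBuilder` types: the parse tree node already carries a typed result, so visiting it reuses work that re-parsing the type's text would redo.

```scala
sealed trait DataType { def simpleString: String }
case object IntegerType extends DataType { def simpleString = "int" }

// Stand-in for an ANTLR context whose typed value was already built.
final class DataTypeContext(val built: DataType, val getText: String)

// Stand-in for the AstBuilder helper the reviewer mentions.
def typedVisit[T](ctx: DataTypeContext): T = ctx.built.asInstanceOf[T]

// Stand-in for CatalystSqlParser.parseDataType: a full re-parse of the string.
def parseDataType(s: String): DataType = IntegerType

val ctx = new DataTypeContext(IntegerType, "INT")
val viaReparse = parseDataType(ctx.getText).simpleString // parses the text again
val viaVisit   = typedVisit[DataType](ctx).simpleString  // reuses the parsed tree
```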





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59501466
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala 
---
@@ -488,7 +488,7 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
 // just convert the whole type string to lower case, otherwise the struct field names
 // will no longer be case sensitive. Instead, we rely on our parser to get the proper
 // case before passing it to Hive.
-HiveMetastoreTypes.toDataType(col.dataType.getText).simpleString,
+CatalystSqlParser.parseDataType(col.dataType.getText).simpleString,
--- End diff --

MINOR/NIT: The DataType parsing is done in the `AstBuilder`, so we really 
don't need to parse this string again. You could use (magical/evil) 
`typedVisit[DataType](col.dataType)` here.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209254390
  
**[Test build #55696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55696/consoleFull)** for PR 12271 at commit [`02738fe`](https://github.com/apache/spark/commit/02738fed54ba0eebc2c8d887430f0bef34213c68).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209252514
  
**[Test build #55693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55693/consoleFull)** for PR 12271 at commit [`a60e66a`](https://github.com/apache/spark/commit/a60e66a71dec96973043ec62e2b6d4213c5add2c).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209247087
  
**[Test build #55692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55692/consoleFull)** for PR 12271 at commit [`59edce3`](https://github.com/apache/spark/commit/59edce332f87b07bdfb07e2e385431b2b123e1b0).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59497215
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala 
---
@@ -435,13 +480,15 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   /**
 * Create a sequence of [[CatalogColumn]]s from a column list
 */
-  private def visitCatalogColumns(
-  ctx: ColTypeListContext,
-  formatter: String => String = identity): Seq[CatalogColumn] = withOrigin(ctx) {
+  private def visitCatalogColumns(ctx: ColTypeListContext): Seq[CatalogColumn] = withOrigin(ctx) {
 ctx.colType.asScala.map { col =>
   CatalogColumn(
-formatter(col.identifier.getText),
-col.dataType.getText.toLowerCase, // TODO validate this?
+col.identifier.getText.toLowerCase,
+// Note: for types like "STRUCT" we can't
+// just convert the whole type string to lower case, otherwise the struct field names
+// will no longer be case sensitive. Instead, we rely on our parser to get the proper
+// case before passing it to Hive.
+HiveMetastoreTypes.toDataType(col.dataType.getText).simpleString,
--- End diff --

I think we need to use `parseDataType`.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59492629
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala 
---
@@ -121,84 +123,115 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
 */
   override def visitCreateFileFormat(
   ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-if (ctx.storageHandler == null) {
-  typedVisit[CatalogStorageFormat](ctx.fileFormat)
-} else {
-  visitStorageHandler(ctx.storageHandler)
+(ctx.fileFormat, ctx.storageHandler) match {
+  // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+  case (c: TableFileFormatContext, null) =>
+  visitTableFileFormat(c)
+  // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+  case (c: GenericFileFormatContext, null) =>
+visitGenericFileFormat(c)
+  case (null, storageHandler) =>
+throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+  case _ =>
+throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
 }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
 */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-if (ctx.query == null) {
-  HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+// TODO: implement temporary tables
+if (temp) {
+  throw new ParseException(
+"CREATE TEMPORARY TABLE is not supported yet. " +
+"Please use registerTempTable as an alternative.", ctx)
+}
+if (ctx.skewSpec != null) {
+  throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+}
+if (ctx.bucketSpec != null) {
+  throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+}
+val tableType = if (external) {
+  CatalogTableType.EXTERNAL_TABLE
 } else {
-  // Get the table header.
-  val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-  val tableType = if (external) {
-CatalogTableType.EXTERNAL_TABLE
-  } else {
-CatalogTableType.MANAGED_TABLE
-  }
-
-  // Unsupported clauses.
-  if (temp) {
-throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-  }
-  if (ctx.bucketSpec != null) {
-// TODO add this - we need cluster columns in the CatalogTable for this to work.
-throw new ParseException("Unsupported operation: " +
-  "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-  }
-  if (ctx.skewSpec != null) {
-throw new ParseException("Operation not allowed: " +
-  "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-  }
-
-  // Create the schema.
-  val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-  // Get the column by which the table is partitioned.
-  val partitionCols = Option(ctx.partitionColumns).toSeq.flatMap(vis

[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59492242
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala 
---
@@ -121,84 +123,115 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
 */
   override def visitCreateFileFormat(
   ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-if (ctx.storageHandler == null) {
-  typedVisit[CatalogStorageFormat](ctx.fileFormat)
-} else {
-  visitStorageHandler(ctx.storageHandler)
+(ctx.fileFormat, ctx.storageHandler) match {
+  // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+  case (c: TableFileFormatContext, null) =>
+  visitTableFileFormat(c)
+  // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+  case (c: GenericFileFormatContext, null) =>
+visitGenericFileFormat(c)
+  case (null, storageHandler) =>
+throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+  case _ =>
+throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
 }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
 */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-if (ctx.query == null) {
-  HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+// TODO: implement temporary tables
+if (temp) {
+  throw new ParseException(
+"CREATE TEMPORARY TABLE is not supported yet. " +
+"Please use registerTempTable as an alternative.", ctx)
+}
+if (ctx.skewSpec != null) {
+  throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+}
+if (ctx.bucketSpec != null) {
+  throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+}
+val tableType = if (external) {
+  CatalogTableType.EXTERNAL_TABLE
 } else {
-  // Get the table header.
-  val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-  val tableType = if (external) {
-CatalogTableType.EXTERNAL_TABLE
-  } else {
-CatalogTableType.MANAGED_TABLE
-  }
-
-  // Unsupported clauses.
-  if (temp) {
-throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-  }
-  if (ctx.bucketSpec != null) {
-// TODO add this - we need cluster columns in the CatalogTable for this to work.
-throw new ParseException("Unsupported operation: " +
-  "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-  }
-  if (ctx.skewSpec != null) {
-throw new ParseException("Operation not allowed: " +
-  "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-  }
-
-  // Create the schema.
-  val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-  // Get the column by which the table is partitioned.
-  val partitionCols = Option(ctx.partitionColumns).toSeq.flatMap(vis

[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59491556
  
--- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
 ---
@@ -790,14 +821,13 @@ class HiveCompatibilitySuite extends 
HiveQueryFileTest with BeforeAndAfter {
 "nullscript",
 "optional_outer",
 "orc_dictionary_threshold",
-"orc_empty_files",
 "order",
 "order2",
 "outer_join_ppr",
 "parallel",
 "parenthesis_star_by",
 "part_inherit_tbl_props",
-"part_inherit_tbl_props_empty",
+//"part_inherit_tbl_props_empty", // TODO: results don't match
--- End diff --

This one and the others commented out below are caused by differences in the 
DESCRIBE TABLE output. There are mainly three differences. First, for managed 
tables, we will have EXTERNAL set to false in the table properties. Second, for 
tables that are not bucketed, we will have numBuckets set to 0 instead of -1. 
Third, for tables that do not specify the file format, the output format will be 
org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat instead of 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat (Hive somehow 
replaces it even though the explain output still shows 
IgnoreKeyTextOutputFormat). We do not need to block this PR for any of them.
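The first difference can be sketched as follows (simplified stand-ins, not Spark's actual catalog classes): after this PR only external tables carry the EXTERNAL property, so DESCRIBE output for managed tables no longer shows an EXTERNAL entry at all.

```scala
object TableType extends Enumeration {
  val External, Managed = Value
}

// Table properties written for a table, per table type (sketch).
def tableProps(t: TableType.Value): Map[String, String] = t match {
  case TableType.External => Map("EXTERNAL" -> "TRUE")
  case TableType.Managed  => Map.empty // EXTERNAL is simply absent now
}

val ext = tableProps(TableType.External)
val mgd = tableProps(TableType.Managed)
```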





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59491359
  
--- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
 ---
@@ -490,15 +538,13 @@ class HiveCompatibilitySuite extends 
HiveQueryFileTest with BeforeAndAfter {
 "count",
 "cp_mj_rc",
 "create_insert_outputformat",
-"create_like_tbl_props",
+//"create_like_tbl_props", // TODO: results don't match
--- End diff --

https://issues.apache.org/jira/browse/SPARK-14592





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59491266
  
--- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
 ---
@@ -509,7 +555,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest 
with BeforeAndAfter {
 "date_comparison",
 "date_join1",
 "date_serde",
-"decimal_1",
+//"decimal_1", // TODO: cannot parse column decimal(5)
--- End diff --

https://issues.apache.org/jira/browse/SPARK-14591





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59491102
  
--- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
 ---
@@ -509,7 +555,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest 
with BeforeAndAfter {
 "date_comparison",
 "date_join1",
 "date_serde",
-"decimal_1",
+//"decimal_1", // TODO: cannot parse column decimal(5)
--- End diff --

Yeah, we should support decimal(5); here, the scale is 0.
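The expected behaviour can be sketched with a tiny hand-rolled parser (purely illustrative, not Spark's `CatalystSqlParser`): when only a precision is given, the scale should default to 0, so `decimal(5)` means `decimal(5, 0)`.

```scala
case class DecimalType(precision: Int, scale: Int)

// Parse "decimal(p)" or "decimal(p, s)"; scale defaults to 0 when omitted.
def parseDecimal(s: String): DecimalType = {
  val args = s.trim.stripPrefix("decimal").stripPrefix("(").stripSuffix(")")
  args.split(",").map(_.trim) match {
    case Array(p)     => DecimalType(p.toInt, 0)
    case Array(p, sc) => DecimalType(p.toInt, sc.toInt)
  }
}

val d1 = parseDecimal("decimal(5)")    // scale defaults to 0
val d2 = parseDecimal("decimal(10, 2)")
```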





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209194477
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209194478
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55674/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209194377
  
**[Test build #55674 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55674/consoleFull)**
 for PR 12271 at commit 
[`8e273fd`](https://github.com/apache/spark/commit/8e273fdc4f95d08cb6d09f4641472861587a3a01).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209172735
  
**[Test build #55674 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55674/consoleFull)**
 for PR 12271 at commit 
[`8e273fd`](https://github.com/apache/spark/commit/8e273fdc4f95d08cb6d09f4641472861587a3a01).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209172015
  
I've ignored some tests in `HiveCompatibilitySuite` for now. I'll have a 
look at them shortly. I just wanted to see if the rest of the tests will pass.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209169942
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209169944
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55666/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209169833
  
**[Test build #55666 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55666/consoleFull)**
 for PR 12271 at commit 
[`8dc554a`](https://github.com/apache/spark/commit/8dc554a38c9989fc43b119645bfe5c8ceb7b6cdb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209158268
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55658/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209158146
  
**[Test build #55658 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55658/consoleFull)**
 for PR 12271 at commit 
[`50a2054`](https://github.com/apache/spark/commit/50a2054ec7a7276d45c1ab5adabd4550e00c7811).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209158266
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209153212
  
**[Test build #55666 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55666/consoleFull)**
 for PR 12271 at commit 
[`8dc554a`](https://github.com/apache/spark/commit/8dc554a38c9989fc43b119645bfe5c8ceb7b6cdb).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209135892
  
**[Test build #55658 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55658/consoleFull)**
 for PR 12271 at commit 
[`50a2054`](https://github.com/apache/spark/commit/50a2054ec7a7276d45c1ab5adabd4550e00c7811).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209135567
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209135570
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55654/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209128481
  
**[Test build #55654 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55654/consoleFull)**
 for PR 12271 at commit 
[`efecac9`](https://github.com/apache/spark/commit/efecac9b01b3ff8be296234392f4a6c922fa2d25).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209115335
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55644/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209115331
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209115138
  
**[Test build #55644 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55644/consoleFull)**
 for PR 12271 at commit 
[`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209092173
  
**[Test build #55644 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55644/consoleFull)**
 for PR 12271 at commit 
[`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209091270
  
retest this please





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209087621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55634/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209087417
  
**[Test build #55634 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55634/consoleFull)**
 for PR 12271 at commit 
[`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209087617
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209086376
  
LGTM - pending jenkins





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59445656
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -121,84 +123,114 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
    */
   override def visitCreateFileFormat(
       ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-    if (ctx.storageHandler == null) {
-      typedVisit[CatalogStorageFormat](ctx.fileFormat)
-    } else {
-      visitStorageHandler(ctx.storageHandler)
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        visitTableFileFormat(c)
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        visitGenericFileFormat(c)
+      case (null, storageHandler) =>
+        throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+      case _ =>
+        throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
     }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
    */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-    if (ctx.query == null) {
-      HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+    val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+    // TODO: implement temporary tables
+    if (temp) {
+      throw new ParseException(
+        "CREATE TEMPORARY TABLE is not supported yet. " +
+          "Please use registerTempTable as an alternative.", ctx)
+    }
+    if (ctx.skewSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+    }
+    if (ctx.bucketSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+    }
+    val tableType = if (external) {
+      CatalogTableType.EXTERNAL_TABLE
     } else {
-      // Get the table header.
-      val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-      val tableType = if (external) {
-        CatalogTableType.EXTERNAL_TABLE
-      } else {
-        CatalogTableType.MANAGED_TABLE
-      }
-
-      // Unsupported clauses.
-      if (temp) {
-        throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-      }
-      if (ctx.bucketSpec != null) {
-        // TODO add this - we need cluster columns in the CatalogTable for this to work.
-        throw new ParseException("Unsupported operation: " +
-          "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-      }
-      if (ctx.skewSpec != null) {
-        throw new ParseException("Operation not allowed: " +
-          "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-      }
-
-      // Create the schema.
-      val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-      // Get the column by which the table is partitioned.
-      val partitionCols = Option(ctx.partitionColumns).toSeq.flatMa

[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209058443
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209058447
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55628/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209061013
  
**[Test build #55634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55634/consoleFull)**
 for PR 12271 at commit 
[`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209059940
  
test this please





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209058214
  
**[Test build #55628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55628/consoleFull)**
 for PR 12271 at commit 
[`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59426592
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -134,82 +136,117 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
    */
   override def visitCreateFileFormat(
       ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-    if (ctx.storageHandler == null) {
-      typedVisit[CatalogStorageFormat](ctx.fileFormat)
-    } else {
-      visitStorageHandler(ctx.storageHandler)
+    (ctx.fileFormat, ctx.storageHandler) match {
+      case (fileFormat, null) if fileFormat != null =>
+        fileFormat match {
+          // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+          case c: TableFileFormatContext => visitTableFileFormat(c)
+          // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+          case c: GenericFileFormatContext => visitGenericFileFormat(c)
+        }
+      case (null, storageHandler) if storageHandler != null =>
+        throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+      case (null, null) =>
+        throw new ParseException("expected one of STORED AS or STORED BY", ctx)
+      case _ =>
+        throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
     }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table. TODO: expand this comment!
+   *
+   * For example:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...)
+   *   [STORED AS DIRECTORIES]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
    */
   override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-    if (ctx.query == null) {
-      HiveNativeCommand(command(ctx))
+    val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+    // TODO: implement temporary tables
+    if (temp) {
+      throw new AnalysisException(
--- End diff --

(I actually fixed this in this patch, but we still need to fix it in all 
the other places in SPARK-14414)
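The checks quoted in the diff above boil down to a few fail-fast guards that run before any table metadata is built. A minimal sketch of that control flow in plain Python, not Spark's Scala (the `ParseError` and `validate_create_table` names, and the keyword-argument shape, are illustrative stand-ins for the parser context):

```python
class ParseError(Exception):
    """Stand-in for Spark's ParseException/AnalysisException."""


def validate_create_table(temp=False, skew_spec=None, bucket_spec=None):
    """Reject unsupported CREATE TABLE clauses before building the plan.

    Mirrors the guard ordering in the quoted visitCreateTable: temporary
    tables, skewed columns, and bucketing are all rejected up front.
    """
    if temp:
        raise ParseError("CREATE TEMPORARY TABLE is not supported yet. "
                         "Please use registerTempTable as an alternative.")
    if skew_spec is not None:
        raise ParseError("Operation not allowed: CREATE TABLE ... SKEWED BY ...")
    if bucket_spec is not None:
        raise ParseError("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...")
```

Failing at parse time like this gives the user an actionable error instead of letting the unsupported clause fall through to Hive and fail later with a less clear message.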





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209032675
  
**[Test build #55628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55628/consoleFull)** for PR 12271 at commit [`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209031556
  
Thanks for the reviews. Latest commit should pass tests.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59423143
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -121,84 +123,114 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
   */
   override def visitCreateFileFormat(
   ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-if (ctx.storageHandler == null) {
-  typedVisit[CatalogStorageFormat](ctx.fileFormat)
-} else {
-  visitStorageHandler(ctx.storageHandler)
+(ctx.fileFormat, ctx.storageHandler) match {
+  // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+  case (c: TableFileFormatContext, null) =>
+  visitTableFileFormat(c)
+  // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+  case (c: GenericFileFormatContext, null) =>
+visitGenericFileFormat(c)
+  case (null, storageHandler) =>
+throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+  case _ =>
+throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
 }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
   */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-if (ctx.query == null) {
-  HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+// TODO: implement temporary tables
+if (temp) {
+  throw new ParseException(
+"CREATE TEMPORARY TABLE is not supported yet. " +
+"Please use registerTempTable as an alternative.", ctx)
+}
+if (ctx.skewSpec != null) {
+  throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+}
+if (ctx.bucketSpec != null) {
+  throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+}
+val tableType = if (external) {
+  CatalogTableType.EXTERNAL_TABLE
 } else {
-  // Get the table header.
-  val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-  val tableType = if (external) {
-CatalogTableType.EXTERNAL_TABLE
-  } else {
-CatalogTableType.MANAGED_TABLE
-  }
-
-  // Unsupported clauses.
-  if (temp) {
-throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-  }
-  if (ctx.bucketSpec != null) {
-// TODO add this - we need cluster columns in the CatalogTable for this to work.
-throw new ParseException("Unsupported operation: " +
-  "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-  }
-  if (ctx.skewSpec != null) {
-throw new ParseException("Operation not allowed: " +
-  "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-  }
-
-  // Create the schema.
-  val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-  // Get the column by which the table is partitioned.
-  val partitionCols = Option(ctx.partitionColumns).toSeq.flatMa
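The diff above replaces nested if/else null checks with a single tuple match over `(ctx.fileFormat, ctx.storageHandler)`. A minimal standalone sketch of that idiom, using hypothetical simplified stand-ins for the ANTLR contexts (these are not the real Spark classes):

```scala
object StoredByExample {
  // Hypothetical stand-ins for the generated ANTLR parser contexts.
  sealed trait FileFormatContext
  final case class TableFileFormat(input: String, output: String) extends FileFormatContext
  final case class GenericFileFormat(name: String) extends FileFormatContext
  final case class StorageHandlerContext(clazz: String)

  // Mirrors the diff: exactly one of fileFormat / storageHandler may be non-null,
  // and STORED BY is rejected outright.
  def resolve(fileFormat: FileFormatContext, handler: StorageHandlerContext): String =
    (fileFormat, handler) match {
      case (f: TableFileFormat, null)   => s"INPUTFORMAT ${f.input} OUTPUTFORMAT ${f.output}"
      case (f: GenericFileFormat, null) => f.name
      case (null, _: StorageHandlerContext) =>
        throw new IllegalArgumentException("Operation not allowed: ... STORED BY ...")
      case (null, null) =>
        throw new IllegalArgumentException("expected one of STORED AS or STORED BY")
      case _ =>
        throw new IllegalArgumentException("expected either STORED AS or STORED BY, not both")
    }

  def main(args: Array[String]): Unit = {
    println(resolve(GenericFileFormat("PARQUET"), null)) // prints "PARQUET"
  }
}
```

Type patterns such as `f: TableFileFormat` never match `null`, which is what lets the `(null, ...)` and `(_, _)` cases cleanly separate the "only STORED BY", "neither", and "both" error paths.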

[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-209030454
  
LGTM





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59421983
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -121,84 +123,114 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
   */
   override def visitCreateFileFormat(
   ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-if (ctx.storageHandler == null) {
-  typedVisit[CatalogStorageFormat](ctx.fileFormat)
-} else {
-  visitStorageHandler(ctx.storageHandler)
+(ctx.fileFormat, ctx.storageHandler) match {
+  // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+  case (c: TableFileFormatContext, null) =>
+  visitTableFileFormat(c)
+  // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+  case (c: GenericFileFormatContext, null) =>
+visitGenericFileFormat(c)
+  case (null, storageHandler) =>
+throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+  case _ =>
+throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
 }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
   */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-if (ctx.query == null) {
-  HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+// TODO: implement temporary tables
+if (temp) {
+  throw new ParseException(
+"CREATE TEMPORARY TABLE is not supported yet. " +
+"Please use registerTempTable as an alternative.", ctx)
+}
+if (ctx.skewSpec != null) {
+  throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+}
+if (ctx.bucketSpec != null) {
+  throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+}
+val tableType = if (external) {
+  CatalogTableType.EXTERNAL_TABLE
 } else {
-  // Get the table header.
-  val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-  val tableType = if (external) {
-CatalogTableType.EXTERNAL_TABLE
-  } else {
-CatalogTableType.MANAGED_TABLE
-  }
-
-  // Unsupported clauses.
-  if (temp) {
-throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-  }
-  if (ctx.bucketSpec != null) {
-// TODO add this - we need cluster columns in the CatalogTable for this to work.
-throw new ParseException("Unsupported operation: " +
-  "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-  }
-  if (ctx.skewSpec != null) {
-throw new ParseException("Operation not allowed: " +
-  "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-  }
-
-  // Create the schema.
-  val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-  // Get the column by which the table is partitioned.
-  val partitionCols = Option(ctx.partitionColumns).toSeq.flatMap(vis

[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59421753
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -218,14 +219,35 @@ case class CatalogTable(
 tableType: CatalogTableType,
 storage: CatalogStorageFormat,
 schema: Seq[CatalogColumn],
-partitionColumns: Seq[CatalogColumn] = Seq.empty,
-sortColumns: Seq[CatalogColumn] = Seq.empty,
+partitionColumnNames: Seq[String] = Seq.empty,
+sortColumnNames: Seq[String] = Seq.empty,
+bucketColumnNames: Seq[String] = Seq.empty,
+// e.g. (date, country)
+skewColumnNames: Seq[String] = Seq.empty,
+// e.g. ('2008-08-08', 'us'), ('2009-09-09', 'uk')
+skewColumnValues: Seq[Seq[String]] = Seq.empty,
 numBuckets: Int = 0,
 createTime: Long = System.currentTimeMillis,
 lastAccessTime: Long = System.currentTimeMillis,
 properties: Map[String, String] = Map.empty,
 viewOriginalText: Option[String] = None,
-viewText: Option[String] = None) {
+viewText: Option[String] = None,
+comment: Option[String] = None) {
+
+  // Verify that the provided columns are part of the schema
+  private val colNames = schema.map(_.name).toSet
+  private def requireSubsetOfSchema(cols: Seq[String], colType: String): Unit = {
+require(cols.toSet.subsetOf(colNames), s"$colType columns (${cols.mkString(", ")}) " +
+  s"must be a subset of schema (${colNames.mkString(", ")}) in table '$identifier'")
+  }
+  requireSubsetOfSchema(partitionColumnNames, "partition")
--- End diff --

Thanks!
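The `requireSubsetOfSchema` check discussed above runs in the case-class body, so the invariant is enforced at construction time. A minimal sketch of that pattern with a simplified stand-in for `CatalogTable` (field names here are illustrative, not Spark's actual API):

```scala
// Simplified stand-in for CatalogTable: column names are plain strings.
case class SimpleTable(
    identifier: String,
    schema: Seq[String],
    partitionColumnNames: Seq[String] = Seq.empty,
    sortColumnNames: Seq[String] = Seq.empty) {

  private val colNames = schema.toSet

  // require(...) throws IllegalArgumentException when the predicate is false.
  private def requireSubsetOfSchema(cols: Seq[String], colType: String): Unit = {
    require(cols.toSet.subsetOf(colNames), s"$colType columns (${cols.mkString(", ")}) " +
      s"must be a subset of schema (${colNames.mkString(", ")}) in table '$identifier'")
  }

  // Statements in a case-class body run during construction, so an
  // inconsistent table can never be instantiated.
  requireSubsetOfSchema(partitionColumnNames, "partition")
  requireSubsetOfSchema(sortColumnNames, "sort")
}
```

For example, `SimpleTable("t", Seq("a", "b"), partitionColumnNames = Seq("c"))` fails immediately with an `IllegalArgumentException`, rather than producing a table whose partition columns are missing from its schema.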





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59421647
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -121,84 +123,114 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
   */
   override def visitCreateFileFormat(
   ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-if (ctx.storageHandler == null) {
-  typedVisit[CatalogStorageFormat](ctx.fileFormat)
-} else {
-  visitStorageHandler(ctx.storageHandler)
+(ctx.fileFormat, ctx.storageHandler) match {
+  // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+  case (c: TableFileFormatContext, null) =>
+  visitTableFileFormat(c)
+  // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+  case (c: GenericFileFormatContext, null) =>
+visitGenericFileFormat(c)
+  case (null, storageHandler) =>
+throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+  case _ =>
+throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
 }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
--- End diff --

Remove this line while resolving the conflicts?





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59421840
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -121,84 +123,114 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
   */
   override def visitCreateFileFormat(
   ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-if (ctx.storageHandler == null) {
-  typedVisit[CatalogStorageFormat](ctx.fileFormat)
-} else {
-  visitStorageHandler(ctx.storageHandler)
+(ctx.fileFormat, ctx.storageHandler) match {
+  // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+  case (c: TableFileFormatContext, null) =>
+  visitTableFileFormat(c)
+  // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+  case (c: GenericFileFormatContext, null) =>
+visitGenericFileFormat(c)
+  case (null, storageHandler) =>
+throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+  case _ =>
+throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
 }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
   */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-if (ctx.query == null) {
-  HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+// TODO: implement temporary tables
--- End diff --

+1
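The quoted `visitCreateTable` rewrite opens with a series of fail-fast guards that reject unsupported clauses before any plan is built. A minimal sketch of that guard pattern, using a hypothetical simplified context (the real code matches on ANTLR contexts and throws Spark's `ParseException`):

```scala
object UnsupportedClauseGuards {
  // Hypothetical stand-in for the ANTLR CreateTableContext: Option replaces
  // the nullable fields of the generated context.
  final case class CreateTableCtx(
      temp: Boolean,
      skewSpec: Option[String],
      bucketSpec: Option[String])

  // Reject unsupported clauses up front so later code can assume a valid context.
  def checkSupported(ctx: CreateTableCtx): Unit = {
    if (ctx.temp) {
      throw new UnsupportedOperationException(
        "CREATE TEMPORARY TABLE is not supported yet. " +
        "Please use registerTempTable as an alternative.")
    }
    if (ctx.skewSpec.isDefined) {
      throw new UnsupportedOperationException(
        "Operation not allowed: CREATE TABLE ... SKEWED BY ...")
    }
    if (ctx.bucketSpec.isDefined) {
      throw new UnsupportedOperationException(
        "Operation not allowed: CREATE TABLE ... CLUSTERED BY ...")
    }
  }
}
```

Checking each clause separately (rather than one combined condition) lets every rejection carry a message naming the exact clause the user wrote, which is what the review above is converging on.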





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12271#discussion_r59421524
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -218,14 +219,35 @@ case class CatalogTable(
 tableType: CatalogTableType,
 storage: CatalogStorageFormat,
 schema: Seq[CatalogColumn],
-partitionColumns: Seq[CatalogColumn] = Seq.empty,
-sortColumns: Seq[CatalogColumn] = Seq.empty,
+partitionColumnNames: Seq[String] = Seq.empty,
+sortColumnNames: Seq[String] = Seq.empty,
+bucketColumnNames: Seq[String] = Seq.empty,
+// e.g. (date, country)
+skewColumnNames: Seq[String] = Seq.empty,
+// e.g. ('2008-08-08', 'us'), ('2009-09-09', 'uk')
+skewColumnValues: Seq[Seq[String]] = Seq.empty,
 numBuckets: Int = 0,
 createTime: Long = System.currentTimeMillis,
 lastAccessTime: Long = System.currentTimeMillis,
 properties: Map[String, String] = Map.empty,
 viewOriginalText: Option[String] = None,
-viewText: Option[String] = None) {
+viewText: Option[String] = None,
+comment: Option[String] = None) {
--- End diff --

ok





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-11 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-208665982
  
I'll have a look at the failing tests in a couple of hours.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-208634664
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-208634671
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2/
Test FAILed.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-208634574
  
**[Test build #2 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull)** for PR 12271 at commit [`2e95ecf`](https://github.com/apache/spark/commit/2e95ecf790dc5d5b12b6ec72c0bd2b4bca99b17d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12271#issuecomment-208623925
  
**[Test build #2 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull)** for PR 12271 at commit [`2e95ecf`](https://github.com/apache/spark/commit/2e95ecf790dc5d5b12b6ec72c0bd2b4bca99b17d).

