[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE
Github user yhuai closed the pull request at: https://github.com/apache/spark/pull/12363 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12363#issuecomment-209574950 OK, let's close this PR now.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12271
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209572487 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209572492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55727/
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209572170 **[Test build #55727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55727/consoleFull)** for PR 12271 at commit [`55957bd`](https://github.com/apache/spark/commit/55957bdcd1dea06da750c5cf7cfa587da5250574).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209572290 OK. Merging to master.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59593820

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```
@@ -675,34 +680,38 @@ private[hive] class HiveClientImpl(
   private def toHiveTable(table: CatalogTable): HiveTable = {
     val hiveTable = new HiveTable(table.database, table.identifier.table)
-    // For EXTERNAL_TABLE/MANAGED_TABLE, we also need to set EXTERNAL field in
-    // the table properties accodringly. Otherwise, if EXTERNAL_TABLE is the table type
-    // but EXTERNAL field is not set, Hive metastore will change the type to
-    // MANAGED_TABLE (see
-    // metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
+    // For EXTERNAL_TABLE, we also need to set EXTERNAL field in the table properties.
+    // Otherwise, Hive metastore will change the table to a MANAGED_TABLE.
+    // (metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
     hiveTable.setTableType(table.tableType match {
       case CatalogTableType.EXTERNAL_TABLE =>
         hiveTable.setProperty("EXTERNAL", "TRUE")
         HiveTableType.EXTERNAL_TABLE
       case CatalogTableType.MANAGED_TABLE =>
-        hiveTable.setProperty("EXTERNAL", "FALSE")
         HiveTableType.MANAGED_TABLE
       case CatalogTableType.INDEX_TABLE => HiveTableType.INDEX_TABLE
       case CatalogTableType.VIRTUAL_VIEW => HiveTableType.VIRTUAL_VIEW
     })
-    hiveTable.setFields(table.schema.map(toHiveColumn).asJava)
-    hiveTable.setPartCols(table.partitionColumns.map(toHiveColumn).asJava)
+    // Note: In Hive the schema and partition columns must be disjoint sets
+    val (partCols, schema) = table.schema.map(toHiveColumn).partition { c =>
+      table.partitionColumnNames.contains(c.getName)
+    }
+    hiveTable.setFields(schema.asJava)
+    hiveTable.setPartCols(partCols.asJava)
     // TODO: set sort columns here too
+    hiveTable.setBucketCols(table.bucketColumnNames.asJava)
     hiveTable.setOwner(conf.getUser)
     hiveTable.setNumBuckets(table.numBuckets)
     hiveTable.setCreateTime((table.createTime / 1000).toInt)
     hiveTable.setLastAccessTime((table.lastAccessTime / 1000).toInt)
     table.storage.locationUri.foreach { loc => shim.setDataLocation(hiveTable, loc) }
     table.storage.inputFormat.map(toInputFormat).foreach(hiveTable.setInputFormatClass)
     table.storage.outputFormat.map(toOutputFormat).foreach(hiveTable.setOutputFormatClass)
-    table.storage.serde.foreach(hiveTable.setSerializationLib)
+    hiveTable.setSerializationLib(
+      table.storage.serde.getOrElse("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
```

--- End diff --

Yeah, agree if you are talking about my case. : )
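The two behaviors the diff above changes — peeling partition columns out of the full schema and defaulting a missing serde — can be sketched standalone. This is a simplified model, not Spark's real API: `Column` and `SimpleTable` are hypothetical stand-ins for Hive's `FieldSchema` and Spark's `CatalogTable`.

```scala
// Hypothetical stand-ins for FieldSchema / CatalogTable, for illustration only.
case class Column(name: String, dataType: String)

case class SimpleTable(
    schema: Seq[Column],                 // all columns, data + partition
    partitionColumnNames: Seq[String],
    serde: Option[String])

// In Hive the data schema and partition columns must be disjoint sets,
// so split the full schema: columns named as partition columns go to
// setPartCols, the rest go to setFields.
def splitColumns(table: SimpleTable): (Seq[Column], Seq[Column]) =
  table.schema.partition(c => table.partitionColumnNames.contains(c.name))

// A missing serde falls back to a default, mirroring the getOrElse in the diff.
def serdeOrDefault(table: SimpleTable): String =
  table.serde.getOrElse("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")

val t = SimpleTable(
  schema = Seq(Column("id", "int"), Column("ds", "string")),
  partitionColumnNames = Seq("ds"),
  serde = None)

// partition returns (matching, non-matching):
// partCols = Seq(Column("ds", "string")), dataCols = Seq(Column("id", "int"))
val (partCols, dataCols) = splitColumns(t)
```

Note that `partition` preserves the original column order within each half, which matters because Hive stores column order in the metastore.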
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59593113

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (same `toHiveTable` hunk as quoted above) ---

--- End diff --

I feel it is better to have more checks in SessionCatalog to make sure that a request is valid (it can help your case).
[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12363#issuecomment-209558774 **[Test build #55724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55724/consoleFull)** for PR 12363 at commit [`ab70cb7`](https://github.com/apache/spark/commit/ab70cb751cce8ca2e0757b9fa523534207864328).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12363#issuecomment-209559349 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12363#issuecomment-209559352 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55724/
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59589025

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (same `toHiveTable` hunk as quoted above) ---

--- End diff --

Now, after a few bad experiences, I think we need to add test cases to ensure the Hive APIs work as we expect. See another issue I hit in `alter table ... drop partition`: https://github.com/apache/spark/pull/12220#discussion-diff-59555113
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209536900 **[Test build #55727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55727/consoleFull)** for PR 12271 at commit [`55957bd`](https://github.com/apache/spark/commit/55957bdcd1dea06da750c5cf7cfa587da5250574).
[GitHub] spark pull request: [SPARK-14388] [SQL] Implement CREATE TABLE
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12363#issuecomment-209536036 OK, I cherry-picked your changes and updated the comment.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209536129 OK, I cherry-picked your changes and updated the comment.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59580227

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (same `toHiveTable` hunk as quoted above) ---

--- End diff --

why are there all these undocumented implicit behaviors in Hive :(
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59577033

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (same `toHiveTable` hunk as quoted above) ---

--- End diff --

I submitted a PR to try it out (https://github.com/apache/spark/pull/12363). You can find my change in the last commit (https://github.com/apache/spark/pull/12363/commits/ab70cb751cce8ca2e0757b9fa523534207864328).
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59574261

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (same `toHiveTable` hunk as quoted above) ---

--- End diff --

For now, if the schema does not have any field, can we set it to the following one?
```
[# col_name             data_type               comment             ]
[]
[colarray             from deserializer   ]
```
That way, we preserve the existing (and weird) behavior.
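The fallback suggested here might look roughly like the sketch below. This is a hypothetical helper, not the actual fix that landed; the placeholder column's exact name, type, and comment are illustrative guesses based on the truncated DESCRIBE output quoted in the thread.

```scala
// Illustrative stand-in for Hive's FieldSchema.
case class Column(name: String, dataType: String, comment: String)

// Hypothetical sketch: if a table's schema is empty (e.g. a datasource table
// whose real schema lives in table properties), write a single placeholder
// column rather than no columns at all, preserving the legacy behavior.
def schemaOrPlaceholder(schema: Seq[Column]): Seq[Column] =
  if (schema.nonEmpty) schema
  else Seq(Column("col", "array<string>", "from deserializer"))
```

A non-empty schema passes through unchanged; only the empty case gets the placeholder.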
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59573943

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (same `toHiveTable` hunk as quoted above) ---

--- End diff --

hmm... Without this PR, the storage related fields of the table defined in `org.apache.spark.sql.hive.MetastoreDataSourcesSuite.persistent JSON table` are:
```
[SerDe Library: org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe]
[InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat ]
[OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat]
```
Looks like the change here somehow breaks the test (without the change, we somehow trigger a weird code path in Hive and end up with at least one column called `col`).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209284615 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209284620 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55696/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209283962

**[Test build #55696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55696/consoleFull)** for PR 12271 at commit [`02738fe`](https://github.com/apache/spark/commit/02738fed54ba0eebc2c8d887430f0bef34213c68).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209283010 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55693/ Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209283002 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209282220

**[Test build #55693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55693/consoleFull)** for PR 12271 at commit [`a60e66a`](https://github.com/apache/spark/commit/a60e66a71dec96973043ec62e2b6d4213c5add2c).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209272015 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209272026 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55692/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209270685

**[Test build #55692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55692/consoleFull)** for PR 12271 at commit [`59edce3`](https://github.com/apache/spark/commit/59edce332f87b07bdfb07e2e385431b2b123e1b0).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59501698

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---

@@ -435,13 +480,15 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   /**
    * Create a sequence of [[CatalogColumn]]s from a column list
    */
-  private def visitCatalogColumns(
-      ctx: ColTypeListContext,
-      formatter: String => String = identity): Seq[CatalogColumn] = withOrigin(ctx) {
+  private def visitCatalogColumns(ctx: ColTypeListContext): Seq[CatalogColumn] = withOrigin(ctx) {
     ctx.colType.asScala.map { col =>
       CatalogColumn(
-        formatter(col.identifier.getText),
-        col.dataType.getText.toLowerCase, // TODO validate this?
+        col.identifier.getText.toLowerCase,
+        // Note: for types like "STRUCT" we can't
+        // just convert the whole type string to lower case, otherwise the struct field names
+        // will no longer be case sensitive. Instead, we rely on our parser to get the proper
+        // case before passing it to Hive.
+        CatalystSqlParser.parseDataType(col.dataType.getText).simpleString,

--- End diff --

MINOR/NIT: The DataType parsing is done in the `AstBuilder`, so we really don't need to parse this string again. You could use (magical/evil) `typedVisit[DataType](col.dataType)` here.
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59501466

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---

@@ -488,7 +488,7 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
         // just convert the whole type string to lower case, otherwise the struct field names
         // will no longer be case sensitive. Instead, we rely on our parser to get the proper
         // case before passing it to Hive.
-        HiveMetastoreTypes.toDataType(col.dataType.getText).simpleString,
+        CatalystSqlParser.parseDataType(col.dataType.getText).simpleString,

--- End diff --

MINOR/NIT: The DataType parsing is done in the `AstBuilder`, so we really don't need to parse this string again. You could use (magical/evil) `typedVisit[DataType](col.dataType)` here.
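The pitfall both reviewers are working around — lower-casing the whole type string would also lower-case struct field names — can be illustrated with a small hypothetical example (the type string below is made up; only the `toLowerCase` behavior is being demonstrated, not Spark's parser):

```scala
// Hypothetical illustration: naively lower-casing a full type string
// destroys the case of struct field names.
val typeString = "STRUCT<fieldA: INT, fieldB: STRING>"

val naive = typeString.toLowerCase
// The field names "fieldA"/"fieldB" are now "fielda"/"fieldb":
assert(naive == "struct<fielda: int, fieldb: string>")

// A real parser, by contrast, normalizes only the type keywords and keeps
// field-name case intact, which is why the patch parses the type text
// instead of calling toLowerCase on it.
```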
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209254390 **[Test build #55696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55696/consoleFull)** for PR 12271 at commit [`02738fe`](https://github.com/apache/spark/commit/02738fed54ba0eebc2c8d887430f0bef34213c68).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209252514 **[Test build #55693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55693/consoleFull)** for PR 12271 at commit [`a60e66a`](https://github.com/apache/spark/commit/a60e66a71dec96973043ec62e2b6d4213c5add2c).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209247087 **[Test build #55692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55692/consoleFull)** for PR 12271 at commit [`59edce3`](https://github.com/apache/spark/commit/59edce332f87b07bdfb07e2e385431b2b123e1b0).
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59497215

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---

@@ -435,13 +480,15 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   /**
    * Create a sequence of [[CatalogColumn]]s from a column list
    */
-  private def visitCatalogColumns(
-      ctx: ColTypeListContext,
-      formatter: String => String = identity): Seq[CatalogColumn] = withOrigin(ctx) {
+  private def visitCatalogColumns(ctx: ColTypeListContext): Seq[CatalogColumn] = withOrigin(ctx) {
     ctx.colType.asScala.map { col =>
       CatalogColumn(
-        formatter(col.identifier.getText),
-        col.dataType.getText.toLowerCase, // TODO validate this?
+        col.identifier.getText.toLowerCase,
+        // Note: for types like "STRUCT" we can't
+        // just convert the whole type string to lower case, otherwise the struct field names
+        // will no longer be case sensitive. Instead, we rely on our parser to get the proper
+        // case before passing it to Hive.
+        HiveMetastoreTypes.toDataType(col.dataType.getText).simpleString,

--- End diff --

I think we need to use `parseDataType`.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59492629

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---

@@ -121,84 +123,115 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }

   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
    */
   override def visitCreateFileFormat(
       ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-    if (ctx.storageHandler == null) {
-      typedVisit[CatalogStorageFormat](ctx.fileFormat)
-    } else {
-      visitStorageHandler(ctx.storageHandler)
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        visitTableFileFormat(c)
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        visitGenericFileFormat(c)
+      case (null, storageHandler) =>
+        throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+      case _ =>
+        throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
     }
   }

   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
    */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-    if (ctx.query == null) {
-      HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+    val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+    // TODO: implement temporary tables
+    if (temp) {
+      throw new ParseException(
+        "CREATE TEMPORARY TABLE is not supported yet. " +
+        "Please use registerTempTable as an alternative.", ctx)
+    }
+    if (ctx.skewSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+    }
+    if (ctx.bucketSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+    }
+    val tableType = if (external) {
+      CatalogTableType.EXTERNAL_TABLE
     } else {
-      // Get the table header.
-      val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-      val tableType = if (external) {
-        CatalogTableType.EXTERNAL_TABLE
-      } else {
-        CatalogTableType.MANAGED_TABLE
-      }
-
-      // Unsupported clauses.
-      if (temp) {
-        throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-      }
-      if (ctx.bucketSpec != null) {
-        // TODO add this - we need cluster columns in the CatalogTable for this to work.
-        throw new ParseException("Unsupported operation: " +
-          "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-      }
-      if (ctx.skewSpec != null) {
-        throw new ParseException("Operation not allowed: " +
-          "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-      }
-
-      // Create the schema.
-      val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-      // Get the column by which the table is partitioned.
-      val partitionCols = Option(ctx.partitionColumns).toSeq.flatMap(vis
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59492242

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---

@@ -121,84 +123,115 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }

   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
    */
   override def visitCreateFileFormat(
       ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-    if (ctx.storageHandler == null) {
-      typedVisit[CatalogStorageFormat](ctx.fileFormat)
-    } else {
-      visitStorageHandler(ctx.storageHandler)
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        visitTableFileFormat(c)
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        visitGenericFileFormat(c)
+      case (null, storageHandler) =>
+        throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+      case _ =>
+        throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
     }
   }

   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
    */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-    if (ctx.query == null) {
-      HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+    val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+    // TODO: implement temporary tables
+    if (temp) {
+      throw new ParseException(
+        "CREATE TEMPORARY TABLE is not supported yet. " +
+        "Please use registerTempTable as an alternative.", ctx)
+    }
+    if (ctx.skewSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+    }
+    if (ctx.bucketSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+    }
+    val tableType = if (external) {
+      CatalogTableType.EXTERNAL_TABLE
    } else {
-      // Get the table header.
-      val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-      val tableType = if (external) {
-        CatalogTableType.EXTERNAL_TABLE
-      } else {
-        CatalogTableType.MANAGED_TABLE
-      }
-
-      // Unsupported clauses.
-      if (temp) {
-        throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-      }
-      if (ctx.bucketSpec != null) {
-        // TODO add this - we need cluster columns in the CatalogTable for this to work.
-        throw new ParseException("Unsupported operation: " +
-          "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-      }
-      if (ctx.skewSpec != null) {
-        throw new ParseException("Operation not allowed: " +
-          "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-      }
-
-      // Create the schema.
-      val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-      // Get the column by which the table is partitioned.
-      val partitionCols = Option(ctx.partitionColumns).toSeq.flatMap(vis
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59491556

--- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala ---

@@ -790,14 +821,13 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
     "nullscript",
     "optional_outer",
     "orc_dictionary_threshold",
-    "orc_empty_files",
     "order",
     "order2",
     "outer_join_ppr",
     "parallel",
     "parenthesis_star_by",
     "part_inherit_tbl_props",
-    "part_inherit_tbl_props_empty",
+    //"part_inherit_tbl_props_empty", // TODO: results don't match

--- End diff --

This one and the others commented out below are caused by differences in the output of describe table. There are mainly three differences. First, for managed tables, we have EXTERNAL set to false in the table properties. Second, for tables that are not bucketed, we have numBuckets set to 0 instead of -1. Third, for tables that do not specify a file format, the output format will be org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat instead of org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat (Hive somehow replaces it even though the explain output still shows IgnoreKeyTextOutputFormat). We do not need to block this PR on any of them.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59491359

--- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala ---

@@ -490,15 +538,13 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
     "count",
     "cp_mj_rc",
     "create_insert_outputformat",
-    "create_like_tbl_props",
+    //"create_like_tbl_props", // TODO: results don't match

--- End diff --

https://issues.apache.org/jira/browse/SPARK-14592
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59491266

--- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala ---

@@ -509,7 +555,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
     "date_comparison",
     "date_join1",
     "date_serde",
-    "decimal_1",
+    //"decimal_1", // TODO: cannot parse column decimal(5)

--- End diff --

https://issues.apache.org/jira/browse/SPARK-14591
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59491102

--- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala ---

@@ -509,7 +555,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
     "date_comparison",
     "date_join1",
     "date_serde",
-    "decimal_1",
+    //"decimal_1", // TODO: cannot parse column decimal(5)

--- End diff --

Yea. We should support decimal(5). Here, the scale is 0.
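Assuming the standard SQL convention the comment refers to, `decimal(5)` is shorthand for `decimal(5, 0)` — precision 5, scale 0. A minimal sketch of a parser defaulting the missing scale (`parseDecimal` and this `DecimalType` are illustrative, not Spark's API):

```scala
// Illustrative model of SQL decimal types: precision and scale.
case class DecimalType(precision: Int, scale: Int)

// decimal(p) should default the scale to 0; decimal(p, s) keeps both.
def parseDecimal(args: Seq[Int]): DecimalType = args match {
  case Seq(p)    => DecimalType(p, 0)
  case Seq(p, s) => DecimalType(p, s)
}

assert(parseDecimal(Seq(5)) == DecimalType(5, 0))    // "decimal(5)"
assert(parseDecimal(Seq(5, 2)) == DecimalType(5, 2)) // "decimal(5,2)"
```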
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209194477 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209194478 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55674/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209194377

**[Test build #55674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55674/consoleFull)** for PR 12271 at commit [`8e273fd`](https://github.com/apache/spark/commit/8e273fdc4f95d08cb6d09f4641472861587a3a01).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209172735 **[Test build #55674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55674/consoleFull)** for PR 12271 at commit [`8e273fd`](https://github.com/apache/spark/commit/8e273fdc4f95d08cb6d09f4641472861587a3a01).
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209172015 I've ignored some tests in `HiveCompatibilitySuite` for now. I'll have a look at them shortly. I just wanted to see if the rest of the tests will pass.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209169942 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209169944 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55666/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209169833 **[Test build #55666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55666/consoleFull)** for PR 12271 at commit [`8dc554a`](https://github.com/apache/spark/commit/8dc554a38c9989fc43b119645bfe5c8ceb7b6cdb).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209158268 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55658/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209158146 **[Test build #55658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55658/consoleFull)** for PR 12271 at commit [`50a2054`](https://github.com/apache/spark/commit/50a2054ec7a7276d45c1ab5adabd4550e00c7811).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209158266 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209153212 **[Test build #55666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55666/consoleFull)** for PR 12271 at commit [`8dc554a`](https://github.com/apache/spark/commit/8dc554a38c9989fc43b119645bfe5c8ceb7b6cdb).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209135892 **[Test build #55658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55658/consoleFull)** for PR 12271 at commit [`50a2054`](https://github.com/apache/spark/commit/50a2054ec7a7276d45c1ab5adabd4550e00c7811).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209135567 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209135570 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55654/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209128481 **[Test build #55654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55654/consoleFull)** for PR 12271 at commit [`efecac9`](https://github.com/apache/spark/commit/efecac9b01b3ff8be296234392f4a6c922fa2d25).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209115335 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55644/ Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209115331 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209115138 **[Test build #55644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55644/consoleFull)** for PR 12271 at commit [`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209092173 **[Test build #55644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55644/consoleFull)** for PR 12271 at commit [`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209091270 retest this please
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209087621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55634/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209087417 **[Test build #55634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55634/consoleFull)** for PR 12271 at commit [`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209087617 Merged build finished. Test FAILed.
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209086376 LGTM - pending jenkins
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59445656
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -121,84 +123,114 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }

   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
    */
   override def visitCreateFileFormat(
       ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-    if (ctx.storageHandler == null) {
-      typedVisit[CatalogStorageFormat](ctx.fileFormat)
-    } else {
-      visitStorageHandler(ctx.storageHandler)
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        visitTableFileFormat(c)
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        visitGenericFileFormat(c)
+      case (null, storageHandler) =>
+        throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+      case _ =>
+        throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
     }
   }

   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
    */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-    if (ctx.query == null) {
-      HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+    val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+    // TODO: implement temporary tables
+    if (temp) {
+      throw new ParseException(
+        "CREATE TEMPORARY TABLE is not supported yet. " +
+        "Please use registerTempTable as an alternative.", ctx)
+    }
+    if (ctx.skewSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+    }
+    if (ctx.bucketSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+    }
+    val tableType = if (external) {
+      CatalogTableType.EXTERNAL_TABLE
     } else {
-      // Get the table header.
-      val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-      val tableType = if (external) {
-        CatalogTableType.EXTERNAL_TABLE
-      } else {
-        CatalogTableType.MANAGED_TABLE
-      }
-
-      // Unsupported clauses.
-      if (temp) {
-        throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-      }
-      if (ctx.bucketSpec != null) {
-        // TODO add this - we need cluster columns in the CatalogTable for this to work.
-        throw new ParseException("Unsupported operation: " +
-          "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-      }
-      if (ctx.skewSpec != null) {
-        throw new ParseException("Operation not allowed: " +
-          "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-      }
-
-      // Create the schema.
-      val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-      // Get the column by which the table is partitioned.
-      val partitionCols = Option(ctx.partitionColumns).toSeq.flatMa
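The dispatch style in the new `visitCreateFileFormat` relies on ANTLR returning `null` for absent sub-rules, so matching on the `(fileFormat, storageHandler)` pair covers every combination in one place. A minimal standalone sketch of that style — the context classes below are simplified, hypothetical stand-ins for the generated ANTLR types, not the real ones:

```scala
// Hypothetical stand-ins for the generated ANTLR contexts.
sealed trait FileFormatContext
case class TableFileFormatContext(input: String, output: String) extends FileFormatContext
case class GenericFileFormatContext(name: String) extends FileFormatContext
case class StorageHandlerContext(className: String)

// Absent sub-rules arrive as null, so the tuple match enumerates all four cases.
// Note: a type pattern like `c: TableFileFormatContext` never matches null,
// so the last two cases are only reached when fileFormat is null or both are set.
def resolveStorage(
    fileFormat: FileFormatContext,
    storageHandler: StorageHandlerContext): String =
  (fileFormat, storageHandler) match {
    case (c: TableFileFormatContext, null) =>
      s"INPUTFORMAT ${c.input} OUTPUTFORMAT ${c.output}"
    case (c: GenericFileFormatContext, null) =>
      s"STORED AS ${c.name}"
    case (null, h) if h != null =>
      throw new IllegalArgumentException(
        "Operation not allowed: ... STORED BY storage_handler ...")
    case (null, null) =>
      throw new IllegalArgumentException("expected one of STORED AS or STORED BY")
    case _ =>
      throw new IllegalArgumentException(
        "expected either STORED AS or STORED BY, not both")
  }
```

For example, `resolveStorage(GenericFileFormatContext("PARQUET"), null)` takes the generic branch, while passing both contexts at once falls through to the "not both" error — which is exactly the ambiguity the review discussion is resolving.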
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209058443 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209058447 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55628/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209061013 **[Test build #55634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55634/consoleFull)** for PR 12271 at commit [`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209059940 test this please
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209058214 **[Test build #55628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55628/consoleFull)** for PR 12271 at commit [`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59426592
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -134,82 +136,117 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }

   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
    */
   override def visitCreateFileFormat(
       ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-    if (ctx.storageHandler == null) {
-      typedVisit[CatalogStorageFormat](ctx.fileFormat)
-    } else {
-      visitStorageHandler(ctx.storageHandler)
+    (ctx.fileFormat, ctx.storageHandler) match {
+      case (fileFormat, null) if fileFormat != null =>
+        fileFormat match {
+          // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+          case c: TableFileFormatContext => visitTableFileFormat(c)
+          // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+          case c: GenericFileFormatContext => visitGenericFileFormat(c)
+        }
+      case (null, storageHandler) if storageHandler != null =>
+        throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+      case (null, null) =>
+        throw new ParseException("expected one of STORED AS or STORED BY", ctx)
+      case _ =>
+        throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
     }
   }

   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table. TODO: expand this comment!
+   *
+   * For example:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...)
+   *     [STORED AS DIRECTORIES]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
    */
   override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-    if (ctx.query == null) {
-      HiveNativeCommand(command(ctx))
+    val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+    // TODO: implement temporary tables
+    if (temp) {
+      throw new AnalysisException(
--- End diff --
(I actually fixed this in this patch, but we still need to fix it in all the other places in SPARK-14414)
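The revised `visitCreateTable` rejects unsupported clauses up front, before any catalog metadata is built. A condensed, hypothetical sketch of that guard structure — the case class and string results stand in for the real `CatalogTable` machinery and are not the actual Spark API:

```scala
// Hypothetical stand-in for the tuple returned by visitCreateTableHeader.
case class CreateTableHeader(
    name: String, temporary: Boolean, ifNotExists: Boolean, external: Boolean)

// Validate unsupported clauses first, then decide the table type, as the
// diff above does; exceptions and result strings are simplified stand-ins.
def tableTypeFor(
    header: CreateTableHeader,
    hasSkewSpec: Boolean,
    hasBucketSpec: Boolean): String = {
  if (header.temporary)
    throw new UnsupportedOperationException(
      "CREATE TEMPORARY TABLE is not supported yet. " +
      "Please use registerTempTable as an alternative.")
  if (hasSkewSpec)
    throw new UnsupportedOperationException(
      "Operation not allowed: CREATE TABLE ... SKEWED BY ...")
  if (hasBucketSpec)
    throw new UnsupportedOperationException(
      "Operation not allowed: CREATE TABLE ... CLUSTERED BY ...")
  if (header.external) "EXTERNAL_TABLE" else "MANAGED_TABLE"
}
```

Checking the guards before constructing anything keeps the error messages close to the parse context, which is also why the review pushed to replace `AnalysisException` with `ParseException` here.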
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209032675 **[Test build #55628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55628/consoleFull)** for PR 12271 at commit [`250f402`](https://github.com/apache/spark/commit/250f402372e9826865749f3b81cd96a7cdaff657).
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209031556 Thanks for the reviews. Latest commit should pass tests.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59423143

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
@@ -121,84 +123,114 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder {
   }
 
   /**
-   * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command.
+   * Create a [[CatalogStorageFormat]] for creating tables.
    */
   override def visitCreateFileFormat(
       ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) {
-    if (ctx.storageHandler == null) {
-      typedVisit[CatalogStorageFormat](ctx.fileFormat)
-    } else {
-      visitStorageHandler(ctx.storageHandler)
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        visitTableFileFormat(c)
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        visitGenericFileFormat(c)
+      case (null, storageHandler) =>
+        throw new ParseException("Operation not allowed: ... STORED BY storage_handler ...", ctx)
+      case _ =>
+        throw new ParseException("expected either STORED AS or STORED BY, not both", ctx)
     }
   }
 
   /**
-   * Create a [[CreateTableAsSelect]] command.
+   * Create a table, returning either a [[CreateTable]] or a [[CreateTableAsSelect]].
+   *
+   * This is not used to create datasource tables, which is handled through
+   * "CREATE TABLE ... USING ...".
+   *
+   * Note: several features are currently not supported - temporary tables, bucketing,
+   * skewed columns and storage handlers (STORED BY).
+   *
+   * Expected format:
+   * {{{
+   *   CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   *   [(col1 data_type [COMMENT col_comment], ...)]
+   *   [COMMENT table_comment]
+   *   [PARTITIONED BY (col3 data_type [COMMENT col_comment], ...)]
+   *   [CLUSTERED BY (col1, ...) [SORTED BY (col1 [ASC|DESC], ...)] INTO num_buckets BUCKETS]
+   *   [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]
+   *   [LOCATION path]
+   *   [TBLPROPERTIES (property_name=property_value, ...)]
+   *   [AS select_statement];
+   * }}}
    */
-  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
-    if (ctx.query == null) {
-      HiveNativeCommand(command(ctx))
+  override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {
+    val (name, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
+    // TODO: implement temporary tables
+    if (temp) {
+      throw new ParseException(
+        "CREATE TEMPORARY TABLE is not supported yet. " +
+        "Please use registerTempTable as an alternative.", ctx)
+    }
+    if (ctx.skewSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... SKEWED BY ...", ctx)
+    }
+    if (ctx.bucketSpec != null) {
+      throw new ParseException("Operation not allowed: CREATE TABLE ... CLUSTERED BY ...", ctx)
+    }
+    val tableType = if (external) {
+      CatalogTableType.EXTERNAL_TABLE
     } else {
-      // Get the table header.
-      val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader)
-      val tableType = if (external) {
-        CatalogTableType.EXTERNAL_TABLE
-      } else {
-        CatalogTableType.MANAGED_TABLE
-      }
-
-      // Unsupported clauses.
-      if (temp) {
-        throw new ParseException(s"Unsupported operation: TEMPORARY clause.", ctx)
-      }
-      if (ctx.bucketSpec != null) {
-        // TODO add this - we need cluster columns in the CatalogTable for this to work.
-        throw new ParseException("Unsupported operation: " +
-          "CLUSTERED BY ... [ORDERED BY ...] INTO ... BUCKETS clause.", ctx)
-      }
-      if (ctx.skewSpec != null) {
-        throw new ParseException("Operation not allowed: " +
-          "SKEWED BY ... ON ... [STORED AS DIRECTORIES] clause.", ctx)
-      }
-
-      // Create the schema.
-      val schema = Option(ctx.columns).toSeq.flatMap(visitCatalogColumns(_, _.toLowerCase))
-
-      // Get the column by which the table is partitioned.
-      val partitionCols = Option(ctx.partitionColumns).toSeq.flatMa
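The four-way match in the new `visitCreateFileFormat` is the key behavioral change: `STORED BY` is rejected outright, and supplying both `STORED AS` and `STORED BY` is a parse error. A minimal standalone sketch of that dispatch, where plain `Option`s stand in for the nullable ANTLR contexts `ctx.fileFormat` and `ctx.storageHandler` (the `ParseError` class and `resolveStorage` function are illustrative stand-ins, not Spark API):

```scala
// Hypothetical sketch of the four-way dispatch in visitCreateFileFormat.
// Option[String] stands in for the nullable ANTLR parser contexts.
final case class ParseError(msg: String) extends RuntimeException(msg)

def resolveStorage(fileFormat: Option[String], storageHandler: Option[String]): String =
  (fileFormat, storageHandler) match {
    // STORED AS file_format, e.g. SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
    case (Some(f), None) => s"STORED AS $f"
    // STORED BY storage_handler is not supported
    case (None, Some(_)) =>
      throw ParseError("Operation not allowed: ... STORED BY storage_handler ...")
    // Both clauses present is a parse error
    case (Some(_), Some(_)) =>
      throw ParseError("expected either STORED AS or STORED BY, not both")
    // Neither clause: fall back to a default storage format
    case (None, None) => "default"
  }
```

Matching on the tuple makes the mutually-exclusive cases explicit, which is why the review favored it over the original nested `if (ctx.storageHandler == null)` check.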
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-209030454 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59421983

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
[quoted diff context identical to the hunk in andrewor14's comment above; the message is truncated before the review comment]
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59421753

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -218,14 +219,35 @@ case class CatalogTable(
     tableType: CatalogTableType,
     storage: CatalogStorageFormat,
     schema: Seq[CatalogColumn],
-    partitionColumns: Seq[CatalogColumn] = Seq.empty,
-    sortColumns: Seq[CatalogColumn] = Seq.empty,
+    partitionColumnNames: Seq[String] = Seq.empty,
+    sortColumnNames: Seq[String] = Seq.empty,
+    bucketColumnNames: Seq[String] = Seq.empty,
+    // e.g. (date, country)
+    skewColumnNames: Seq[String] = Seq.empty,
+    // e.g. ('2008-08-08', 'us'), ('2009-09-09', 'uk')
+    skewColumnValues: Seq[Seq[String]] = Seq.empty,
     numBuckets: Int = 0,
     createTime: Long = System.currentTimeMillis,
     lastAccessTime: Long = System.currentTimeMillis,
     properties: Map[String, String] = Map.empty,
     viewOriginalText: Option[String] = None,
-    viewText: Option[String] = None) {
+    viewText: Option[String] = None,
+    comment: Option[String] = None) {
+
+  // Verify that the provided columns are part of the schema
+  private val colNames = schema.map(_.name).toSet
+  private def requireSubsetOfSchema(cols: Seq[String], colType: String): Unit = {
+    require(cols.toSet.subsetOf(colNames), s"$colType columns (${cols.mkString(", ")}) " +
+      s"must be a subset of schema (${colNames.mkString(", ")}) in table '$identifier'")
+  }
+  requireSubsetOfSchema(partitionColumnNames, "partition")
--- End diff --

Thanks!
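The `requireSubsetOfSchema` guard quoted in this hunk fails table construction fast when a partition (or sort/bucket) column is not part of the schema. A simplified, self-contained sketch of the same idea — the `Column` and `Table` classes below are stand-ins for Spark's `CatalogColumn` and `CatalogTable`, not the real API:

```scala
// Stand-in types; Spark's real classes are CatalogColumn / CatalogTable.
final case class Column(name: String)

final case class Table(
    identifier: String,
    schema: Seq[Column],
    partitionColumnNames: Seq[String] = Seq.empty) {

  private val colNames = schema.map(_.name).toSet

  // Verify that the provided columns are part of the schema;
  // require throws IllegalArgumentException when the check fails.
  private def requireSubsetOfSchema(cols: Seq[String], colType: String): Unit =
    require(cols.toSet.subsetOf(colNames),
      s"$colType columns (${cols.mkString(", ")}) " +
        s"must be a subset of schema (${colNames.mkString(", ")}) in table '$identifier'")

  // Runs in the constructor body, so an invalid Table can never be created.
  requireSubsetOfSchema(partitionColumnNames, "partition")
}
```

For example, `Table("t", Seq(Column("a"), Column("b")), Seq("b"))` constructs fine, while passing `Seq("c")` throws `IllegalArgumentException` from `require`.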
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59421647

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
[quoted diff context identical to the hunk in andrewor14's comment above, through:]
+   * [SKEWED BY (col1, col2, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
--- End diff --

Remove this line while resolving the conflicts?
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59421840

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
[quoted diff context identical to the hunk in andrewor14's comment above, through:]
+    // TODO: implement temporary tables
--- End diff --

+1
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12271#discussion_r59421524

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
[quoted diff context identical to the interface.scala hunk quoted above, through:]
+    comment: Option[String] = None) {
--- End diff --

ok
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-208665982

I'll have a look at the failing tests in a couple of hours.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-208634664

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-208634671

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2/
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-208634574

**[Test build #2 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull)** for PR 12271 at commit [`2e95ecf`](https://github.com/apache/spark/commit/2e95ecf790dc5d5b12b6ec72c0bd2b4bca99b17d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14388][SQL] Implement CREATE TABLE
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12271#issuecomment-208623925

**[Test build #2 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull)** for PR 12271 at commit [`2e95ecf`](https://github.com/apache/spark/commit/2e95ecf790dc5d5b12b6ec72c0bd2b4bca99b17d).