[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14809 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77578688 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -272,17 +282,11 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. " (None, message) -case (Some(serde), Some(path)) => +case Some(serde) => val message = -s"Persisting file based data source table $qualifiedTableName with an input path " + - s"into Hive metastore in Hive compatible format." - (Some(newHiveCompatibleMetastoreTable(serde, path)), message) - -case (Some(_), None) => - val message = -s"Data source table $qualifiedTableName is not file based. Persisting it into " + - s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive." - (None, message) +s"Persisting file based data source table $qualifiedTableName into " + --- End diff -- only external table has path, and the previous code doesn't log the path either. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77560966 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -241,10 +241,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } // converts the table metadata to Hive compatible format, i.e. set the serde information. - def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { + def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = { +val location = if (tableDefinition.tableType == EXTERNAL) { + // When we hit this branch, we are saving an external data source table with hive + // compatible format, which means the data source is file-based and must have a `path`. + val map = new CaseInsensitiveMap(tableDefinition.storage.properties) + assert(map.contains("path"), --- End diff -- uh... good to know that. @clockfly Are you saying `-Xdisable-assertions` is used in the official Spark binary build? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77295778 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -241,10 +241,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } // converts the table metadata to Hive compatible format, i.e. set the serde information. - def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { + def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = { +val location = if (tableDefinition.tableType == EXTERNAL) { + // When we hit this branch, we are saving an external data source table with hive + // compatible format, which means the data source is file-based and must have a `path`. + val map = new CaseInsensitiveMap(tableDefinition.storage.properties) + assert(map.contains("path"), --- End diff -- how about require? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77294000 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -272,17 +282,11 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. " (None, message) -case (Some(serde), Some(path)) => +case Some(serde) => val message = -s"Persisting file based data source table $qualifiedTableName with an input path " + - s"into Hive metastore in Hive compatible format." - (Some(newHiveCompatibleMetastoreTable(serde, path)), message) - -case (Some(_), None) => - val message = -s"Data source table $qualifiedTableName is not file based. Persisting it into " + - s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive." - (None, message) +s"Persisting file based data source table $qualifiedTableName into " + --- End diff -- I think it is good to have the path in the log message. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77293800 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -241,10 +241,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } // converts the table metadata to Hive compatible format, i.e. set the serde information. - def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { + def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = { +val location = if (tableDefinition.tableType == EXTERNAL) { + // When we hit this branch, we are saving an external data source table with hive + // compatible format, which means the data source is file-based and must have a `path`. + val map = new CaseInsensitiveMap(tableDefinition.storage.properties) + assert(map.contains("path"), --- End diff -- assert will be removed during runtime, can we use explicit exception? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/14809 [SPARK-17238][SQL] simplify the logic for converting data source table into hive compatible format ## What changes were proposed in this pull request? Previously we have 2 conditions to decide whether a data source table is hive-compatible: 1. the data source is file-based and has a corresponding Hive serde 2. have a `path` entry in data source options/storage properties However, if condition 1 is true, condition 2 must be true too, as we will put the default table path into data source options/storage properties for managed data source tables. There is also a potential issue: we will set the `locationUri` even for managed table. This PR removes the condition 2 and only set the `locationUri` for external data source tables. Note: this is also a first step to unify the `path` of data source tables and `locationUri` of hive serde tables. For hive serde tables, `locationUri` is only set for external table. For data source tables, `path` is always set. We can make them consistent after this PR. ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark minor2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14809.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14809 commit 915d2b5a1dd8c26a37d0b99ba0503a0d95b6f3f3 Author: Wenchen FanDate: 2016-08-25T15:11:23Z simplify the logic for converting data source table into hive compatible format --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org