[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17001
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r105565416

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -907,3 +936,97 @@ object SPARK_18989_DESC_TABLE {
     }
   }
 }
+
+object SPARK_19667_CREATE_TABLE {
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
+    try {
+      val warehousePath = new Path(spark.sharedState.warehousePath)
+      val fs = warehousePath.getFileSystem(spark.sessionState.newHadoopConf())
+      val defaultDB = spark.sessionState.catalog.getDatabaseMetadata("default")
+      // default database use warehouse path as its location
--- End diff --

`default` -> `The default`
`use warehouse path` -> `uses the warehouse path`
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r105565378

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -907,3 +936,97 @@ object SPARK_18989_DESC_TABLE {
...
+object SPARK_19667_VERIFY_TABLE_PATH {
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
+    try {
+      val warehousePath = new Path(spark.sharedState.warehousePath)
+      val fs = warehousePath.getFileSystem(spark.sessionState.newHadoopConf())
+      val defaultDB = spark.sessionState.catalog.getDatabaseMetadata("default")
+      // default database use warehouse path as its location
+      assert(new Path(defaultDB.locationUri) == fs.makeQualified(warehousePath))
+
+      val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+      // the table in default database created in job(SPARK_19667_CREATE_TABLE) above,
+      // which has different warehouse path from this job, its location still equals to
+      // the location when it's created.
--- End diff --

->
```
For the table created by another job in the default database, the location of this table
is not changed, even if the current job has a different warehouse path.
```
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r105565256

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -907,3 +936,97 @@ object SPARK_18989_DESC_TABLE {
...
+object SPARK_19667_VERIFY_TABLE_PATH {
...
+      spark.sql("CREATE TABLE t3(d string)")
+      val table3 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t3"))
+      // the table in default database created here in this job, it will use the warehouse path
+      // of this job as its location
--- End diff --

->
```
When a job creates a table in the default database, the table location is under the
warehouse path that is configured for the local job.
```
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r105565132

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -907,3 +936,97 @@ object SPARK_18989_DESC_TABLE {
...
+object SPARK_19667_VERIFY_TABLE_PATH {
...
+      spark.sql("USE not_default")
+      val table1 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1"))
+      // the table in not default database create in job(SPARK_19667_CREATE_TABLE) above,
+      // which has different warehouse path from this job, its location still equals to
+      // the location when it's created.
+      assert(new Path(table1.location) != fs.makeQualified(
+        new Path(warehousePath, "not_default.db/t1")))
+      assert(!new File(warehousePath.toString, "not_default.db/t1").exists())
--- End diff --

This scenario (line 993-1000) does not need to be tested, IMO. Most of the test cases already cover it.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r105565041

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -907,3 +936,97 @@ object SPARK_18989_DESC_TABLE {
...
+object SPARK_19667_VERIFY_TABLE_PATH {
...
+      spark.sql("CREATE TABLE t2(c string)")
+      val table2 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t2"))
+      // the table in not default database created here in this job, it will use the location
+      // of the database as its location, not the warehouse path in this job
--- End diff --

->
```
the table created in the non-default database (created in another job) is under the database location.
```
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r105564983

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -907,3 +936,97 @@ object SPARK_18989_DESC_TABLE {
...
+object SPARK_19667_VERIFY_TABLE_PATH {
...
+      spark.sql("CREATE DATABASE not_default_1")
--- End diff --

-> `non_default_db1`
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r105564912

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -907,3 +936,97 @@ object SPARK_18989_DESC_TABLE {
...
+object SPARK_19667_VERIFY_TABLE_PATH {
...
+      spark.sql("CREATE DATABASE not_default_1")
+      spark.sql("USE not_default_1")
+      spark.sql("CREATE TABLE t4(e string)")
+      val table4 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t4"))
+      // the table created in the database which created in this job, it will use the location
+      // of the database.
--- End diff --

->
```
the table created in the non-default database is under the database location.
```
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r104111279

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -74,7 +88,17 @@ abstract class ExternalCatalog {
    */
   def alterDatabase(dbDefinition: CatalogDatabase): Unit
 
-  def getDatabase(db: String): CatalogDatabase
+  final def getDatabase(db: String): CatalogDatabase = {
+    val database = getDatabaseInternal(db)
--- End diff --

put this in the else branch
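Realizing both suggestions (the `final` wrapper and the else-branch placement), the resolved shape might look like the sketch below. This is a sketch only, not the merged code: `getDatabaseInternal` is the subclass hook named in this hunk, and `defaultDB` is the lazily built instance proposed in the comments further down.

```
// Sketch: getDatabaseInternal(db) moved into the else branch, so resolving
// the default database never hits the metastore.
final def getDatabase(db: String): CatalogDatabase = {
  if (db == SessionCatalog.DEFAULT_DATABASE) {
    // Virtual default database, built once from the warehouse path.
    defaultDB
  } else {
    getDatabaseInternal(db)
  }
}
```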
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r104101806

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -905,3 +934,91 @@ object SPARK_18989_DESC_TABLE {
     }
   }
 }
+
+object SPARK_19667_CREATE_TABLE {
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
+    try {
+      val warehousePath = s"file:${spark.sharedState.warehousePath.stripSuffix("/")}"
+      val defaultDB = spark.sessionState.catalog.getDatabaseMetadata("default")
+      // default database use warehouse path as its location
+      assert(defaultDB.locationUri.stripSuffix("/") == warehousePath)
...
+object SPARK_19667_VERIFY_TABLE_PATH {
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
+    try {
+      val warehousePath = s"file:${spark.sharedState.warehousePath.stripSuffix("/")}"
--- End diff --

I am making this change now.
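For reference, the modification being discussed replaces this `stripSuffix`-based string comparison with qualified `Path` comparison, as in the final test code quoted earlier in this thread. A minimal standalone sketch of why `fs.makeQualified` is the more robust comparison (the paths are illustrative only):

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object QualifiedPathDemo {
  def main(args: Array[String]): Unit = {
    val raw = new Path("/tmp/spark-warehouse")
    val fs = raw.getFileSystem(new Configuration())
    // makeQualified fills in the scheme (and authority) from the file
    // system, so differently spelled forms of the same location compare
    // equal as Paths, without any stripSuffix("/") string tricks.
    val qualified = fs.makeQualified(raw)
    assert(qualified == fs.makeQualified(new Path("file:/tmp/spark-warehouse")))
    assert(qualified == fs.makeQualified(new Path("/tmp/spark-warehouse/")))
  }
}
```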
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103891244

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

I have modified the code by adding

```
lazy val defaultDB = {
  val qualifiedWarehousePath = SessionCatalog
    .makeQualifiedPath(warehousePath, hadoopConf).toString
  CatalogDatabase("default", "", qualifiedWarehousePath, Map.empty)
}
```

in `ExternalCatalog`. If it is not OK, I will revert it, thanks~
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103889802

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

@cloud-fan I found that if we add a `defaultDB` parameter to `ExternalCatalog` and its subclasses `InMemoryCatalog` and `HiveExternalCatalog`, this change will require a lot of related code to be modified, such as test cases and other logic that creates `InMemoryCatalog` and `HiveExternalCatalog`.

For example, currently all the parameters of `InMemoryCatalog` have their own default values (`class InMemoryCatalog(conf: SparkConf = new SparkConf, hadoopConfig: Configuration = new Configuration)`), so we can create it without any parameters. But if we add a `defaultDB`, we would have to construct a default database in the parameter list, and we cannot create a legal `defaultDB` there because we cannot get the warehouse path for it:

```
class InMemoryCatalog(
    conf: SparkConf = new SparkConf,
    hadoopConfig: Configuration = new Configuration,
    defaultDB: CatalogDatabase =
      CatalogDatabase("default", "", "${can not get the warehouse path}", Map.empty))
```

If we don't provide a default value for `defaultDB` in the parameter list, that will cause even more code changes, which I think is not proper.

What about keeping the `protected def warehousePath` in `ExternalCatalog`, and adding

```
lazy val defaultDB = {
  val qualifiedWarehousePath = SessionCatalog
    .makeQualifiedPath(warehousePath, hadoopConf).toString
  CatalogDatabase("default", "", qualifiedWarehousePath, Map.empty)
}
```

This can also avoid calling getDatabase.
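Putting these pieces together, the proposed shape of `ExternalCatalog` would be roughly the sketch below. Treat it as an approximation: the names follow the snippets quoted above, `getDatabaseInternal` is the hook discussed earlier, and the `CatalogDatabase` fields are as quoted in this thread.

```
// Sketch of the proposed shape, not the merged code.
abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
  // Each subclass (InMemoryCatalog, HiveExternalCatalog) already knows
  // its warehouse location, so it implements this instead of receiving
  // a pre-built defaultDB through the constructor.
  protected def warehousePath: String

  // Built once, lazily, and never read from the metastore.
  lazy val defaultDB: CatalogDatabase = {
    val qualifiedWarehousePath =
      SessionCatalog.makeQualifiedPath(warehousePath, hadoopConf).toString
    CatalogDatabase("default", "", qualifiedWarehousePath, Map.empty)
  }

  // Subclass hook used by the final getDatabase wrapper sketched earlier.
  protected def getDatabaseInternal(db: String): CatalogDatabase
}
```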
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103865522

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

ok~ let me fix it~
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103865312

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

but it will only be used in `getDatabase`, and we can save a metastore call to get the default database.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103863975

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

if we pass a defaultDB, it seems like we introduce an instance of defaultDB, as we discussed above
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103863070

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

we still have conf/hadoopConf in `InMemoryCatalog` and `HiveExternalCatalog`, we can just add one more parameter.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103862862

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

I think conf/hadoopConf is more useful; later logic can use it, and the subclasses also have these two confs.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103861521

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

how about we just pass in a `defaultDB: CatalogDatabase`? Then we don't need to add the `protected def warehousePath: String`
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103861355

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -74,7 +77,19 @@ abstract class ExternalCatalog {
    */
   def alterDatabase(dbDefinition: CatalogDatabase): Unit
 
-  def getDatabase(db: String): CatalogDatabase
+  def getDatabase(db: String): CatalogDatabase = {
+    val database = getDatabaseInternal(db)
+    // The default database's location always uses the warehouse path.
+    // Since the location of database stored in metastore is qualified,
+    // we also make the warehouse location qualified.
+    if (db == SessionCatalog.DEFAULT_DATABASE) {
--- End diff --

makes sense
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103605667

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -74,7 +77,19 @@ abstract class ExternalCatalog {
    */
   def alterDatabase(dbDefinition: CatalogDatabase): Unit
 
-  def getDatabase(db: String): CatalogDatabase
+  def getDatabase(db: String): CatalogDatabase = {
+    val database = getDatabaseInternal(db)
+    // The default database's location always uses the warehouse path.
+    // Since the location of database stored in metastore is qualified,
+    // we also make the warehouse location qualified.
+    if (db == SessionCatalog.DEFAULT_DATABASE) {
--- End diff --

I think if we create a default instance of `CatalogDatabase`, we will have to add more logic to other interfaces. For example, in `databaseExists` we would need logic that always returns true for the default database, even if there is no default database in Hive.
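The extra branch windpiger describes lends itself to one concrete line. A sketch, with `databaseExistsInternal` as a hypothetical subclass hook (an assumed name, not real API):

```
// Hypothetical sketch: if the default database were purely virtual
// (never stored in the metastore), existence checks would need a
// hard-coded branch like this.
final def databaseExists(db: String): Boolean = {
  // The default database always exists, even when the underlying Hive
  // metastore has no record of it.
  db == SessionCatalog.DEFAULT_DATABASE || databaseExistsInternal(db)
}
```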
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103420821

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -74,7 +77,19 @@ abstract class ExternalCatalog {
    */
   def alterDatabase(dbDefinition: CatalogDatabase): Unit
 
-  def getDatabase(db: String): CatalogDatabase
+  def getDatabase(db: String): CatalogDatabase = {
+    val database = getDatabaseInternal(db)
+    // The default database's location always uses the warehouse path.
+    // Since the location of database stored in metastore is qualified,
+    // we also make the warehouse location qualified.
+    if (db == SessionCatalog.DEFAULT_DATABASE) {
--- End diff --

can we totally make the default database a virtual concept? i.e. we never create it or ask the metastore to retrieve it; we just keep an instance of `CatalogDatabase` for the default database and return it in `ExternalCatalog.getDatabase`
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103419380

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -74,7 +77,19 @@ abstract class ExternalCatalog {
    */
   def alterDatabase(dbDefinition: CatalogDatabase): Unit
 
-  def getDatabase(db: String): CatalogDatabase
+  def getDatabase(db: String): CatalogDatabase = {
--- End diff --

final def
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103067748

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -408,7 +408,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def defaultTablePath(tableIdent: TableIdentifier): String = {
-    val dbLocation = getDatabase(tableIdent.database.get).locationUri
--- End diff --

I'd prefer the first one. `defaultTablePath` in `SessionCatalog` is already used in `CreateDataSourceAsSelectCommand`, so I think it is useful to keep it in `SessionCatalog`; we may want to use it in other places later.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103046202

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -408,7 +408,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def defaultTablePath(tableIdent: TableIdentifier): String = {
-    val dbLocation = getDatabase(tableIdent.database.get).locationUri
--- End diff --

or let's just move this `defaultTablePath` function to `ExternalCatalog`, and use it in `InMemoryCatalog` and `HiveExternalCatalog`
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103045973

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -408,7 +408,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def defaultTablePath(tableIdent: TableIdentifier): String = {
-    val dbLocation = getDatabase(tableIdent.database.get).locationUri
--- End diff --

how about we put this logic in `ExternalCatalog`? e.g. `ExternalCatalog.getDatabase` will check the default database and then call `getDatabaseInternal`, which needs to be implemented by subclasses.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102873675

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def defaultTablePath(tableIdent: TableIdentifier): String = {
-    val dbLocation = getDatabase(tableIdent.database.get).locationUri
+    // default database's location always use the warehouse path,
+    // and since the location of database stored in metastore is qualified,
+    // here we also make qualify for warehouse location
+    val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) {
+      SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString
+    } else getDatabase(tableIdent.database.get).locationUri
--- End diff --

Please forgive me~ It seems I now understand: your description was to confirm what issue this change is going to resolve. Yes, it is to resolve the issue you described~
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102871210

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def defaultTablePath(tableIdent: TableIdentifier): String = {
-    val dbLocation = getDatabase(tableIdent.database.get).locationUri
+    // default database's location always use the warehouse path,
+    // and since the location of database stored in metastore is qualified,
+    // here we also make qualify for warehouse location
+    val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) {
+      SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString
+    } else getDatabase(tableIdent.database.get).locationUri
--- End diff --

o(╯□╰)o
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102870703

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def defaultTablePath(tableIdent: TableIdentifier): String = {
-    val dbLocation = getDatabase(tableIdent.database.get).locationUri
+    // default database's location always use the warehouse path,
+    // and since the location of database stored in metastore is qualified,
+    // here we also make qualify for warehouse location
+    val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) {
+      SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString
+    } else getDatabase(tableIdent.database.get).locationUri
--- End diff --

The issue was already explained above. : )
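The hunk quoted above ends before the method's return value. A plausible completion, under the assumption that the default table path is simply the resolved database location plus the table name (the convention the SPARK_19667 tests at the top of this thread assert):

```
private def defaultTablePath(tableIdent: TableIdentifier): String = {
  // Resolve the database location, special-casing the default database
  // to the qualified warehouse path (as in the hunk above).
  val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) {
    SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString
  } else {
    getDatabase(tableIdent.database.get).locationUri
  }
  // Assumed return shape: <database location>/<table name>.
  new Path(new Path(dbLocation), tableIdent.table).toString
}
```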
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102870643

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -193,7 +193,16 @@ class SessionCatalog(
   def getDatabaseMetadata(db: String): CatalogDatabase = {
     val dbName = formatDatabaseName(db)
     requireDbExists(dbName)
-    externalCatalog.getDatabase(dbName)
+    val database = externalCatalog.getDatabase(dbName)
+
+    // default database's location always use the warehouse path,
+    // and since the location of database stored in metastore is qualified,
+    // here we also make qualify for warehouse location
--- End diff --

ok, it is better, thanks~
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102870494

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -193,7 +193,16 @@ class SessionCatalog(
   def getDatabaseMetadata(db: String): CatalogDatabase = {
     val dbName = formatDatabaseName(db)
     requireDbExists(dbName)
-    externalCatalog.getDatabase(dbName)
+    val database = externalCatalog.getDatabase(dbName)
+
+    // default database's location always use the warehouse path,
+    // and since the location of database stored in metastore is qualified,
+    // here we also make qualify for warehouse location
+    val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) {
+      SessionCatalog.makeQualifiedPath(conf.warehousePath, hadoopConf).toString
+    } else database.locationUri
--- End diff --

Yes. I knew it. We should create another function and rename the existing one.
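This hunk also stops just before the return. A plausible completion; the `database.copy(locationUri = ...)` call is an assumption, not the confirmed merged code:

```
def getDatabaseMetadata(db: String): CatalogDatabase = {
  val dbName = formatDatabaseName(db)
  requireDbExists(dbName)
  val database = externalCatalog.getDatabase(dbName)
  // Special-case the default database to the qualified warehouse path.
  val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) {
    SessionCatalog.makeQualifiedPath(conf.warehousePath, hadoopConf).toString
  } else {
    database.locationUri
  }
  // Assumed completion: return the metadata with the resolved location,
  // leaving all other fields untouched.
  database.copy(locationUri = dbLocation)
}
```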
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102870072

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def defaultTablePath(tableIdent: TableIdentifier): String = {
-    val dbLocation = getDatabase(tableIdent.database.get).locationUri
+    // default database's location always use the warehouse path,
+    // and since the location of database stored in metastore is qualified,
+    // here we also make qualify for warehouse location
+    val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) {
+      SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString
+    } else getDatabase(tableIdent.database.get).locationUri
--- End diff --

@gatorsmile could you describe more clearly what the issue is?
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102866593 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } private def defaultTablePath(tableIdent: TableIdentifier): String = { -val dbLocation = getDatabase(tableIdent.database.get).locationUri +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString +} else getDatabase(tableIdent.database.get).locationUri --- End diff -- Yes, it is duplicated. I tried to merge them by unifying the `defaultTablePath` logic for managed tables in [HiveExternalCatalog](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L205-L216) and [InMemoryCatalog](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala#L194-L202) up into [SessionCatalog](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L253); then we can drop the `defaultTablePath` in `HiveExternalCatalog`. But this causes some test failures in `InMemoryCatalogSuite`, so we would have to modify those tests. So I think we can keep the existing duplicated `defaultTablePath` function for now, and merge them and fix the failing tests in another PR.
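A hedged sketch of the hoisting described above, with simplified, hypothetical types (the real SessionCatalog, HiveExternalCatalog, and InMemoryCatalog have much larger interfaces):

```scala
import org.apache.hadoop.fs.Path

// Both catalog implementations currently derive a managed table's
// default path from its database's location; hoisting that derivation
// into one shared place removes the duplication.
case class TableIdentifier(table: String, database: Option[String])

class SharedDefaultTablePath(getDbLocationUri: String => String) {
  def defaultTablePath(ident: TableIdentifier): String = {
    val db = ident.database.getOrElse("default")
    // table path = <database location>/<table name>
    new Path(new Path(getDbLocationUri(db)), ident.table).toString
  }
}

// Usage: a Hive-backed and an in-memory catalog would each pass their
// own database-location lookup and share the same path arithmetic.
```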
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102865566 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -193,7 +193,16 @@ class SessionCatalog( def getDatabaseMetadata(db: String): CatalogDatabase = { val dbName = formatDatabaseName(db) requireDbExists(dbName) -externalCatalog.getDatabase(dbName) +val database = externalCatalog.getDatabase(dbName) + +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.warehousePath, hadoopConf).toString +} else database.locationUri --- End diff -- `getDefaultDBPath` is used when we create a database without providing a location. It is different from the logic here, which gets the default table path in an existing database.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102784103 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } private def defaultTablePath(tableIdent: TableIdentifier): String = { -val dbLocation = getDatabase(tableIdent.database.get).locationUri +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString +} else getDatabase(tableIdent.database.get).locationUri --- End diff -- Or call `SessionCatalog.getDatabaseMetadata`?
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102783506 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } private def defaultTablePath(tableIdent: TableIdentifier): String = { -val dbLocation = getDatabase(tableIdent.database.get).locationUri +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString +} else getDatabase(tableIdent.database.get).locationUri --- End diff -- Actually, this change fixes an existing issue, if my understanding is correct. If the default database has already been created in the metastore (which could be shared), any subsequent change of `spark.sql.warehouse.dir` can trigger an issue when we create a managed table in the default database (here, we assume Hive support is enabled). [The table path will be based on this function](https://github.com/windpiger/spark/blob/8f8063f8c4e11cb4f6158c90d21d8a0985d38584/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L209-L216).
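To make that scenario concrete, here is a minimal sketch of the stale-location failure mode under stated assumptions: the paths and object names are hypothetical, and `storedDefaultDbLocation` stands for whatever the shared metastore recorded when the default database was first created.

```scala
import org.apache.hadoop.fs.Path

object StaleWarehouseSketch {
  // Job A created the default database with this warehouse path.
  val storedDefaultDbLocation = "hdfs://nn1/old/warehouse"
  // Job B runs later with a different spark.sql.warehouse.dir.
  val currentWarehousePath = "hdfs://nn1/new/warehouse"

  // Old behavior: trust the stored database location, so a managed
  // table created by job B lands under job A's warehouse.
  def oldDefaultTablePath(table: String): Path =
    new Path(storedDefaultDbLocation, table)

  // Behavior after this change: the default database always follows
  // the current warehouse path, whatever the metastore recorded.
  def newDefaultTablePath(table: String): Path =
    new Path(currentWarehousePath, table)

  def main(args: Array[String]): Unit = {
    println(oldDefaultTablePath("t")) // hdfs://nn1/old/warehouse/t
    println(newDefaultTablePath("t")) // hdfs://nn1/new/warehouse/t
  }
}
```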
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102781147 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } private def defaultTablePath(tableIdent: TableIdentifier): String = { -val dbLocation = getDatabase(tableIdent.database.get).locationUri +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString +} else getDatabase(tableIdent.database.get).locationUri --- End diff -- Why do we need to do this here? When we want to get the database metadata, we should call `SessionCatalog.getDatabase`.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102778675 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -408,7 +408,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } private def defaultTablePath(tableIdent: TableIdentifier): String = { -val dbLocation = getDatabase(tableIdent.database.get).locationUri +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (tableIdent.database.orNull == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.get(WAREHOUSE_PATH), hadoopConf).toString +} else getDatabase(tableIdent.database.get).locationUri --- End diff -- This is duplicated code. Both places should call the same function.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102778417 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -193,7 +193,16 @@ class SessionCatalog( def getDatabaseMetadata(db: String): CatalogDatabase = { val dbName = formatDatabaseName(db) requireDbExists(dbName) -externalCatalog.getDatabase(dbName) +val database = externalCatalog.getDatabase(dbName) + +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.warehousePath, hadoopConf).toString +} else database.locationUri --- End diff -- Do we also need to qualify the default location for the other databases in `getDefaultDBPath`?
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102778093 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -193,7 +193,16 @@ class SessionCatalog( def getDatabaseMetadata(db: String): CatalogDatabase = { val dbName = formatDatabaseName(db) requireDbExists(dbName) -externalCatalog.getDatabase(dbName) +val database = externalCatalog.getDatabase(dbName) + +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.warehousePath, hadoopConf).toString +} else database.locationUri --- End diff -- Actually, maybe we can write a separate function for getting the location of the default database. Just put it below the function [`getDefaultDBPath`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L227-L234). We might need to rename `getDefaultDBPath`; the name looks confusing now.
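A hedged sketch of the separate helper being suggested; all names here are hypothetical illustrations, not actual PR code:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

class DbLocationHelpers(warehousePath: String, hadoopConf: Configuration) {

  private def makeQualifiedPath(path: String): Path = {
    val p = new Path(path)
    p.getFileSystem(hadoopConf).makeQualified(p)
  }

  // Answers "where would a newly created database go if the user gave
  // no LOCATION" -- the role the existing getDefaultDBPath plays, so a
  // rename along these lines might read less ambiguously.
  def defaultPathForNewDatabase(db: String): String =
    new Path(new Path(warehousePath), s"$db.db").toString

  // Answers "where does the default database live": always the
  // qualified warehouse path, never what the metastore recorded.
  def defaultDatabaseLocation: String =
    makeQualifiedPath(warehousePath).toString
}
```

Splitting the two questions into two functions keeps each caller honest about which one it actually means.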
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102777281 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -193,7 +193,16 @@ class SessionCatalog( def getDatabaseMetadata(db: String): CatalogDatabase = { val dbName = formatDatabaseName(db) requireDbExists(dbName) -externalCatalog.getDatabase(dbName) +val database = externalCatalog.getDatabase(dbName) + +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location --- End diff -- How about > // The default database's location always uses the warehouse path. // Since the location of database stored in metastore is qualified, // we also make the warehouse location qualified.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102776886 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -193,7 +193,16 @@ class SessionCatalog( def getDatabaseMetadata(db: String): CatalogDatabase = { val dbName = formatDatabaseName(db) requireDbExists(dbName) -externalCatalog.getDatabase(dbName) +val database = externalCatalog.getDatabase(dbName) + +// default database's location always use the warehouse path, +// and since the location of database stored in metastore is qualified, +// here we also make qualify for warehouse location +val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { + SessionCatalog.makeQualifiedPath(conf.warehousePath, hadoopConf).toString +} else database.locationUri --- End diff -- Please fix the style.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102650536 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -339,10 +340,17 @@ private[hive] class HiveClientImpl( override def getDatabase(dbName: String): CatalogDatabase = withHiveState { Option(client.getDatabase(dbName)).map { d => + // default database's location always use the warehouse path, + // and since the location of database stored in metastore is qualified, + // here we also make qualify for warehouse location + val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { +SessionCatalog.makeQualifiedPath(sparkConf.get(WAREHOUSE_PATH), hadoopConf).toString --- End diff -- Agreed!
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102648988 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -339,10 +340,17 @@ private[hive] class HiveClientImpl( override def getDatabase(dbName: String): CatalogDatabase = withHiveState { Option(client.getDatabase(dbName)).map { d => + // default database's location always use the warehouse path, + // and since the location of database stored in metastore is qualified, + // here we also make qualify for warehouse location + val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { +SessionCatalog.makeQualifiedPath(sparkConf.get(WAREHOUSE_PATH), hadoopConf).toString --- End diff -- What I want is consistency. Now we have decided to define the location of the default database as the warehouse path, and we should stick with it. The main goal of this PR is not to fix the bug when sharing a metastore database, but to change the definition of the default database's location.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102642601 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -339,10 +340,17 @@ private[hive] class HiveClientImpl( override def getDatabase(dbName: String): CatalogDatabase = withHiveState { Option(client.getDatabase(dbName)).map { d => + // default database's location always use the warehouse path, + // and since the location of database stored in metastore is qualified, + // here we also make qualify for warehouse location + val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { +SessionCatalog.makeQualifiedPath(sparkConf.get(WAREHOUSE_PATH), hadoopConf).toString --- End diff -- This won't work for `InMemoryCatalog`, will it? You should either implement this logic in all `ExternalCatalog`s, or put it in `SessionCatalog`.
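A minimal sketch of the `SessionCatalog`-level placement, with simplified, hypothetical types: every `ExternalCatalog` implementation, Hive-backed or in-memory, then sees the same default-database location without repeating the logic.

```scala
case class CatalogDatabase(name: String, description: String, locationUri: String)

trait ExternalCatalog {
  def getDatabase(db: String): CatalogDatabase
}

class SessionCatalogSketch(
    externalCatalog: ExternalCatalog,
    qualifiedWarehousePath: String) {

  val DEFAULT_DATABASE = "default"

  def getDatabaseMetadata(db: String): CatalogDatabase = {
    val database = externalCatalog.getDatabase(db)
    // Patch the location once here, instead of inside each catalog
    // implementation, so HiveExternalCatalog and InMemoryCatalog agree.
    if (db == DEFAULT_DATABASE) {
      database.copy(locationUri = qualifiedWarehousePath)
    } else {
      database
    }
  }
}
```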
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102625694 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -339,10 +340,15 @@ private[hive] class HiveClientImpl( override def getDatabase(dbName: String): CatalogDatabase = withHiveState { Option(client.getDatabase(dbName)).map { d => + // default database's location always use the warehouse path + val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { --- End diff -- Sorry, I don't get the point. If this logic is exactly what we expect, wouldn't it be better to put it there from the beginning?
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102622702 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala --- @@ -92,7 +92,8 @@ abstract class BucketedWriteSuite extends QueryTest with SQLTestUtils { def tableDir: File = { val identifier = spark.sessionState.sqlParser.parseTableIdentifier("bucketed_table") -new File(URI.create(spark.sessionState.catalog.defaultTablePath(identifier))) +new File(URI.create(s"file:${spark.sessionState.catalog.defaultTablePath(identifier) --- End diff -- ok, let me try~
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102619806 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala --- @@ -92,7 +92,8 @@ abstract class BucketedWriteSuite extends QueryTest with SQLTestUtils { def tableDir: File = { val identifier = spark.sessionState.sqlParser.parseTableIdentifier("bucketed_table") -new File(URI.create(spark.sessionState.catalog.defaultTablePath(identifier))) +new File(URI.create(s"file:${spark.sessionState.catalog.defaultTablePath(identifier) --- End diff -- Shall we fix `SessionCatalog.defaultTablePath` to make it return a URI, maybe?
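A sketch of that suggestion under stated assumptions (the signature is hypothetical; the real method takes a `TableIdentifier` and more context): returning a `java.net.URI` lets the test above write `new File(uri)` with no string surgery such as stripping a `file:` prefix.

```scala
import java.io.File
import java.net.URI
import org.apache.hadoop.fs.Path

object DefaultTablePathUriSketch {
  // Hypothetical URI-returning variant of defaultTablePath.
  def defaultTablePathAsUri(dbLocationUri: String, table: String): URI =
    new Path(new Path(dbLocationUri), table).toUri

  def main(args: Array[String]): Unit = {
    // Assuming a local, file-based warehouse for the test.
    val uri = defaultTablePathAsUri("file:/tmp/warehouse", "bucketed_table")
    val dir = new File(uri) // works because the URI carries its scheme
    println(dir)
  }
}
```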
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102619596 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -339,10 +340,15 @@ private[hive] class HiveClientImpl( override def getDatabase(dbName: String): CatalogDatabase = withHiveState { Option(client.getDatabase(dbName)).map { d => + // default database's location always use the warehouse path + val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) { --- End diff -- is it more logical to put this logic in `SessionCatalog`?
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102619396 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala --- @@ -92,7 +92,8 @@ abstract class BucketedWriteSuite extends QueryTest with SQLTestUtils { def tableDir: File = { val identifier = spark.sessionState.sqlParser.parseTableIdentifier("bucketed_table") -new File(URI.create(spark.sessionState.catalog.defaultTablePath(identifier))) +new File(URI.create(s"file:${spark.sessionState.catalog.defaultTablePath(identifier) + .stripPrefix("file:")}")) --- End diff -- ?
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102392451 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -311,11 +313,12 @@ private[hive] class HiveClientImpl( override def createDatabase( database: CatalogDatabase, ignoreIfExists: Boolean): Unit = withHiveState { +// default database's location always use the warehouse path, here set to emtpy string client.createDatabase( new HiveDatabase( database.name, database.description, -database.locationUri, +if (database.name == SessionCatalog.DEFAULT_DATABASE) "" else database.locationUri, --- End diff -- Sorry, actually it will throw an exception; my local default database had already been created, so it did not hit the exception. I will just replace the default database location when reloading it from the metastore, and drop the logic that sets the location to an empty string when creating the database.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r102387112 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -311,11 +313,12 @@ private[hive] class HiveClientImpl( override def createDatabase( database: CatalogDatabase, ignoreIfExists: Boolean): Unit = withHiveState { +// default database's location always use the warehouse path, here set to emtpy string client.createDatabase( new HiveDatabase( database.name, database.description, -database.locationUri, +if (database.name == SessionCatalog.DEFAULT_DATABASE) "" else database.locationUri, --- End diff -- If it is empty, the metastore will set it for us, right?