[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1028966247

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ##
@@ -105,6 +106,15 @@ private[hive] class HiveClientImpl(
   private class RawHiveTableImpl(override val rawTable: HiveTable) extends RawHiveTable {
     override lazy val toCatalogTable = convertHiveTableToCatalogTable(rawTable)
+
+    override def hiveTableProps(containsStats: Boolean): Map[String, String] = {

Review Comment:
why do we need this parameter?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1028755248

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala: ##
@@ -113,6 +113,9 @@ private[hive] trait HiveClient {
   /** Creates a table with the given metadata. */
   def createTable(table: CatalogTable, ignoreIfExists: Boolean): Unit
+  /** Get hive table properties. */
+  def hiveTableProps(rawHiveTable: RawHiveTable, containsStats: Boolean): Map[String, String]

Review Comment:
shall we add a method in `RawHiveTable` to do it?
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1027963889

## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ##
@@ -721,19 +721,16 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       table: String,
       stats: Option[CatalogStatistics]): Unit = withClient {
     requireTableExists(db, table)
-    val rawTable = getRawTable(db, table)
-
-    // convert table statistics to properties so that we can persist them through hive client
-    val statsProperties =
+    val rawHiveTable = client.getRawHiveTable(db, table)
+    val oldProps = client.hiveTableProps(rawHiveTable)

Review Comment:
can you explain the rationale?
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1027656667

## sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala: ##
@@ -894,12 +895,14 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
       sql(insertString.toLowerCase(Locale.ROOT))
       sql(insertString.toUpperCase(Locale.ROOT))
+      spark.sessionState.catalog.alterTableStats(TableIdentifier("test1"), None)

Review Comment:
Does it test anything? It just invokes `alterTableStats` but does no verification.
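To make the concern above concrete, here is a minimal sketch of a call followed by an explicit check. It does not use Spark at all: `VerifyStatsSketch` and `InMemoryCatalog` are hypothetical stand-ins invented for illustration, and the single `BigInt` stat is an assumption, not the shape of `CatalogStatistics`.

```scala
object VerifyStatsSketch {
  // Hypothetical in-memory stand-in for a session catalog; not Spark's API.
  final class InMemoryCatalog {
    private var sizeInBytes: Option[BigInt] = Some(BigInt(1024))

    // Replaces the stored stats; passing None clears them.
    def alterTableStats(newStats: Option[BigInt]): Unit = {
      sizeInBytes = newStats
    }

    def tableStats: Option[BigInt] = sizeInBytes
  }

  def main(args: Array[String]): Unit = {
    val catalog = new InMemoryCatalog
    // Precondition: the table starts out with stats.
    assert(catalog.tableStats.isDefined)

    catalog.alterTableStats(None)

    // Verification step: without an assertion like this one, the call
    // above runs the code path but checks nothing about its effect.
    assert(catalog.tableStats.isEmpty)
  }
}
```

In an actual suite the second assertion would presumably read the table metadata back from the catalog; which accessor to use there is left open.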
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1023592944

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ##
@@ -609,6 +609,20 @@ private[hive] class HiveClientImpl(
     shim.alterTable(client, qualifiedTableName, hiveTable)
   }
+  override def alterTableStats(
+      dbName: String,
+      tableName: String,
+      stats: Map[String, String]): Unit = withHiveState {
+    val hiveTable = getRawHiveTable(dbName, tableName).rawTable.asInstanceOf[HiveTable]
+    val newParameters = new JHashMap[String, String]()
+    hiveTable.getParameters.asScala.toMap.filterNot(_._1.startsWith(STATISTICS_PREFIX))

Review Comment:
It's a bit tricky to make `HiveClient` handle this `STATISTICS_PREFIX`. It should be the responsibility of `HiveExternalCatalog`. `HiveClient` should only take care of the communication with HMS.
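The split of responsibilities described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `StatsPropsSketch` and `FakeHiveClient` are hypothetical names, the metastore is a plain in-memory map, and the prefix value is assumed for the example.

```scala
object StatsPropsSketch {
  // Assumed value for illustration; the real constant belongs to HiveExternalCatalog.
  val STATISTICS_PREFIX = "spark.sql.statistics."

  // "Client" layer (hypothetical): only reads and writes properties,
  // with no knowledge of which keys are statistics.
  final class FakeHiveClient {
    private var props = Map.empty[String, String]
    def getProps: Map[String, String] = props
    def setProps(newProps: Map[String, String]): Unit = {
      props = newProps
    }
  }

  // "Catalog" layer: owns the STATISTICS_PREFIX convention. It drops the old
  // stats keys, adds the new ones, and hands the final map to the client.
  def alterTableStats(client: FakeHiveClient, stats: Option[Map[String, String]]): Unit = {
    val nonStatsProps = client.getProps.filterNot { case (key, _) =>
      key.startsWith(STATISTICS_PREFIX)
    }
    client.setProps(nonStatsProps ++ stats.getOrElse(Map.empty[String, String]))
  }
}
```

With this shape the client never needs to import the prefix, so a change to how stats are encoded stays inside the catalog layer.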
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1022525354

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ##
@@ -609,6 +609,17 @@ private[hive] class HiveClientImpl(
     shim.alterTable(client, qualifiedTableName, hiveTable)
   }
+  override def alterTableProps(
+      dbName: String,
+      tableName: String,
+      newProps: Map[String, String]): Unit = withHiveState {
+    val hiveTable = getRawHiveTable(dbName, tableName).rawTable.asInstanceOf[HiveTable]

Review Comment:
This method should take `RawHiveTable`, so that we don't need to look up the table here.
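A small sketch of why taking the already-fetched table matters: the by-name variant pays for a second lookup, the handle-based variant does not. Everything here is hypothetical (`RawTableSketch`, `RawTable`, `FakeMetastoreClient`, and the `lookups` counter standing in for metastore round trips); it only models the call pattern, not Hive.

```scala
object RawTableSketch {
  // Hypothetical stand-in for a raw metastore table handle.
  final case class RawTable(db: String, name: String, props: Map[String, String])

  final class FakeMetastoreClient(initial: RawTable) {
    private var stored = initial
    // Counts simulated read round trips to the metastore.
    var lookups = 0

    def getRawTable(db: String, name: String): RawTable = {
      lookups += 1
      stored
    }

    // Variant A: takes names, so it must look the table up again.
    def alterTablePropsByName(db: String, name: String, newProps: Map[String, String]): Unit = {
      val table = getRawTable(db, name)
      stored = table.copy(props = newProps)
    }

    // Variant B: reuses the handle the caller already fetched; no extra lookup.
    def alterTableProps(raw: RawTable, newProps: Map[String, String]): Unit = {
      stored = raw.copy(props = newProps)
    }
  }
}
```

A caller that has already fetched the table to compute the new properties would use variant B and keep the lookup count at one.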
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021070978

## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ##
@@ -722,18 +722,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       stats: Option[CatalogStatistics]): Unit = withClient {
     requireTableExists(db, table)
     val rawTable = getRawTable(db, table)

Review Comment:
we can call `client.getRawHiveTable`, see https://github.com/apache/spark/commit/0942ea9f352f5fdf413dad750a672d15c7257776
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021070839

## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ##
@@ -722,18 +722,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       stats: Option[CatalogStatistics]): Unit = withClient {
     requireTableExists(db, table)
     val rawTable = getRawTable(db, table)
-
-    // convert table statistics to properties so that we can persist them through hive client
+    val oldTableNonStatsProps = rawTable.properties.filterNot(_._1.startsWith(STATISTICS_PREFIX))
     val statsProperties =
       if (stats.isDefined) {
         statsToProperties(stats.get)
       } else {
         new mutable.HashMap[String, String]()
       }
-    val oldTableNonStatsProps = rawTable.properties.filterNot(_._1.startsWith(STATISTICS_PREFIX))
-    val updatedTable = rawTable.copy(properties = oldTableNonStatsProps ++ statsProperties)
-    client.alterTable(updatedTable)
+    client.alterTableStats(db, table, parameters = oldTableNonStatsProps ++ statsProperties)

Review Comment:
We can pass `rawTable` to the hive client to avoid the extra RPC call. See
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021070153

## sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala: ##
@@ -894,12 +895,14 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
       sql(insertString.toLowerCase(Locale.ROOT))
       sql(insertString.toUpperCase(Locale.ROOT))
+      spark.sessionState.catalog.alterTableStats(TableIdentifier("test1"), None)

Review Comment:
what are we doing here?
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021069901

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ##
@@ -609,6 +609,19 @@ private[hive] class HiveClientImpl(
     shim.alterTable(client, qualifiedTableName, hiveTable)
   }
+  override def alterTableStats(
+      dbName: String,
+      tableName: String,
+      parameters: Map[String, String]): Unit = withHiveState {
+    val hiveTable =
+      getRawTableOption(dbName, tableName).getOrElse(

Review Comment:
we can just call `getRawHiveTable`
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021069335

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala: ##
@@ -127,6 +127,9 @@ private[hive] trait HiveClient {
    */
   def alterTable(dbName: String, tableName: String, table: CatalogTable): Unit
+  /** Alter a table stats */
+  def alterTableStats(dbName: String, tableName: String, parameters: Map[String, String]): Unit

Review Comment:
```suggestion
  def alterTableProps(dbName: String, tableName: String, newProps: Map[String, String]): Unit
```
[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert
cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021069335

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala: ##
@@ -127,6 +127,9 @@ private[hive] trait HiveClient {
    */
   def alterTable(dbName: String, tableName: String, table: CatalogTable): Unit
+  /** Alter a table stats */
+  def alterTableStats(dbName: String, tableName: String, parameters: Map[String, String]): Unit

Review Comment:
```suggestion
  def alterTableStats(dbName: String, tableName: String, newStats: Map[String, String]): Unit
```