[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-21 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1028966247


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##
@@ -105,6 +106,15 @@ private[hive] class HiveClientImpl(
 
   private class RawHiveTableImpl(override val rawTable: HiveTable) extends RawHiveTable {
 override lazy val toCatalogTable = convertHiveTableToCatalogTable(rawTable)
+
+override def hiveTableProps(containsStats: Boolean): Map[String, String] = {

Review Comment:
   Why do we need this parameter?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-21 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1028755248


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala:
##
@@ -113,6 +113,9 @@ private[hive] trait HiveClient {
   /** Creates a table with the given metadata. */
   def createTable(table: CatalogTable, ignoreIfExists: Boolean): Unit
 
+  /** Get hive table properties. */
+  def hiveTableProps(rawHiveTable: RawHiveTable, containsStats: Boolean): Map[String, String]

Review Comment:
   Shall we add a method to `RawHiveTable` to do this?






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-21 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1027963889


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala:
##
@@ -721,19 +721,16 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   table: String,
   stats: Option[CatalogStatistics]): Unit = withClient {
 requireTableExists(db, table)
-val rawTable = getRawTable(db, table)
-
-// convert table statistics to properties so that we can persist them through hive client
-val statsProperties =
+val rawHiveTable = client.getRawHiveTable(db, table)
+val oldProps = client.hiveTableProps(rawHiveTable)

Review Comment:
   Can you explain the rationale?






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-20 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1027656667


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala:
##
@@ -894,12 +895,14 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
 
   sql(insertString.toLowerCase(Locale.ROOT))
   sql(insertString.toUpperCase(Locale.ROOT))
+  spark.sessionState.catalog.alterTableStats(TableIdentifier("test1"), None)

Review Comment:
   Does it test anything? It just invokes `alterTableStats` but does no 
verification.






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-15 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1023592944


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##
@@ -609,6 +609,20 @@ private[hive] class HiveClientImpl(
 shim.alterTable(client, qualifiedTableName, hiveTable)
   }
 
+  override def alterTableStats(
+  dbName: String,
+  tableName: String,
+  stats: Map[String, String]): Unit = withHiveState {
+val hiveTable = getRawHiveTable(dbName, tableName).rawTable.asInstanceOf[HiveTable]
+val newParameters = new JHashMap[String, String]()
+hiveTable.getParameters.asScala.toMap.filterNot(_._1.startsWith(STATISTICS_PREFIX))

Review Comment:
   It's a bit tricky to make `HiveClient` handle this `STATISTICS_PREFIX`. It 
should be the responsibility of `HiveExternalCatalog`. `HiveClient` should only 
take care of the communication with HMS.
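
   The split described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Spark code: `FakeHiveClient`, `StatsPropsSketch`, and the value of `StatisticsPrefix` are hypothetical stand-ins. The catalog-side function owns the prefix filtering, while the client only stores whatever properties it is given.

```scala
// Hypothetical, simplified stand-ins; these are not the actual Spark classes.
object StatsPropsSketch {
  // Assumed prefix value, used here for illustration only.
  val StatisticsPrefix = "spark.sql.statistics."

  // Stand-in for the client layer: it only stores and returns properties,
  // and knows nothing about which keys are statistics.
  final class FakeHiveClient {
    private var stored: Map[String, String] = Map.empty
    def tableProps: Map[String, String] = stored
    def alterTableProps(newProps: Map[String, String]): Unit = stored = newProps
  }

  // Stand-in for the catalog layer: it drops the old statistics keys,
  // merges in the new ones, and hands the final map to the client.
  def alterTableStats(client: FakeHiveClient, statsProps: Map[String, String]): Unit = {
    val nonStatsProps = client.tableProps.filterNot(_._1.startsWith(StatisticsPrefix))
    client.alterTableProps(nonStatsProps ++ statsProps)
  }
}
```

   With this shape, passing an empty map as `statsProps` clears all statistics keys and leaves the remaining properties untouched.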






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-15 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1022525354


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##
@@ -609,6 +609,17 @@ private[hive] class HiveClientImpl(
 shim.alterTable(client, qualifiedTableName, hiveTable)
   }
 
+  override def alterTableProps(
+  dbName: String,
+  tableName: String,
+  newProps: Map[String, String]): Unit = withHiveState {
+val hiveTable = getRawHiveTable(dbName, tableName).rawTable.asInstanceOf[HiveTable]

Review Comment:
   This method should take `RawHiveTable`, so that we don't need to look up the 
table here.
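
   To illustrate the point, here is a minimal sketch with hypothetical names (`RawTable`, `CountingClient`, `alterTablePropsByName`); it is not the Spark API. The by-name variant has to fetch the table again, while the variant that accepts the already-fetched handle performs no lookup of its own.

```scala
// Hypothetical stand-ins used only to count metastore lookups.
object RawTableSketch {
  // Stand-in for a table object already fetched from the metastore.
  final case class RawTable(db: String, name: String, props: Map[String, String])

  final class CountingClient(initial: RawTable) {
    private var current: RawTable = initial
    var lookups: Int = 0

    // Each call stands for one round trip to the metastore.
    def getRawTable(db: String, name: String): RawTable = {
      lookups += 1
      current
    }

    // By-name variant: needs its own lookup before it can alter the table.
    def alterTablePropsByName(db: String, name: String, newProps: Map[String, String]): Unit = {
      val raw = getRawTable(db, name)
      current = raw.copy(props = newProps)
    }

    // Handle-based variant: reuses the table the caller already fetched.
    def alterTableProps(raw: RawTable, newProps: Map[String, String]): Unit = {
      current = raw.copy(props = newProps)
    }
  }
}
```

   A caller that has already fetched the table to read its old properties pays for one lookup with the handle-based variant and two with the by-name variant.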






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-13 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021070978


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala:
##
@@ -722,18 +722,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   stats: Option[CatalogStatistics]): Unit = withClient {
 requireTableExists(db, table)
 val rawTable = getRawTable(db, table)

Review Comment:
   We can call `client.getRawHiveTable`, see 
https://github.com/apache/spark/commit/0942ea9f352f5fdf413dad750a672d15c7257776






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-13 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021070839


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala:
##
@@ -722,18 +722,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   stats: Option[CatalogStatistics]): Unit = withClient {
 requireTableExists(db, table)
 val rawTable = getRawTable(db, table)
-
-// convert table statistics to properties so that we can persist them through hive client
+val oldTableNonStatsProps = rawTable.properties.filterNot(_._1.startsWith(STATISTICS_PREFIX))
 val statsProperties =
   if (stats.isDefined) {
 statsToProperties(stats.get)
   } else {
 new mutable.HashMap[String, String]()
   }
 
-val oldTableNonStatsProps = rawTable.properties.filterNot(_._1.startsWith(STATISTICS_PREFIX))
-val updatedTable = rawTable.copy(properties = oldTableNonStatsProps ++ statsProperties)
-client.alterTable(updatedTable)
+client.alterTableStats(db, table, parameters = oldTableNonStatsProps ++ statsProperties)

Review Comment:
   We can pass `rawTable` to the hive client to avoid the extra RPC call. See 






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-13 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021070153


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala:
##
@@ -894,12 +895,14 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
 
   sql(insertString.toLowerCase(Locale.ROOT))
   sql(insertString.toUpperCase(Locale.ROOT))
+  spark.sessionState.catalog.alterTableStats(TableIdentifier("test1"), None)

Review Comment:
   What are we doing here?






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-13 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021069901


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##
@@ -609,6 +609,19 @@ private[hive] class HiveClientImpl(
 shim.alterTable(client, qualifiedTableName, hiveTable)
   }
 
+  override def alterTableStats(
+  dbName: String,
+  tableName: String,
+  parameters: Map[String, String]): Unit = withHiveState {
+val hiveTable =
+  getRawTableOption(dbName, tableName).getOrElse(

Review Comment:
   We can just call `getRawHiveTable`.






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-13 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021069335


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala:
##
@@ -127,6 +127,9 @@ private[hive] trait HiveClient {
*/
   def alterTable(dbName: String, tableName: String, table: CatalogTable): Unit
 
+  /** Alter a table stats */
+  def alterTableStats(dbName: String, tableName: String, parameters: Map[String, String]): Unit

Review Comment:
   ```suggestion
 def alterTableProps(dbName: String, tableName: String, newProps: Map[String, String]): Unit
   ```






[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-13 Thread GitBox


cloud-fan commented on code in PR #38495:
URL: https://github.com/apache/spark/pull/38495#discussion_r1021069335


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala:
##
@@ -127,6 +127,9 @@ private[hive] trait HiveClient {
*/
   def alterTable(dbName: String, tableName: String, table: CatalogTable): Unit
 
+  /** Alter a table stats */
+  def alterTableStats(dbName: String, tableName: String, parameters: Map[String, String]): Unit

Review Comment:
   ```suggestion
 def alterTableStats(dbName: String, tableName: String, newStats: Map[String, String]): Unit
   ```


