[GitHub] [spark] viirya commented on a change in pull request #27055: [SPARK-30394]Skip DetermineTableStats rule when hive table can be converted to datasource table
viirya commented on a change in pull request #27055: [SPARK-30394]Skip DetermineTableStats rule when hive table can be converted to datasource table
URL: https://github.com/apache/spark/pull/27055#discussion_r374296512

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ##

@@ -139,13 +139,15 @@ class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case relation: HiveTableRelation
-      if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
+      if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty &&
+        !RelationConversions.isConvertible(relation) =>

Review comment:
I think we just need this change, which skips `DetermineTableStats` when the Hive table will be converted later.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #27055: [SPARK-30394]Skip DetermineTableStats rule when hive table can be converted to datasource table
viirya commented on a change in pull request #27055: [SPARK-30394]Skip DetermineTableStats rule when hive table can be converted to datasource table
URL: https://github.com/apache/spark/pull/27055#discussion_r373953950

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala ##

@@ -33,12 +33,10 @@ import org.apache.spark.sql.types.StructType
  *
  * @param sparkSession a [[SparkSession]]
  * @param table the metadata of the table
- * @param sizeInBytes the table's data size in bytes
  */
 class CatalogFileIndex(
     sparkSession: SparkSession,
-    val table: CatalogTable,
-    override val sizeInBytes: Long) extends FileIndex {

Review comment:
This change, as @cloud-fan said, is expensive. It also doesn't follow the defined behavior for partitioned data source and Hive tables regarding statistics calculation.