[GitHub] [spark] viirya commented on a change in pull request #27055: [SPARK-30394]Skip DetermineTableStats rule when hive table can be converted to datasource table

2020-02-03 Thread GitBox
viirya commented on a change in pull request #27055: [SPARK-30394]Skip 
DetermineTableStats rule when hive table can be converted to datasource table
URL: https://github.com/apache/spark/pull/27055#discussion_r374296512
 
 

 ##
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
 ##
 @@ -139,13 +139,15 @@ class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case relation: HiveTableRelation
-      if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
+      if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty &&
+        !RelationConversions.isConvertible(relation) =>
 
 Review comment:
   I think we only need this change, which skips `DetermineTableStats` when the 
Hive table will be converted to a data source table later.
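
   To illustrate the guard, here is a minimal, self-contained sketch. It does 
not use Spark's classes: `TableMeta`, `Relation`, `estimateSize`, and the 
serde-based `isConvertible` below are hypothetical stand-ins, and the sketch 
ignores any configuration the real `RelationConversions.isConvertible` may 
consult.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
final case class TableMeta(provider: String, serde: String, stats: Option[Long])
final case class Relation(tableMeta: TableMeta)

object GuardSketch {
  // Assumption: a table counts as Hive when its provider is "hive".
  def isHiveTable(meta: TableMeta): Boolean =
    meta.provider.equalsIgnoreCase("hive")

  // Assumption: Parquet/ORC serdes mark a table as convertible to a data source table.
  def isConvertible(rel: Relation): Boolean = {
    val serde = rel.tableMeta.serde.toLowerCase
    serde.contains("parquet") || serde.contains("orc")
  }

  // Placeholder for the potentially expensive size estimation.
  def estimateSize(rel: Relation): Long = 1024L

  // Fill in stats only for Hive tables that have none and will not be converted later.
  def determineStats(rel: Relation): Relation = rel match {
    case r if isHiveTable(r.tableMeta) && r.tableMeta.stats.isEmpty && !isConvertible(r) =>
      r.copy(tableMeta = r.tableMeta.copy(stats = Some(estimateSize(r))))
    case other => other
  }
}
```

   With this guard, a convertible relation passes through unchanged, so the 
stats work is left to whatever handles the converted table.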


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #27055: [SPARK-30394]Skip DetermineTableStats rule when hive table can be converted to datasource table

2020-02-02 Thread GitBox
viirya commented on a change in pull request #27055: [SPARK-30394]Skip 
DetermineTableStats rule when hive table can be converted to datasource table
URL: https://github.com/apache/spark/pull/27055#discussion_r373953950
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala
 ##
 @@ -33,12 +33,10 @@ import org.apache.spark.sql.types.StructType
  *
  * @param sparkSession a [[SparkSession]]
  * @param table the metadata of the table
- * @param sizeInBytes the table's data size in bytes
  */
 class CatalogFileIndex(
     sparkSession: SparkSession,
-    val table: CatalogTable,
-    override val sizeInBytes: Long) extends FileIndex {
 
 Review comment:
   This change, as @cloud-fan said, is expensive. It also doesn't follow the 
defined behavior for partitioned data source and Hive tables regarding 
statistics calculation.

