GitHub user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18248#discussion_r121058113
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -349,29 +349,38 @@ object CatalogTable {
     
     /**
      * This class of statistics is used in [[CatalogTable]] to interact with metastore.
    + * We prefer Spark statistics over Hive statistics if they both exist.
      * We define this new class instead of directly using [[Statistics]] here because there are no
    - * concepts of attributes or broadcast hint in catalog.
    + * concepts of attributes or hints in catalog.
      */
     case class CatalogStatistics(
    -    sizeInBytes: BigInt,
    -    rowCount: Option[BigInt] = None,
    -    colStats: Map[String, ColumnStat] = Map.empty) {
    +    sparkStats: Option[ExternalStatistics] = None,
    +    hiveStats: Option[ExternalStatistics] = None) {
    --- End diff --
    
    If we design the interface like this, we might need to refactor it again in the near future. Stats could be collected by Spark, imported from Hive, set by external users, or even supplied by the data source API v2 (in the future).
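
To make the concern concrete, here is a rough sketch of one possible alternative, not a proposal from this PR. Instead of one hard-coded field per origin, the stats are keyed by their source, so adding a new origin would not change the class shape. Every name below (`StatsSource`, `StatsData`, `SourcedCatalogStatistics`, `preferred`) is hypothetical and does not exist in Spark. `StatsData` is only a stand-in for `ExternalStatistics`, whose definition is not shown in this diff. The default preference order assumes nothing beyond the "Spark over Hive" rule stated in the diff.

```scala
// Hypothetical sketch: none of these types exist in Spark or in this PR.
sealed trait StatsSource
object StatsSource {
  case object Spark extends StatsSource        // collected by Spark
  case object Hive extends StatsSource         // imported from Hive
  case object User extends StatsSource         // set by external users
  case object DataSourceV2 extends StatsSource // possible future origin
}

// Stand-in payload; the two fields mirror the ones removed in the diff.
final case class StatsData(
    sizeInBytes: BigInt,
    rowCount: Option[BigInt] = None)

final case class SourcedCatalogStatistics(
    bySource: Map[StatsSource, StatsData] = Map.empty,
    // Assumed ordering: the diff only states that Spark stats win over Hive stats.
    preference: Seq[StatsSource] = Seq(StatsSource.Spark, StatsSource.Hive)) {

  /** Returns the stats of the first source in `preference` that has an entry. */
  def preferred: Option[StatsData] =
    preference.collectFirst { case s if bySource.contains(s) => bySource(s) }
}
```

With this shape, supporting a new origin would mean adding a `StatsSource` case and a slot in `preference`, rather than another `Option` field on the case class.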

