[ 
https://issues.apache.org/jira/browse/SPARK-23445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-23445.
-----------------------------
       Resolution: Fixed
         Assignee: Juliusz Sompolski
    Fix Version/s: 2.4.0

> ColumnStat refactoring
> ----------------------
>
>                 Key: SPARK-23445
>                 URL: https://issues.apache.org/jira/browse/SPARK-23445
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Juliusz Sompolski
>            Assignee: Juliusz Sompolski
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Refactor ColumnStat to be more flexible.
>  * Split {{ColumnStat}} and {{CatalogColumnStat}} just like 
> {{CatalogStatistics}} is split from {{Statistics}}. This detaches how the 
> statistics are stored from how they are processed in the query plan. 
> {{CatalogColumnStat}} keeps {{min}} and {{max}} as {{String}}, so it does not 
> depend on dataType information.
>  * For {{CatalogColumnStat}}, parse column names from property names in the 
> metastore ({{KEY_VERSION}} property), not from the metastore schema. This allows 
> the catalog to read stats into {{CatalogColumnStat}}s even if the schema 
> itself is not in the metastore.
>  * Make all fields optional. {{min}}, {{max}} and {{histogram}} for columns 
> were optional already. Having them all optional is more consistent, and makes 
> it possible, for example, to drop some of the fields through transformations 
> if they are difficult or impossible to calculate.
> The added flexibility will make it possible to have alternative 
> implementations for stats, and separates stats collection from stats 
> processing and estimation in plans.
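
The three points in the description can be illustrated with a minimal Scala sketch. It is not Spark's actual implementation: every name in it ({{SimpleType}}, {{ColumnStatSketch}}, {{CatalogColumnStatSketch}}, {{CatalogStatsSketch}}) and the "<column>.<field>" property key layout are assumptions made for illustration. It only shows string-typed {{min}}/{{max}} on the storage side, all-optional fields, and column names recovered from property keys ending in a version suffix rather than from a schema.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
sealed trait SimpleType
case object IntType extends SimpleType
case object StringType extends SimpleType

// Plan-side stats: min/max hold typed values. Every field is optional.
final case class ColumnStatSketch(
    distinctCount: Option[BigInt] = None,
    min: Option[Any] = None,
    max: Option[Any] = None,
    nullCount: Option[BigInt] = None)

// Storage-side stats: min/max stay as strings, so no data type is needed
// to read or write them. Every field is optional.
final case class CatalogColumnStatSketch(
    distinctCount: Option[BigInt] = None,
    min: Option[String] = None,
    max: Option[String] = None,
    nullCount: Option[BigInt] = None) {

  // Convert to plan-side stats once the column's data type is known.
  def toPlanStat(dataType: SimpleType): ColumnStatSketch = {
    def parse(s: String): Any = dataType match {
      case IntType    => s.toInt
      case StringType => s
    }
    ColumnStatSketch(distinctCount, min.map(parse), max.map(parse), nullCount)
  }
}

object CatalogStatsSketch {
  // Assumed (hypothetical) property layout: "<column>.<field>".
  val KeyVersion = "version"

  // Recover column names from property keys alone, without a schema:
  // every key ending in ".version" marks one column that has stats.
  def columnNames(props: Map[String, String]): Set[String] = {
    val suffix = "." + KeyVersion
    props.keySet.filter(_.endsWith(suffix)).map(_.dropRight(suffix.length))
  }

  // Read whichever fields are present; missing ones simply stay None.
  def fromProps(props: Map[String, String], col: String): CatalogColumnStatSketch =
    CatalogColumnStatSketch(
      distinctCount = props.get(s"$col.distinctCount").map(s => BigInt(s)),
      min = props.get(s"$col.min"),
      max = props.get(s"$col.max"),
      nullCount = props.get(s"$col.nullCount").map(s => BigInt(s)))
}
```

In this sketch a caller would first call {{CatalogStatsSketch.columnNames}} on the stored properties, build one {{CatalogColumnStatSketch}} per column with {{fromProps}}, and only call {{toPlanStat}} later, once a data type for the column is available.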



