GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20624
[SPARK-23445] ColumnStat refactoring ## What changes were proposed in this pull request? Refactor ColumnStat to be more flexible. * Split `ColumnStat` and `CatalogColumnStat` just like `CatalogStatistics` is split from `Statistics`. This detaches how the statistics are stored from how they are processed in the query plan. `CatalogColumnStat` keeps `min` and `max` as `String`, making it not depend on dataType information. * For `CatalogColumnStat`, parse column names from property names in the metastore (`KEY_VERSION` property), not from metastore schema. This means that `CatalogColumnStat`s can be created for columns even if the schema itself is not stored in the metastore. * Make all fields optional. `min`, `max` and `histogram` for columns were optional already. Having them all optional is more consistent, and gives flexibility to e.g. drop some of the fields through transformations if they are difficult / impossible to calculate. The added flexibility will make it possible to have alternative implementations for stats, and separates stats collection from stats and estimation processing in plans. ## How was this patch tested? Refactored existing tests to work with refactored `ColumnStat` and `CatalogColumnStat`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/juliuszsompolski/apache-spark SPARK-23445 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20624.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20624 ---- commit cf3602075dcee35494c72975e361b739939079b4 Author: Juliusz Sompolski <julek@...> Date: 2018-01-19T13:57:46Z column stat refactoring ---- --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org