[ https://issues.apache.org/jira/browse/SPARK-23445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-23445.
-----------------------------
       Resolution: Fixed
         Assignee: Juliusz Sompolski
    Fix Version/s: 2.4.0

> ColumnStat refactoring
> ----------------------
>
>                 Key: SPARK-23445
>                 URL: https://issues.apache.org/jira/browse/SPARK-23445
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Juliusz Sompolski
>            Assignee: Juliusz Sompolski
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Refactor ColumnStat to be more flexible.
> * Split {{ColumnStat}} and {{CatalogColumnStat}}, just as
> {{CatalogStatistics}} is split from {{Statistics}}. This detaches how the
> statistics are stored from how they are processed in the query plan.
> {{CatalogColumnStat}} keeps {{min}} and {{max}} as {{String}}, so it does
> not depend on dataType information.
> * For {{CatalogColumnStat}}, parse column names from property names in the
> metastore (the {{KEY_VERSION}} property), not from the metastore schema.
> This allows the catalog to read stats into {{CatalogColumnStat}}s even if
> the schema itself is not in the metastore.
> * Make all fields optional. {{min}}, {{max}} and {{histogram}} for columns
> were optional already. Having them all optional is more consistent, and
> gives flexibility to e.g. drop some of the fields through transformations
> if they are difficult or impossible to calculate.
> The added flexibility will make it possible to have alternative
> implementations for stats, and will separate stats collection from stats
> and estimation processing in plans.
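
For readers skimming the three bullet points in the description, here is a minimal Python sketch of the idea, not Spark's actual Scala code. The property-key layout (`colStats.<column>.<field>`), the field subset, and the helper names `read_catalog_stats` and `to_plan_stat` are assumptions made for illustration only. It shows a storage-side record that keeps `min`/`max` as strings, column names recovered from the version property keys alone, and every field being optional.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical key layout, assumed for this sketch: "colStats.<column>.<field>".
PREFIX = "colStats."
KEY_VERSION = "version"


@dataclass(frozen=True)
class ColumnStat:
    """Plan-side stats: min/max are typed values. Every field is optional."""
    distinct_count: Optional[int] = None
    min: Optional[object] = None
    max: Optional[object] = None
    null_count: Optional[int] = None


@dataclass(frozen=True)
class CatalogColumnStat:
    """Storage-side stats: min/max stay strings, so no data type is needed."""
    distinct_count: Optional[int] = None
    min: Optional[str] = None
    max: Optional[str] = None
    null_count: Optional[int] = None

    def to_plan_stat(self, parse: Callable[[str], object]) -> ColumnStat:
        # The data type (here: a parser function) is only needed at this point.
        return ColumnStat(
            distinct_count=self.distinct_count,
            min=parse(self.min) if self.min is not None else None,
            max=parse(self.max) if self.max is not None else None,
            null_count=self.null_count,
        )


def _opt_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def read_catalog_stats(props: Dict[str, str]) -> Dict[str, CatalogColumnStat]:
    """Discover columns from '<PREFIX><column>.version' keys, without a schema."""
    suffix = "." + KEY_VERSION
    stats: Dict[str, CatalogColumnStat] = {}
    for key in props:
        if not (key.startswith(PREFIX) and key.endswith(suffix)):
            continue
        col = key[len(PREFIX):len(key) - len(suffix)]
        if not col:
            continue  # malformed key with an empty column name
        base = PREFIX + col + "."
        stats[col] = CatalogColumnStat(
            distinct_count=_opt_int(props.get(base + "distinctCount")),
            min=props.get(base + "min"),
            max=props.get(base + "max"),
            null_count=_opt_int(props.get(base + "nullCount")),
        )
    return stats
```

In this sketch, calling `read_catalog_stats(props)` on a raw property map needs no schema at all. A caller that does know the column type converts later, for example `stats["age"].to_plan_stat(int)`, which mirrors the storage/plan split the issue describes.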