[ https://issues.apache.org/jira/browse/SPARK-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin resolved SPARK-18505.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

> Simplify AnalyzeColumnCommand
> -----------------------------
>
>                 Key: SPARK-18505
>                 URL: https://issues.apache.org/jira/browse/SPARK-18505
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>             Fix For: 2.1.0
>
>
> I'm spending more time at the design & code level for the cost-based optimizer now, and have found a number of issues related to maintainability and compatibility that I would like to address.
> This is a small pull request to clean up AnalyzeColumnCommand:
> 1. Removed warning on duplicated columns. Warnings in log messages are useless since most users that run SQL don't see them.
> 2. Removed the nested updateStats function, by just inlining the function.
> 3. Renamed a few functions to better reflect what they do.
> 4. Removed the factory apply method for ColumnStatStruct. It is a bad pattern to use an apply method that returns an instance of a class that is not of the same type (ColumnStatStruct.apply used to return CreateNamedStruct).
> 5. Renamed ColumnStatStruct to just AnalyzeColumnCommand.
> 6. Added more documentation explaining some of the non-obvious return types and code blocks.
> In follow-up pull requests, I'd like to address the following:
> 1. Get rid of the Map[String, ColumnStat] map, since internally we should be using Attribute to reference columns, rather than strings.
> 2. Decouple the fields exposed by ColumnStat and internals of Spark SQL's execution path. Currently the two are coupled because ColumnStat takes in an InternalRow.
> 3. Correctness: Remove code path that stores statistics in the catalog using the base64 encoding of the UnsafeRow format, which is not stable across Spark versions.
> 4. Clearly document the data representation stored in the catalog for statistics.
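
For readers unfamiliar with the concern in item 4, here is a minimal Scala sketch of the pattern. It does not reproduce Spark's code: NamedStruct, StatsWithSurprisingApply, and StatsBuilder are hypothetical stand-ins invented for illustration, and the field contents are made up.

```scala
// Hypothetical stand-in for the returned type; NOT Spark's CreateNamedStruct.
final case class NamedStruct(fields: Seq[(String, String)])

// The pattern the ticket objects to: the call site
// StatsWithSurprisingApply("age") reads like a constructor for
// StatsWithSurprisingApply, but the value is actually a NamedStruct.
object StatsWithSurprisingApply {
  def apply(column: String): NamedStruct =
    NamedStruct(Seq("column" -> column, "agg" -> "count"))
}

// One possible alternative: a descriptively named method, so the call site
// makes it clear that something other than StatsBuilder is produced.
object StatsBuilder {
  def buildStruct(column: String): NamedStruct =
    NamedStruct(Seq("column" -> column, "agg" -> "count"))
}

object ApplyDemo {
  def main(args: Array[String]): Unit = {
    val surprising = StatsWithSurprisingApply("age") // type is NamedStruct
    val explicit = StatsBuilder.buildStruct("age")   // type is NamedStruct
    // Both calls build the same value; only the readability of the call differs.
    println(surprising == explicit)
  }
}
```

Both objects behave identically. The difference is only in how much the call site tells the reader about the returned type, which is presumably why the ticket removes the apply method rather than changing what it computes.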