[ https://issues.apache.org/jira/browse/SPARK-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-4760:
------------------------------------
    Priority: Critical  (was: Major)

> "ANALYZE TABLE table COMPUTE STATISTICS noscan" failed estimating table size for tables created from Parquet files
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-4760
>                 URL: https://issues.apache.org/jira/browse/SPARK-4760
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Jianshi Huang
>            Priority: Critical
>
> In an older Spark version built around Oct. 12, I was able to use
>
>     ANALYZE TABLE table COMPUTE STATISTICS noscan
>
> to get an estimated table size, which is important for optimizing joins. (I'm joining 15 small dimension tables, so this is crucial for me.)
> In more recent Spark builds, it fails to estimate the table size unless I remove "noscan".
> Here are the statistics I got using DESC EXTENDED:
> old:
> parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1417763591, totalSize=56166}
> new:
> parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892, COLUMN_STATS_ACCURATE=false, totalSize=0, numRows=-1, rawDataSize=-1}
> I've also tried turning off spark.sql.hive.convertMetastoreParquet in my spark-defaults.conf, and the result is unaffected in both versions.
> It looks like the Parquet support in the new Hive (0.13.1) is broken?
> Jianshi
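For anyone trying to reproduce this, here is a minimal sketch of the steps described above, written against the Spark 1.2-era HiveContext API. The table name parquet_t is hypothetical and stands in for any external Hive table created from Parquet files; everything else follows the report (the ANALYZE ... noscan statement, the convertMetastoreParquet toggle, and DESC EXTENDED to read back the table parameters).

    // Reproduction sketch for SPARK-4760 (Spark 1.2-era API).
    // Assumes a Hive metastore is configured and an external table named
    // parquet_t (hypothetical name) was created from Parquet files.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object AnalyzeNoscanRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SPARK-4760"))
        val hc = new HiveContext(sc)

        // As in the report: rule out Spark's own Parquet conversion path.
        // (Equivalent to setting this key in spark-defaults.conf.)
        hc.setConf("spark.sql.hive.convertMetastoreParquet", "false")

        // With "noscan", the table size should be estimated from file
        // lengths alone, without scanning the data.
        hc.sql("ANALYZE TABLE parquet_t COMPUTE STATISTICS noscan")

        // Inspect the table parameters; the report shows totalSize=56166 on
        // the old build but totalSize=0 on the new one.
        hc.sql("DESCRIBE EXTENDED parquet_t").collect().foreach(println)

        sc.stop()
      }
    }

The estimate matters for the join optimization mentioned above because Spark SQL compares a table's size against spark.sql.autoBroadcastJoinThreshold when deciding whether to broadcast small dimension tables; without a usable totalSize it falls back to a conservative default size, so the broadcast optimization is lost.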