[ https://issues.apache.org/jira/browse/HIVE-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Na Yang updated HIVE-8756:
--------------------------
    Attachment: HIVE-8756.1-spark.patch

> numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-8756
>                 URL: https://issues.apache.org/jira/browse/HIVE-8756
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Na Yang
>            Assignee: Na Yang
>         Attachments: HIVE-8756.1-spark.patch
>
>
> Run the following Hive queries:
> {noformat}
> set datanucleus.cache.collections=false;
> set hive.stats.autogather=true;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set hive.map.aggr=true;
> create table tmptable(key string, value string);
> INSERT OVERWRITE TABLE tmptable
> SELECT unionsrc.key, unionsrc.value
> FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
>       UNION ALL
>       SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
> DESCRIBE FORMATTED tmptable;
> {noformat}
> Hive on Spark prints the following table parameters:
> {noformat}
> COLUMN_STATS_ACCURATE   true
> numFiles                2
> numRows                 0
> rawDataSize             0
> totalSize               225
> {noformat}
> Hive on MR prints the following table parameters:
> {noformat}
> Table Parameters:
> COLUMN_STATS_ACCURATE   true
> numFiles                2
> numRows                 26
> rawDataSize             199
> totalSize               225
> {noformat}
> As shown above, numRows and rawDataSize are not collected by the Hive on
> Spark stats.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)