[ https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531703#comment-16531703 ]
Ashutosh Chauhan commented on HIVE-19326:
-----------------------------------------

+1

> stats auto gather: incorrect aggregation during UNION queries (may lead to incorrect results)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19326
>                 URL: https://issues.apache.org/jira/browse/HIVE-19326
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sergey Shelukhin
>            Assignee: Zoltan Haindrich
>            Priority: Critical
>         Attachments: HIVE-19326.01wip01.patch, HIVE-19326.02.patch,
>                      HIVE-19326.03.patch, HIVE-19326.04.patch, HIVE-19326.05.patch,
>                      HIVE-19326.06.patch, HIVE-19326.06wip01.patch, HIVE-19326.06wip02.patch,
>                      HIVE-19326.06wip03.patch, HIVE-19326.06wip04.patch, HIVE-19326.06wip05.patch,
>                      HIVE-19326.07.patch, HIVE-19326.08.patch, HIVE-19326.09.patch,
>                      HIVE-19326.10.patch, HIVE-19326.11.patch, HIVE-19326.11.patch,
>                      HIVE-19326.12.patch, HIVE-19326.13.patch, HIVE-19326.13.patch
>
>
> Found while investigating why the results changed after converting tables to MM;
> it turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
> numFiles 1
> numRows 5
> ...
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
> numFiles 1
> numRows 15
> ...
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
> numFiles 2
> numRows 20
> {noformat}
> Note how the result changes after ANALYZE: the original numRows (5) is inaccurate,
> yet BASIC_STATS is set to true.
> I assume that with the metadata-only optimization this can produce incorrect
> results.
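
To make the concern in the description concrete, here is a minimal toy model of how a metadata-only answer to a row count could go wrong when numRows is stale but still flagged accurate. It is only a sketch: `TableStats` and `count_star` are hypothetical names invented for illustration and are not Hive code. The row counts (5 before ANALYZE, 15 after) are the ones from the golden file above.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical stand-in for the table parameters shown by `desc formatted`."""
    num_rows: int
    basic_stats_accurate: bool  # models COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}


def count_star(stats: TableStats, rows: list) -> int:
    """Toy metadata-only path: trust numRows when flagged accurate, else scan."""
    if stats.basic_stats_accurate:
        return stats.num_rows
    return len(rows)


# Numbers taken from the golden file: ANALYZE reports 15 rows, while the
# stats recorded before it claimed 5 with BASIC_STATS still "true".
rows = list(range(15))
before_analyze = TableStats(num_rows=5, basic_stats_accurate=True)
after_analyze = TableStats(num_rows=len(rows), basic_stats_accurate=True)

stale_answer = count_star(before_analyze, rows)  # answered from stale metadata
fresh_answer = count_star(after_analyze, rows)   # answered from recomputed metadata
print(stale_answer, fresh_answer)
```

In this model the stale entry yields 5 while the table actually holds 15 rows, which is the kind of wrong answer the description warns about. Had the flag been false, the toy function would fall back to scanning and return the real count.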