[ https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509761#comment-16509761 ]
Zoltan Haindrich commented on HIVE-19326:
-----------------------------------------

This is odd: on my system, union_fast_stats produces a different result than on the ptest server. I've rerun it a few times, but the result doesn't seem to be changing randomly.

> stats auto gather: incorrect aggregation during UNION queries (may lead to
> incorrect results)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19326
>                 URL: https://issues.apache.org/jira/browse/HIVE-19326
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sergey Shelukhin
>            Assignee: Zoltan Haindrich
>            Priority: Critical
>         Attachments: HIVE-19326.01wip01.patch, HIVE-19326.02.patch,
> HIVE-19326.03.patch, HIVE-19326.04.patch, HIVE-19326.05.patch,
> HIVE-19326.06.patch, HIVE-19326.06wip01.patch, HIVE-19326.06wip02.patch,
> HIVE-19326.06wip03.patch, HIVE-19326.06wip04.patch, HIVE-19326.06wip05.patch,
> HIVE-19326.07.patch
>
>
> Found when investigating the results change after converting tables to MM;
> it turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
> 	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}
> 	numFiles            	1
> 	numRows             	5
> ...
> 	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}
> 	numFiles            	1
> 	numRows             	15
> ...
> 	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}
> 	numFiles            	2
> 	numRows             	20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate,
> but BASIC_STATS is set to true.
> I am assuming that, with the metadata-only optimization, this can produce
> incorrect results.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)