ann created HIVE-25182:
--------------------------

             Summary: Hive map.aggr and groupby.skewdata both enable, maybe cause incorrect results.
                 Key: HIVE-25182
                 URL: https://issues.apache.org/jira/browse/HIVE-25182
             Project: Hive
          Issue Type: Bug
    Affects Versions: All Versions
            Reporter: ann
         Attachments: flush case.png, random case.png, round strategy.png

When map.aggr and groupby.skewindata are both enabled, GroupByOperator behaves nondeterministically. For example, with map.aggr the records are aggregated in a hash table on the map side, and groupby.skewindata puts random() into the partition key. When a shuffle fails and some map tasks are rerun, the rerun can send its output to different partitions than the first attempt did, and this nondeterminism can cause incorrect results.

The attached figures show two cases in which random() and hash-table aggregation cause incorrect results.

I have implemented a new strategy in GroupByOperator. The strategy applies when both map.aggr and groupby.skewindata are enabled, and it distributes records to the partitions in round-robin order, so that every partition receives records in turn.
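
To make the rerun problem concrete, here is a minimal toy model in Python. It is not Hive code and not the proposed patch: run_map_task, random_partition, round_robin_partition and reduce_with_rerun are hypothetical names made up for this sketch. It assumes that one reducer has already read the first attempt of a map task while the other reducers read the rerun, and it models only the random() partition key, not the hash-table flush.

```python
import random
from collections import Counter


def run_map_task(records, num_partitions, choose_partition):
    """Simulate one map task attempt: route each record to a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        partitions[choose_partition(index, num_partitions)].append(record)
    return partitions


def random_partition(index, num_partitions):
    # Models a random() partition key: a rerun may route the record elsewhere.
    return random.randrange(num_partitions)


def round_robin_partition(index, num_partitions):
    # Deterministic: depends only on the record's position in the map input.
    return index % num_partitions


def reduce_with_rerun(records, num_partitions, choose_partition):
    """Reducer 0 reads attempt 1; all other reducers read the rerun (attempt 2)."""
    attempt1 = run_map_task(records, num_partitions, choose_partition)
    attempt2 = run_map_task(records, num_partitions, choose_partition)
    seen = list(attempt1[0])
    for partition in range(1, num_partitions):
        seen.extend(attempt2[partition])
    # Count how many times each record reached some reducer.
    return Counter(seen)


records = [("k", i) for i in range(1000)]
expected = Counter(records)  # every record should be seen exactly once

round_robin_result = reduce_with_rerun(records, 4, round_robin_partition)

random.seed(0)  # fixed seed only so that the sketch is repeatable
random_result = reduce_with_rerun(records, 4, random_partition)

print("round robin matches expected:", round_robin_result == expected)
print("random matches expected:", random_result == expected)
```

In this model the round-robin assignment is identical across attempts, so mixing attempts loses nothing. With the random assignment, a record can land in partition 0 in both attempts (counted twice) or in neither of the partitions that are actually read (lost).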