ann created HIVE-25182:
--------------------------

             Summary: Enabling both map.aggr and groupby.skewindata may cause 
incorrect results
                 Key: HIVE-25182
                 URL: https://issues.apache.org/jira/browse/HIVE-25182
             Project: Hive
          Issue Type: Bug
    Affects Versions: All Versions
            Reporter: ann
         Attachments: flush case.png, random case.png, round strategy.png

 When map.aggr and groupby.skewindata are both enabled, GroupByOperator is 
nondeterministic. For example, Hive aggregates records in a map-side hash 
table, and groupby.skewindata puts random() in the partition key. When a 
shuffle fails and some map tasks are rerun, the rerun can send records to 
different partitions than the first attempt did, which produces incorrect 
results. The two attached figures show how random() and hash-table 
aggregation cause the incorrect results.
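
The random() case can be sketched outside Hive. The following is a minimal 
Python simulation, not Hive code: the function names, the two-reducer setup, 
and the seeds are all hypothetical. It assumes reducer 0 has already fetched 
its partition from the first map attempt, while reducer 1's fetch fails and 
it reads from the rerun, whose random assignment differs.

```python
import random
from collections import Counter

NUM_REDUCERS = 2  # hypothetical number of reduce partitions


def run_map_attempt(rows, rng):
    """Assign each row to a reducer at random, standing in for random() in the partition key."""
    partitions = [[] for _ in range(NUM_REDUCERS)]
    for row in rows:
        partitions[rng.randrange(NUM_REDUCERS)].append(row)
    return partitions


def shuffle_with_rerun(rows, seed_first, seed_rerun):
    """Reducer 0 reads the first attempt; reducer 1 reads the rerun after a failed fetch."""
    first = run_map_attempt(rows, random.Random(seed_first))
    rerun = run_map_attempt(rows, random.Random(seed_rerun))
    return first[0] + rerun[1]


rows = [("k", i) for i in range(20)]
received = shuffle_with_rerun(rows, seed_first=1, seed_rerun=2)

# A row sent to reducer 0 in the first attempt and to reducer 1 in the rerun is
# counted twice; a row sent the other way round is never counted.
duplicated = [row for row, n in Counter(received).items() if n > 1]
lost = [row for row in rows if row not in received]
print("duplicated:", duplicated)
print("lost:", lost)
```

If both attempts happen to make the same assignment (same seed here), the 
reducers together see every row exactly once; otherwise rows are duplicated 
or lost.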

I have implemented a new strategy in GroupByOperator. The strategy is based on 
map.aggr and groupby.skewindata, and shuffles records to the partitions in 
round-robin (circular) order.
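
As an illustration of the idea only (the actual GroupByOperator change is not 
shown in this message), here is a minimal Python sketch of circular 
assignment. The function name and the two-reducer setup are hypothetical, and 
the sketch assumes a rerun map task emits the same rows in the same order as 
the first attempt.

```python
from itertools import cycle

NUM_REDUCERS = 2  # hypothetical number of reduce partitions


def run_map_attempt_round_robin(rows, num_reducers=NUM_REDUCERS):
    """Assign rows to reducers in circular order: 0, 1, ..., n-1, 0, 1, ..."""
    partitions = [[] for _ in range(num_reducers)]
    for row, target in zip(rows, cycle(range(num_reducers))):
        partitions[target].append(row)
    return partitions


rows = [("k", i) for i in range(20)]

# Assumption: the rerun sees the same rows in the same order as the first attempt.
first = run_map_attempt_round_robin(rows)
rerun = run_map_attempt_round_robin(rows)

# Reducer 0 reads the first attempt, reducer 1 reads the rerun.
received = first[0] + rerun[1]
print("all rows seen exactly once:", sorted(received) == sorted(rows))
```

Because the target partition depends only on a row's position in the output, 
mixing partitions from different attempts still covers every row once, and 
the skewed key is still spread evenly across reducers.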





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
