GroupByOperator sometimes throws OutOfMemory error when there are too many 
distinct keys
----------------------------------------------------------------------------------------

                 Key: HIVE-1139
                 URL: https://issues.apache.org/jira/browse/HIVE-1139
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Ning Zhang
            Assignee: Ning Zhang


When a partial aggregation is performed on a mapper, a HashMap is created to 
keep all distinct keys in main memory. This can lead to an OOM exception when 
there are too many distinct keys for a particular mapper. A workaround is to 
set the map split size smaller so that each mapper processes fewer rows. A 
better solution is to use the persistent HashMapWrapper (currently used in 
CommonJoinOperator) to spill overflow rows to disk. 
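
To illustrate the spilling idea, here is a minimal sketch. It is not Hive's 
HashMapWrapper and does not use its API. The class SpillingCounter, its 
methods, and the key-count threshold are all hypothetical names chosen for 
illustration. It assumes string keys without embedded newlines and a COUNT-style 
aggregate. Once the in-memory map holds the maximum number of distinct keys, 
rows with new keys are appended to a temporary file instead of growing the map.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch of bounded-memory partial aggregation; not Hive code. */
class SpillingCounter implements AutoCloseable {
    private final int maxKeysInMemory;                 // assumed threshold
    private final Map<String, Long> counts = new HashMap<>();
    private final Path spillFile;
    private final BufferedWriter spillWriter;
    private long spilledRows = 0;

    SpillingCounter(int maxKeysInMemory) throws IOException {
        this.maxKeysInMemory = maxKeysInMemory;
        this.spillFile = Files.createTempFile("groupby-spill", ".txt");
        this.spillWriter = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
    }

    /** Counts one row; spills the row to disk if the map is full and the key is new. */
    void add(String key) throws IOException {
        Long current = counts.get(key);
        if (current != null) {
            counts.put(key, current + 1);              // key already tracked in memory
        } else if (counts.size() < maxKeysInMemory) {
            counts.put(key, 1L);                       // room for a new key
        } else {
            spillWriter.write(key);                    // overflow row goes to disk
            spillWriter.newLine();
            spilledRows++;
        }
    }

    long spilledRows() {
        return spilledRows;
    }

    /**
     * Emits the in-memory partial counts, then streams each spilled row back
     * as a count of 1. The consumer is expected to do the final merge.
     */
    void emit(BiConsumer<String, Long> out) throws IOException {
        counts.forEach(out);
        spillWriter.flush();
        try (BufferedReader reader =
                 Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
            String key;
            while ((key = reader.readLine()) != null) {
                out.accept(key, 1L);
            }
        }
    }

    @Override
    public void close() throws IOException {
        spillWriter.close();
        Files.deleteIfExists(spillFile);
    }
}
```

Because the aggregation on the mapper is only partial, the spilled rows can be 
emitted unaggregated and merged downstream, so memory use stays bounded by the 
threshold regardless of how many distinct keys the mapper sees.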
