[ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879406#action_12879406 ]
Soundararajan Velu commented on HIVE-1139:
------------------------------------------

Thanks, Ning, that sounds logical. I will try with 0.15 and tune it for our environment, but in the long run I think we may need a strong reflection-based serde Map. I am still exploring whether it can be achieved and will keep posting my progress.

> GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-1139
>                 URL: https://issues.apache.org/jira/browse/HIVE-1139
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.5.0
>            Reporter: Ning Zhang
>            Assignee: Arvind Prabhakar
>         Attachments: PersistentMap.zip
>
>
> When a partial aggregation is performed on a mapper, a HashMap is created to keep all distinct keys in main memory. This can lead to an OOM exception when there are too many distinct keys for a particular mapper. A workaround is to set a smaller map split size so that each mapper processes fewer rows. A better solution is to use the persistent HashMapWrapper (currently used in CommonJoinOperator) to spill overflow rows to disk.
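To make the spill-to-disk idea concrete, here is a minimal sketch, assuming a map-side partial COUNT(*) grouped by a string key. The class `SpillingPartialCount` and its methods are hypothetical and are not Hive's `GroupByOperator` or `HashMapWrapper` API. It keeps at most `maxKeys` distinct keys in a `HashMap`; rows for any further new key are appended to a temporary file instead of growing the heap. Spilled rows are emitted un-aggregated (count 1 each), relying on the reduce-side aggregation to sum partials per key, so heap use stays bounded at the cost of more shuffle output. It also assumes each key fits `DataOutputStream.writeUTF` (under 64 KB encoded).

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical sketch, not Hive's API: bounded in-memory partial COUNT(*)
// that spills rows for overflow keys to a temp file.
public class SpillingPartialCount implements Closeable {
    private final int maxKeys;
    private final Map<String, Long> inMemory = new HashMap<>();
    private final Path spillFile;
    private final DataOutputStream spillOut;
    private long spilledRows = 0;

    public SpillingPartialCount(int maxKeys) throws IOException {
        this.maxKeys = maxKeys;
        this.spillFile = Files.createTempFile("groupby-spill", ".bin");
        this.spillOut = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(spillFile)));
    }

    // Aggregate in memory if the key is already tracked or there is room;
    // otherwise write the row to disk instead of adding a new map entry.
    public void add(String key) throws IOException {
        if (inMemory.containsKey(key) || inMemory.size() < maxKeys) {
            inMemory.merge(key, 1L, Long::sum);
        } else {
            spillOut.writeUTF(key); // assumes key encodes to < 64 KB
            spilledRows++;
        }
    }

    public long spilledRows() {
        return spilledRows;
    }

    // Emit partial (key, count) pairs: in-memory aggregates first, then each
    // spilled row as count 1. The consumer must sum partials per key.
    public void emitPartials(BiConsumer<String, Long> out) throws IOException {
        spillOut.flush();
        inMemory.forEach(out);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(spillFile)))) {
            for (long i = 0; i < spilledRows; i++) {
                out.accept(in.readUTF(), 1L);
            }
        }
    }

    @Override
    public void close() throws IOException {
        spillOut.close();
        Files.deleteIfExists(spillFile);
    }

    public static void main(String[] args) throws IOException {
        String[] rows = {"a", "b", "a", "c", "d", "c", "a", "e"};
        Map<String, Long> finalCounts = new TreeMap<>();
        try (SpillingPartialCount agg = new SpillingPartialCount(2)) {
            for (String r : rows) {
                agg.add(r);
            }
            // Simulated reduce side: sum the partial counts per key.
            agg.emitPartials((k, v) -> finalCounts.merge(k, v, Long::sum));
            System.out.println("spilled rows: " + agg.spilledRows()); // spilled rows: 4
        }
        System.out.println(finalCounts); // {a=3, b=1, c=2, d=1, e=1}
    }
}
```

With `maxKeys = 2`, only `a` and `b` are held in memory; the four rows for `c`, `d`, `c` and `e` go to disk, yet the summed result still matches an unbounded count. A persistent map like the one proposed in the issue would go further and aggregate the overflow keys on disk rather than emitting them row by row.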