[ https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626193#comment-13626193 ]
Phabricator commented on HIVE-4248:
-----------------------------------

omalley has commented on the revision "HIVE-4248 [jira] Implement a memory manager for ORC".

  I agree that it can overshoot, but it likely won't be by much. Of course, the normal case is that the dynamic partitions are distributed randomly, in which case the current version will do fine. Granted, if the data is already sorted by the dynamic partition, it will not do well.

  OK, I'll add a check when we add a new partition. I was just concerned that, with each new partition added, doing all of the checks will take longer and longer.

REVISION DETAIL
  https://reviews.facebook.net/D9993

To: JIRA, omalley
Cc: kevinwilfong

> Implement a memory manager for ORC
> ----------------------------------
>
> Key: HIVE-4248
> URL: https://issues.apache.org/jira/browse/HIVE-4248
> Project: Hive
> Issue Type: New Feature
> Components: Serializers/Deserializers
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: HIVE-4248.D9993.1.patch, HIVE-4248.D9993.2.patch
>
>
> With the large default stripe size (256MB) and dynamic partitions, it is
> quite easy for users to run out of memory when writing ORC files. We probably
> need a solution that keeps track of the total number of concurrent ORC
> writers and divides the available heap space between them.
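The idea in the issue description (track the concurrent ORC writers and divide the available heap among them) can be sketched as below. This is a minimal, hypothetical illustration, not the code in the attached patches or Hive's actual memory manager: the class `MemoryManagerSketch`, its methods, and the fixed pool size passed to the constructor are all assumptions. In practice the pool might be derived from something like `Runtime.getRuntime().maxMemory()`. Each writer (for example, one per dynamic partition) registers its requested stripe size, and once the total requested exceeds the pool, every writer's effective stripe size is scaled down proportionally.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: tracks concurrent writers and scales each writer's
 * requested stripe size so that the total stays within a fixed memory pool.
 * Not Hive's actual MemoryManager API.
 */
class MemoryManagerSketch {
    private final long totalPool;                        // bytes shared by all writers (assumed fixed)
    private final Map<String, Long> requests = new HashMap<>();
    private long totalRequested = 0;                     // running sum of all requests

    MemoryManagerSketch(long totalPool) {
        this.totalPool = totalPool;
    }

    /** Registers (or re-registers) a writer, e.g. one per dynamic partition. */
    synchronized void addWriter(String path, long requestedBytes) {
        Long old = requests.put(path, requestedBytes);
        if (old != null) {
            totalRequested -= old;                       // replacing an earlier request
        }
        totalRequested += requestedBytes;
    }

    /** Unregisters a writer once its file is closed. */
    synchronized void removeWriter(String path) {
        Long old = requests.remove(path);
        if (old != null) {
            totalRequested -= old;
        }
    }

    /** Fraction of its request each writer may use: 1.0 until the pool is oversubscribed. */
    synchronized double getAllocationScale() {
        if (totalRequested <= totalPool) {
            return 1.0;
        }
        return (double) totalPool / totalRequested;
    }

    /** Effective stripe size for one writer under the current scale. */
    synchronized long effectiveStripeSize(String path) {
        Long requested = requests.get(path);
        if (requested == null) {
            throw new IllegalArgumentException("unknown writer: " + path);
        }
        return (long) (requested * getAllocationScale());
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        MemoryManagerSketch mgr = new MemoryManagerSketch(512 * mb);
        mgr.addWriter("part=a", 256 * mb);
        mgr.addWriter("part=b", 256 * mb);
        System.out.println(mgr.effectiveStripeSize("part=a") / mb); // prints 256
        mgr.addWriter("part=c", 256 * mb);
        mgr.addWriter("part=d", 256 * mb);
        System.out.println(mgr.effectiveStripeSize("part=a") / mb); // prints 128
    }
}
```

With a 512MB pool, two 256MB writers fit, but four writers halve each effective stripe size. Keeping a running total means the scale itself is recomputed in constant time when a partition is added; the cost concern raised in the comment would come from acting on every open writer (for example, forcing them to check or flush) at each addition, which this sketch does not attempt.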