[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626193#comment-13626193
 ] 

Phabricator commented on HIVE-4248:
-----------------------------------

omalley has commented on the revision "HIVE-4248 [jira] Implement a memory 
manager for ORC".

  I agree that it can overshoot, but likely not by much. In the normal case 
the dynamic partitions are distributed randomly, and the current version will 
do fine. Granted, if the data is already sorted by the dynamic partition, it 
will not do well.

  Ok, I'll add a check when we add a new partition. My only concern was that, 
as more partitions are added, each addition will take longer and longer to run 
all of the checks.
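
  For illustration only, here is a minimal sketch of how a per-addition check 
could stay cheap: keep a running total of the memory the writers have asked 
for, so that adding a writer does not require revisiting the existing ones. 
This is not the code in the patch. The class name WriterMemoryPool, its 
methods, and the idea of scaling each writer's stripe budget by a shared 
factor are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the HIVE-4248 implementation.
// Divides a fixed memory pool between concurrent writers by scaling
// every writer's requested stripe size by the same factor.
final class WriterMemoryPool {
    private final long poolBytes;                       // total memory shared by all writers
    private final Map<String, Long> requested = new HashMap<>();
    private long totalRequested = 0;                    // running sum, so checks are O(1)

    WriterMemoryPool(long poolBytes) {
        this.poolBytes = poolBytes;
    }

    // Register a writer (for example, one per dynamic partition).
    synchronized void addWriter(String path, long stripeBytes) {
        Long old = requested.put(path, stripeBytes);
        totalRequested += stripeBytes - (old == null ? 0L : old);
    }

    // Unregister a writer when its file is closed.
    synchronized void removeWriter(String path) {
        Long old = requested.remove(path);
        if (old != null) {
            totalRequested -= old;
        }
    }

    // 1.0 while the requests fit in the pool, otherwise the shrink factor.
    synchronized double scale() {
        if (totalRequested <= poolBytes) {
            return 1.0;
        }
        return (double) poolBytes / totalRequested;
    }

    // Memory a given writer may buffer before it should flush a stripe.
    synchronized long allowedBytes(String path) {
        Long bytes = requested.get(path);
        return bytes == null ? 0L : (long) (bytes * scale());
    }
}
```

  With a 512MB pool and four writers each asking for 256MB, scale() returns 
0.5 and every writer is allowed 128MB. Each writer would still have to compare 
its buffered size against allowedBytes() at some point, and how often that 
comparison runs is exactly the trade-off discussed above.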

REVISION DETAIL
  https://reviews.facebook.net/D9993

To: JIRA, omalley
Cc: kevinwilfong

                
> Implement a memory manager for ORC
> ----------------------------------
>
>                 Key: HIVE-4248
>                 URL: https://issues.apache.org/jira/browse/HIVE-4248
>             Project: Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-4248.D9993.1.patch, HIVE-4248.D9993.2.patch
>
>
> With the large default stripe size (256MB) and dynamic partitions, it is 
> quite easy for users to run out of memory when writing ORC files. We probably 
> need a solution that keeps track of the total number of concurrent ORC 
> writers and divides the available heap space between them. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
