[ 
https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4421:
------------------------------

    Attachment: HIVE-4421.D10545.2.patch

omalley updated the revision "HIVE-4421 [jira] Improve memory usage by ORC 
dictionaries".

  Changed the memory manager to check memory once every 5000 total rows added. 
This seems to give the best trade-off between handling too many writers in a 
small heap and still managing memory fairly accurately.
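
  The row-count-based check described above might look roughly like the 
following. This is a minimal sketch, not the code in the patch: the names 
MemoryCheckSketch, SketchMemoryManager, Callback, addWriter, addedRow and 
notifyWriters are hypothetical, and the proportional scaling rule is an 
assumption made for illustration. Only the 5000-row interval comes from the 
comment above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; NOT the code from the HIVE-4421 patch.
// All class, method, and field names here are hypothetical.
public class MemoryCheckSketch {

  /** Hypothetical hook a writer registers to learn its share of the pool. */
  public interface Callback {
    void checkMemory(double allocationScale);
  }

  public static class SketchMemoryManager {
    // Interval from the revision comment: one check per 5000 total rows.
    public static final int ROWS_BETWEEN_CHECKS = 5000;

    private final long poolBytes;
    private final Map<String, Long> requested = new LinkedHashMap<>();
    private final Map<String, Callback> callbacks = new LinkedHashMap<>();
    private long totalRequested = 0;
    private int rowsSinceCheck = 0;

    public SketchMemoryManager(long poolBytes) {
      this.poolBytes = poolBytes;
    }

    /** Registers a writer, or updates its request if the path is known. */
    public void addWriter(String path, long requestedBytes, Callback callback) {
      Long previous = requested.put(path, requestedBytes);
      if (previous != null) {
        totalRequested -= previous;
      }
      totalRequested += requestedBytes;
      callbacks.put(path, callback);
    }

    /** Every writer calls this once per row, so the counter is shared. */
    public void addedRow() {
      if (++rowsSinceCheck >= ROWS_BETWEEN_CHECKS) {
        rowsSinceCheck = 0;
        notifyWriters();
      }
    }

    private void notifyWriters() {
      // Assumed policy: scale all writers down proportionally when the
      // pool is oversubscribed, otherwise let each keep its full request.
      double scale = totalRequested <= poolBytes
          ? 1.0 : (double) poolBytes / totalRequested;
      for (Callback callback : callbacks.values()) {
        callback.checkMemory(scale);
      }
    }
  }

  public static void main(String[] args) {
    SketchMemoryManager manager = new SketchMemoryManager(100L);
    manager.addWriter("a.orc", 100L,
        scale -> System.out.println("a.orc scale=" + scale));
    manager.addWriter("b.orc", 100L,
        scale -> System.out.println("b.orc scale=" + scale));
    // Rows from any writer advance the same shared counter.
    for (int row = 0; row < SketchMemoryManager.ROWS_BETWEEN_CHECKS; row++) {
      manager.addedRow();
    }
  }
}
```

  Because the counter in this sketch is shared across writers, the number of 
checks depends on total rows written rather than on how many writers are 
open, which is one plausible reading of the "too many writers in a small 
heap" concern.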

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10545

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10545?vs=32889&id=33201#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestMemoryManager.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
  ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley

                
> Improve memory usage by ORC dictionaries
> ----------------------------------------
>
>                 Key: HIVE-4421
>                 URL: https://issues.apache.org/jira/browse/HIVE-4421
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.11.0
>
>         Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch
>
>
> Currently, for tables with many string columns, it is possible to 
> significantly underestimate the memory used by the ORC dictionaries and cause 
> the query to run out of memory in the task. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
