[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647306#comment-13647306 ]
Phabricator commented on HIVE-4421:
-----------------------------------
omalley has commented on the revision "HIVE-4421 [jira] Improve memory usage by
ORC dictionaries".
Ashutosh, I incorporated most of your input. The 5000 rows between memory
checks is just how often we check the writers against the size of their
allocation; if there is enough memory, a check doesn't result in any IO. I
don't think it would see enough use to justify making it a HiveConf variable.
You asked why I removed countOutput: we had no immediate plans to use it, its
use case was relatively rare, and removing it saved some memory and
complexity.
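The check cadence described above can be sketched as follows. This is a hypothetical illustration, not the code in the patch: the names `RowCountingMemoryManager`, `MemoryCallback`, `checkMemory` and `SketchWriter`, and the rule that scales every writer's budget down proportionally when the requested allocations exceed the pool, are assumptions. Only the 5000-row interval comes from the comment. Counting rows keeps the per-row cost to an increment and a comparison, and a check only leads to I/O when a writer is over its (possibly scaled) budget.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback: each writer decides whether it must flush.
interface MemoryCallback {
  // scale is 1.0 when the pool covers every request, smaller when oversubscribed.
  boolean checkMemory(double scale);
}

// Hypothetical manager: checks all writers once every ROWS_BETWEEN_CHECKS rows.
class RowCountingMemoryManager {
  static final int ROWS_BETWEEN_CHECKS = 5000; // interval taken from the comment
  private final long totalPool;
  private final List<MemoryCallback> writers = new ArrayList<>();
  private long totalAllocation = 0;
  private int rowsSinceCheck = 0;
  private int checkCount = 0;

  RowCountingMemoryManager(long totalPool) {
    this.totalPool = totalPool;
  }

  void addWriter(MemoryCallback writer, long requestedBytes) {
    writers.add(writer);
    totalAllocation += requestedBytes;
  }

  // Fraction of its request each writer may use (assumed proportional rule).
  double scale() {
    if (totalAllocation <= totalPool) {
      return 1.0;
    }
    return (double) totalPool / totalAllocation;
  }

  // Called once per row; between checks this is only an increment and a compare.
  void addedRow() {
    if (++rowsSinceCheck >= ROWS_BETWEEN_CHECKS) {
      rowsSinceCheck = 0;
      checkCount++;
      double s = scale();
      for (MemoryCallback writer : writers) {
        writer.checkMemory(s);
      }
    }
  }

  int checksPerformed() {
    return checkCount;
  }
}

// Hypothetical writer with a byte budget; a "flush" stands in for writing a stripe.
class SketchWriter implements MemoryCallback {
  private final long budgetBytes;
  long estimatedBytes = 0; // caller-maintained estimate of buffered data
  int flushes = 0;

  SketchWriter(long budgetBytes) {
    this.budgetBytes = budgetBytes;
  }

  @Override
  public boolean checkMemory(double scale) {
    long limit = Math.round(budgetBytes * scale);
    if (estimatedBytes > limit) {
      flushes++;          // the only place this sketch would do I/O
      estimatedBytes = 0;
      return true;
    }
    return false;         // enough memory: the check costs no I/O
  }
}
```

Keeping the interval a constant rather than a configuration knob matches the reasoning in the comment; in this sketch, changing it only affects how many extra rows can be buffered past a budget before the next check notices.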
REVISION DETAIL
https://reviews.facebook.net/D10545
To: JIRA, ashutoshc, omalley
> Improve memory usage by ORC dictionaries
> ----------------------------------------
>
> Key: HIVE-4421
> URL: https://issues.apache.org/jira/browse/HIVE-4421
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.11.0
>
> Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch,
> HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch
>
>
> Currently, for tables with many string columns, it is possible to
> significantly underestimate the memory used by the ORC dictionaries and cause
> the query to run out of memory in the task.
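One way the memory held by string dictionaries can be underestimated is sketched below. This is a hypothetical illustration, not ORC's dictionary implementation or the change made for this issue: `ChunkedStringDictionary` is an assumed class that copies string bytes into fixed-size chunks, and the chunk sizes used with it are arbitrary. An estimate that counts only the bytes of string data misses the unused tail of each column's current chunk (and, in this sketch, the `HashMap` used for lookups), and that gap grows with the number of string columns.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-column string dictionary backed by fixed-size byte chunks.
// It only illustrates the gap between "bytes used" and "bytes allocated";
// the HashMap's own overhead is not counted by either figure.
class ChunkedStringDictionary {
  private final int chunkSize;
  private final List<byte[]> chunks = new ArrayList<>();
  private final Map<String, Integer> ids = new HashMap<>();
  private long usedBytes = 0;    // bytes actually holding string data
  private int offsetInChunk = 0; // write position in the last chunk

  ChunkedStringDictionary(int chunkSize) {
    this.chunkSize = chunkSize;
  }

  // Returns the dictionary id for value, adding it if it is new.
  int add(String value) {
    Integer existing = ids.get(value);
    if (existing != null) {
      return existing;
    }
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    for (byte b : bytes) {
      // Allocate a whole new chunk whenever the current one is full.
      if (chunks.isEmpty() || offsetInChunk == chunkSize) {
        chunks.add(new byte[chunkSize]);
        offsetInChunk = 0;
      }
      chunks.get(chunks.size() - 1)[offsetInChunk++] = b;
    }
    usedBytes += bytes.length;
    int id = ids.size();
    ids.put(value, id);
    return id;
  }

  long usedBytes() {
    return usedBytes;
  }

  long allocatedBytes() {
    return (long) chunks.size() * chunkSize;
  }
}
```

For example, with 1024-byte chunks, 100 string columns each holding one short value (`"x0"` through `"x99"`) use 290 bytes of string data but have 102,400 bytes allocated, so an estimate built from the first number would be off by more than two orders of magnitude.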
--
This message is automatically generated by JIRA.