[
https://issues.apache.org/jira/browse/ORC-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996657#comment-14996657
]
ASF GitHub Bot commented on ORC-21:
-----------------------------------
Github user asandryh commented on a diff in the pull request:
https://github.com/apache/orc/pull/12#discussion_r44282877
--- Diff: c++/src/Reader.cc ---
@@ -1433,6 +1538,7 @@ namespace orc {
}
void ReaderImpl::startNextStripe() {
+ reader.reset(); // ColumnReaders use lots of memory; free old memory first
--- End diff --
Why? On line 1550, we assign reader a new value, which destroys the previous
value anyway. What the explicit reset() call does for us is free all the
dynamic memory held by the existing ColumnReader instance BEFORE we create a
new ColumnReader and allocate all of its buffers. Without reset(), the
deallocation happens AFTER the new reader is created, so for a brief moment we
use twice as much memory as we really need. That messes up our memory
estimates, too.
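To illustrate the ordering, here is a minimal standalone sketch. It assumes
reader is a std::unique_ptr; MemoryTracker, FakeColumnReader, and
peakWhenAssigning are hypothetical names made up for this example and are not
part of the ORC code. Each fake reader registers its buffer size with a tracker
so the peak can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bookkeeping helper: counts live buffer bytes and their peak.
struct MemoryTracker {
  std::size_t live = 0;
  std::size_t peak = 0;
  void add(std::size_t n) {
    live += n;
    if (live > peak) peak = live;
  }
  void remove(std::size_t n) { live -= n; }
};

// Hypothetical stand-in for ColumnReader: owns one large buffer.
class FakeColumnReader {
 public:
  FakeColumnReader(MemoryTracker& t, std::size_t bytes)
      : tracker(t), buffer(bytes) {
    tracker.add(buffer.size());
  }
  ~FakeColumnReader() { tracker.remove(buffer.size()); }

 private:
  MemoryTracker& tracker;
  std::vector<char> buffer;
};

// Replaces an existing reader with a new one of the same size and returns
// the peak number of buffer bytes that were live at the same time.
std::size_t peakWhenAssigning(std::size_t bytes, bool resetFirst) {
  MemoryTracker tracker;
  std::unique_ptr<FakeColumnReader> reader(
      new FakeColumnReader(tracker, bytes));
  if (resetFirst) {
    reader.reset();  // old buffers are freed before the new ones exist
  }
  // The right-hand side is fully constructed before the assignment
  // destroys whatever reader still owns.
  reader = std::unique_ptr<FakeColumnReader>(
      new FakeColumnReader(tracker, bytes));
  return tracker.peak;
}
```

With 1024-byte buffers, peakWhenAssigning(1024, false) reports a peak of 2048
bytes, while peakWhenAssigning(1024, true) stays at 1024, which is the
difference the explicit reset() is meant to remove.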
> Add functionality to estimate memory footprint
> ----------------------------------------------
>
> Key: ORC-21
> URL: https://issues.apache.org/jira/browse/ORC-21
> Project: Orc
> Issue Type: Task
> Reporter: Aliaksei Sandryhaila
> Assignee: Aliaksei Sandryhaila
>
> The ORC library allocates multiple large buffers to read and materialize ORC
> files. For the stability of applications that use the library, it may be
> desirable to have an estimate (preferably a tight upper bound) of its memory
> footprint.