[ 
https://issues.apache.org/jira/browse/ORC-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996657#comment-14996657
 ] 

ASF GitHub Bot commented on ORC-21:
-----------------------------------

Github user asandryh commented on a diff in the pull request:

    https://github.com/apache/orc/pull/12#discussion_r44282877
  
    --- Diff: c++/src/Reader.cc ---
    @@ -1433,6 +1538,7 @@ namespace orc {
       }
     
       void ReaderImpl::startNextStripe() {
    +    reader.reset(); // ColumnReaders use lots of memory; free old memory first
    --- End diff --
    
    Why? On line 1550, we assign reader a new value, which destroys the previous
    value anyway. What calling reset() explicitly does is free all the dynamic
    memory used by the existing ColumnReader instance BEFORE we create a new
    ColumnReader and allocate all of its buffers. Without the explicit reset(),
    the deallocation happens AFTER the new reader is created, so for a brief
    moment we use twice the memory we really need. That also throws off our
    memory estimates.
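
    To illustrate the ordering, here is a minimal sketch. It is not the ORC
    code: FakeColumnReader, Tracker, and the two helper functions are
    hypothetical names, and the byte counter only approximates real allocator
    behavior. It assumes reader behaves like a std::unique_ptr.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical byte counter; stands in for a real memory estimate.
struct Tracker {
  static std::size_t live;  // bytes currently held by fake readers
  static std::size_t peak;  // highest value of `live` seen so far
};
std::size_t Tracker::live = 0;
std::size_t Tracker::peak = 0;

// Hypothetical stand-in for a ColumnReader that owns one large buffer.
class FakeColumnReader {
 public:
  explicit FakeColumnReader(std::size_t bytes)
      : buffer_(new char[bytes]), bytes_(bytes) {
    Tracker::live += bytes_;
    Tracker::peak = std::max(Tracker::peak, Tracker::live);
  }
  ~FakeColumnReader() {
    assert(Tracker::live >= bytes_);
    Tracker::live -= bytes_;
  }

 private:
  std::unique_ptr<char[]> buffer_;
  std::size_t bytes_;
};

// Replace the reader directly: the new object is fully constructed
// before the old one is destroyed, so both buffers coexist briefly.
std::size_t peakWithoutReset(std::size_t bytes) {
  Tracker::live = Tracker::peak = 0;
  std::unique_ptr<FakeColumnReader> reader(new FakeColumnReader(bytes));
  reader.reset(new FakeColumnReader(bytes));
  return Tracker::peak;
}

// Free the old reader first, then build the new one.
std::size_t peakWithResetFirst(std::size_t bytes) {
  Tracker::live = Tracker::peak = 0;
  std::unique_ptr<FakeColumnReader> reader(new FakeColumnReader(bytes));
  reader.reset();  // old buffer released here
  reader.reset(new FakeColumnReader(bytes));
  return Tracker::peak;
}
```

    In this sketch the first helper reports a peak of two buffers and the
    second a peak of one, which is the difference the explicit reset() is
    meant to buy us.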


> Add functionality to estimate memory footprint
> ----------------------------------------------
>
>                 Key: ORC-21
>                 URL: https://issues.apache.org/jira/browse/ORC-21
>             Project: Orc
>          Issue Type: Task
>            Reporter: Aliaksei Sandryhaila
>            Assignee: Aliaksei Sandryhaila
>
> The ORC library allocates multiple large buffers to read and materialize ORC 
> files. For the stability of applications that use the library, it may be 
> desirable to have an estimate (preferably a tight upper bound) of its memory 
> footprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
