omalley opened a new pull request, #1161:
URL: https://github.com/apache/orc/pull/1161

   ### What changes were proposed in this pull request?
   
   This adds a new configuration option for the row-by-row writer that limits 
how many items are buffered in the VectorizedRowBatch before the batch is sent 
to the ORC Writer. It does not change the behavior of the core writer, where 
the application already controls the size of the batch.
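
   As a rough illustration of the idea (not the code in this patch), the 
buffering rule can be sketched as below. The class name, field names, and the 
choice to flush when either the row limit or the item limit is reached are all 
assumptions made for the example; the sketch only counts flushes and does not 
use the ORC API.

```java
// Hypothetical sketch: every name here is illustrative, not taken from ORC.
final class ChildLimitedBuffer {
    private final int maxRows;      // ordinary row capacity of a batch
    private final int maxChildren;  // assumed limit on buffered array items
    private int rows = 0;           // rows currently buffered
    private int children = 0;       // array items currently buffered
    private int flushes = 0;        // how many batches have been handed off

    ChildLimitedBuffer(int maxRows, int maxChildren) {
        this.maxRows = maxRows;
        this.maxChildren = maxChildren;
    }

    /** Buffers one row whose array column holds arrayLength items. */
    void addRow(int arrayLength) {
        rows += 1;
        children += arrayLength;
        // Flush as soon as either limit is reached, so a few very long
        // arrays cannot grow the buffer without bound.
        if (rows >= maxRows || children >= maxChildren) {
            flush();
        }
    }

    /** Hands off the buffered rows; a real writer would write the batch here. */
    void flush() {
        if (rows > 0) {
            flushes += 1;
            rows = 0;
            children = 0;
        }
    }

    int getFlushes() { return flushes; }
    int getBufferedRows() { return rows; }
    int getBufferedChildren() { return children; }
}
```

   With a row limit alone, ten rows of 5000 items each would all sit in one 
batch; with an item limit of 10000 in this sketch, the buffer is flushed after 
every second row.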
   
   ### Why are the changes needed?
   
   We are getting OOM errors when writing rows that contain long arrays with 
the row-by-row writer. Heap dumps show that the ColumnVector inside the array 
is taking large amounts of memory.
   
   
   ### How was this patch tested?
   
   Updated a couple of unit tests and added two more.
   
   


