[ 
https://issues.apache.org/jira/browse/ARROW-16118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525893#comment-17525893
 ] 

Micah Kornfield commented on ARROW-16118:
-----------------------------------------

Also, we should be careful about how this is enabled: if someone is actually 
consuming the stream in real time, there would need to be some sort of 
coordination to ensure bytes aren't sent prematurely.
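
To make the concern concrete, here is a minimal sketch in Python rather than C++, and not Arrow's API. The function name `can_patch_header` and the `live_consumer` flag are hypothetical; the idea is only that the seek-back path would be opt-in and refused whenever a reader might see the placeholder header before it is rewritten.

```python
import io


def can_patch_header(sink, live_consumer: bool) -> bool:
    """Return True only when seeking back to rewrite a header looks safe.

    `live_consumer` is a hypothetical flag the caller would set when another
    process may be reading the stream while it is still being written.
    """
    # A live reader could observe the placeholder header before it is
    # patched, so the seek-back path is refused in that case.
    if live_consumer:
        return False
    # Pipes and sockets cannot seek; they would need the buffered path.
    return sink.seekable()
```

With something like this, a plain file with no concurrent reader could take the low-memory path, while a socket or a file being tailed would keep the current behaviour.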

> [C++] Reduce memory usage when writing to IPC
> ---------------------------------------------
>
>                 Key: ARROW-16118
>                 URL: https://issues.apache.org/jira/browse/ARROW-16118
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Jorge Leitão
>            Priority: Major
>
> Writing a record batch to IPC ([header][buffers]) currently requires O(N*B) 
> memory, where N is the average buffer size and B is the number of buffers in 
> the record batch.
> This is because the header of the record needs each buffer's location and the 
> total number of bytes, which are only known after, e.g., knowing how much 
> the buffers were compressed.
> When the writer supports seeking, this memory usage can be reduced to O(N), 
> where N is the average size of a primitive buffer over all fields. This is 
> done using the following pseudo-code implementation:
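> {code:java}
> // remember where this record batch's header starts
> start = writer.seek(current)
> // reserve space by writing a placeholder header
> empty_locations = create_empty_header(schema)
> write_header(writer, empty_locations)
> // write the buffers one at a time, recording where each one lands
> locations = write_buffers(writer, batch)
> // go back and overwrite the placeholder with the real locations
> writer.seek(start)
> write_header(writer, locations)
> {code}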
> This has a significantly lower memory footprint: O(N) vs. O(N*B).
> It could be interesting for the C++ implementation to support this.
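
For illustration, the pseudo-code above can be sketched as runnable Python. This is a toy layout, not the Arrow IPC format: the header here is assumed to be a fixed-size table of (offset, length) int64 pairs, and `write_batch`, `HEADER_ENTRY`, and `num_buffers` are hypothetical names. The sketch also seeks back to the end after patching the header so that a following batch would be appended in the right place.

```python
import io
import struct

# Toy header entry: (offset, length) as two little-endian int64 values.
HEADER_ENTRY = struct.Struct("<qq")


def write_batch(sink, num_buffers, buffers):
    """Write [header][buffers] while holding one buffer in memory at a time.

    `buffers` may be a generator that yields each (possibly compressed)
    buffer lazily; `num_buffers` is assumed to be known from the schema.
    """
    start = sink.tell()
    # Placeholder header of the final size, so it can be overwritten in place.
    sink.write(b"\x00" * (HEADER_ENTRY.size * num_buffers))
    body_start = sink.tell()

    locations = []
    for buf in buffers:
        offset = sink.tell() - body_start
        sink.write(buf)
        locations.append((offset, len(buf)))
    end = sink.tell()

    # Go back and patch the header with the real locations.
    sink.seek(start)
    for offset, length in locations:
        sink.write(HEADER_ENTRY.pack(offset, length))
    # Return to the end so the next batch is appended after this one.
    sink.seek(end)
    return locations


sink = io.BytesIO()
locations = write_batch(sink, 2, iter([b"abc", b"defgh"]))
```

Only the list of locations and the buffer currently being written are kept in memory, which is where the O(N) bound would come from.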



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
