[ https://issues.apache.org/jira/browse/ARROW-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968310#comment-15968310 ]

Wes McKinney commented on ARROW-809:
------------------------------------

There is some buffer slicing happening on the IPC write path already: https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L207. It needs to be made consistent (+ well tested), though

> C++: Writing sliced record batch to IPC writes the entire array
> ---------------------------------------------------------------
>
>                 Key: ARROW-809
>                 URL: https://issues.apache.org/jira/browse/ARROW-809
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Itai Incze
>            Assignee: Wes McKinney
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> The bug can be triggered through python:
> {code}
> import pyarrow.parquet
> array = pyarrow.array.from_pylist([1] * 1000000)
> rb = pyarrow.RecordBatch.from_arrays([array], ['a'])
> rb2 = rb.slice(0,2)
> with open('/tmp/t.arrow', 'wb') as f:
>     w = pyarrow.ipc.FileWriter(f, rb.schema)
>     w.write_batch(rb2)
>     w.close()
> {code}
> which will result in a big file:
> {code}
> $ ll /tmp/t.arrow
> -rw-rw-r-- 1 itai itai 800618 Apr 12 13:22 /tmp/t.arrow
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
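
For illustration only, here is a minimal pure-Python sketch of the difference between writing a slice's whole backing buffer and writing only the sliced window. It is not Arrow's implementation and does not use pyarrow: `Int64Column`, `write_whole_buffer`, and `write_sliced_buffer` are hypothetical names, and the sketch assumes a single fixed-width int64 buffer with no validity bitmap, IPC metadata, or padding.

```python
import io
import struct


class Int64Column:
    """Hypothetical stand-in for a sliced array: a shared buffer plus a window."""

    WIDTH = 8  # bytes per int64 value

    def __init__(self, data, offset, length):
        self.data = data      # backing buffer, shared by every slice
        self.offset = offset  # first logical element, counted in elements
        self.length = length  # number of logical elements

    def slice(self, offset, length):
        # Zero-copy slice: only the window moves, the buffer is reused.
        return Int64Column(self.data, self.offset + offset, length)


def write_whole_buffer(col, sink):
    # Behaviour described in the issue: the slice window is ignored.
    return sink.write(col.data)


def write_sliced_buffer(col, sink):
    # Trim the buffer to the slice window before writing it out.
    start = col.offset * col.WIDTH
    stop = start + col.length * col.WIDTH
    return sink.write(memoryview(col.data)[start:stop])


# 1000 little-endian int64 values, then a two-element slice of them.
values = struct.pack("<1000q", *([1] * 1000))
col = Int64Column(values, 0, 1000).slice(0, 2)

naive, trimmed = io.BytesIO(), io.BytesIO()
write_whole_buffer(col, naive)
write_sliced_buffer(col, trimmed)

naive_size = len(naive.getvalue())      # size of the full backing buffer
trimmed_size = len(trimmed.getvalue())  # size of the two-element window
```

In this sketch the naive writer emits all 8000 bytes of the backing buffer while the trimmed writer emits 16. A real writer would also have to handle validity bitmaps, variable-width offsets, and buffer alignment, which is presumably where the consistency and testing work mentioned above comes in.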