[ https://issues.apache.org/jira/browse/ARROW-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968295#comment-15968295 ]
Itai Incze commented on ARROW-809:
----------------------------------

I've fiddled with it a bit. Without altering the array class, I found it's a problem to determine the exact number of items for a boolean array (where it doesn't matter) and for a union array. There may be other instances as well that I'm not aware of. It seems to me that adding a private boolean {{IsSliced}} to the array is the cleanest way.

> C++: Writing sliced record batch to IPC writes the entire array
> ---------------------------------------------------------------
>
>                 Key: ARROW-809
>                 URL: https://issues.apache.org/jira/browse/ARROW-809
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Itai Incze
>            Assignee: Wes McKinney
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> The bug can be triggered through python:
> {code}
> import pyarrow.parquet
> array = pyarrow.array.from_pylist([1] * 1000000)
> rb = pyarrow.RecordBatch.from_arrays([array], ['a'])
> rb2 = rb.slice(0,2)
> with open('/tmp/t.arrow', 'wb') as f:
>     w = pyarrow.ipc.FileWriter(f, rb.schema)
>     w.write_batch(rb2)
>     w.close()
> {code}
> which will result in a big file:
> {code}
> $ ll /tmp/t.arrow
> -rw-rw-r-- 1 itai itai 800618 Apr 12 13:22 /tmp/t.arrow
> {code}
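To make the difficulty in the comment concrete, here is a rough sketch of what emitting only a slice's portion of each buffer involves. It assumes plain Python `bytes` as stand-ins for Arrow buffers, and the helper names `fixed_width_slice` and `bitmap_slice` are hypothetical, not pyarrow or Arrow C++ APIs. Fixed-width values can be truncated with a simple byte slice, but a boolean (or validity) bitmap needs every bit shifted whenever the slice offset is not a multiple of 8. Union arrays, also mentioned in the comment, are not covered here.

```python
# Hypothetical helpers (not part of pyarrow or Arrow C++) sketching how a
# writer could serialize only the slice [offset, offset + length) of a buffer.


def fixed_width_slice(data: bytes, offset: int, length: int, byte_width: int) -> bytes:
    """Return only the bytes backing values offset .. offset + length - 1."""
    start = offset * byte_width
    return data[start:start + length * byte_width]


def bitmap_slice(bitmap: bytes, offset: int, length: int) -> bytes:
    """Copy `length` bits starting at bit `offset` into a new bitmap that
    starts at bit 0. Arrow bitmaps use least-significant-bit numbering,
    so bit i lives in byte i // 8 at position i % 8."""
    n_bytes = (length + 7) // 8
    if offset % 8 == 0:
        # Byte-aligned: a plain byte slice is enough (trailing bits past
        # `length` in the last byte are left as they were).
        start = offset // 8
        return bitmap[start:start + n_bytes]
    # Unaligned: each bit has to be moved to its new position.
    out = bytearray(n_bytes)
    for i in range(length):
        src = offset + i
        if (bitmap[src // 8] >> (src % 8)) & 1:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)
```

For the int64 column in the report, slicing two values out of the batch would keep only 16 bytes of the value buffer, e.g. `fixed_width_slice(values, 0, 2, 8)`. The unaligned branch of `bitmap_slice` is the extra work that a sliced boolean array forces on a writer.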