Sascha Hofmann created ARROW-6046:
-------------------------------------

             Summary: Slice RecordBatch of String array with offset 0
                 Key: ARROW-6046
                 URL: https://issues.apache.org/jira/browse/ARROW-6046
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.14.1
            Reporter: Sascha Hofmann

We are seeing a bug very similar to ARROW-809, this time for a RecordBatch of strings. A slice of a RecordBatch with a string column and offset = 0 appears to return the whole batch instead of only the requested rows: its serialized size is about 4 MB, while the same two-row slice at offset 1 serializes to 240 bytes.

{code:python}
import pandas as pd
import pyarrow as pa

# One string column holding 1,000,000 identical values
df = pd.DataFrame({'b': ['test' for x in range(1000_000)]})
tbl = pa.Table.from_pandas(df)
batch = tbl.to_batches()[0]

# slice(offset, length): both slices contain exactly 2 rows
batch.slice(0, 2).serialize().size
# 4000232
batch.slice(1, 2).serialize().size
# 240
{code}
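
A quick arithmetic check on the two sizes reported above may help narrow this down. It is only a sketch and an interpretation of the numbers, not a confirmed diagnosis. It assumes the standard Arrow string layout (int32 offsets plus one contiguous value buffer) and that 'test' is stored as 4 UTF-8 bytes per row; all variable names are illustrative.

```python
# Assumption: an Arrow string column stores all values back to back
# in one contiguous value buffer, here 4 bytes per row for 'test'.
ROWS = 1_000_000
VALUE = "test"
SLICE_LENGTH = 2

value_bytes_full = ROWS * len(VALUE.encode("utf-8"))           # whole value buffer
value_bytes_slice = SLICE_LENGTH * len(VALUE.encode("utf-8"))  # values for 2 rows

# Sizes taken verbatim from the report
reported_offset0 = 4_000_232  # batch.slice(0, 2).serialize().size
reported_offset1 = 240        # batch.slice(1, 2).serialize().size

# How much larger is the offset-0 slice than the offset-1 slice?
excess = reported_offset0 - reported_offset1

# How many value bytes would be written unnecessarily if the value
# buffer were not truncated to the slice?
expected_excess = value_bytes_full - value_bytes_slice

print(excess, expected_excess)  # 3999992 3999992
```

The two numbers match, which would be consistent with the offset-0 slice carrying the entire value buffer while everything else is trimmed to the slice. If that reading is right, the output is not literally the whole batch, since the full batch would also include roughly 4 MB of offsets.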