[ https://issues.apache.org/jira/browse/ARROW-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-3098:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python] BufferReader doesn't adhere to the seek protocol
> ---------------------------------------------------------
>
>                 Key: ARROW-3098
>                 URL: https://issues.apache.org/jira/browse/ARROW-3098
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.10.0
>            Reporter: Björn Andersson
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> I have a script that creates a Parquet file and then writes it out to a
> {{BufferOutputStream}} and then into a {{BufferReader}} with the intention of
> passing it to a place that takes a file-like object to upload it somewhere
> else. But the other location relies on being able to seek to the end of the
> file to figure out how big the file is, e.g.
> {code:python}
> reader.seek(0, 2)
> size = reader.tell()
> reader.seek(0)
> {code}
>
> But when I do that the following exception is raised:
>
> {code}
> pyarrow/io.pxi:209: in pyarrow.lib.NativeFile.seek
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> >   ???
> E   pyarrow.lib.ArrowIOError: position out of bounds
> {code}
> I compared it to casting to an {{io.BytesIO}} instead, which works:
> {code:python}
> import io
> import pyarrow as pa
>
> def test_arrow_output_stream():
>     output = pa.BufferOutputStream()
>     output.write(b'hello')
>     reader = pa.BufferReader(output.getvalue())
>     reader.seek(0, 2)
>     assert reader.tell() == 5
>
> def test_python_io_stream():
>     output = pa.BufferOutputStream()
>     output.write(b'hello')
>     buffer = io.BytesIO(output.getvalue().to_pybytes())
>     reader = io.BufferedRandom(buffer)
>     reader.seek(0, 2)
>     assert reader.tell() == 5
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
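
For reference, the size check that the consumer in the report performs can be sketched with the standard library alone. The helper name `stream_size` is hypothetical and not part of pyarrow; the sketch assumes only a seekable file-like object that honours the `whence` argument of `seek`, and uses `io.BytesIO` as a stand-in for such an object.

```python
import io


def stream_size(fileobj):
    """Return the total size in bytes of a seekable file-like object.

    Hypothetical helper (not pyarrow API): it relies only on the standard
    seek/tell protocol and restores the caller's original position.
    """
    start = fileobj.tell()
    # whence=io.SEEK_END (the literal 2 in the report) means
    # "offset relative to the end of the stream".
    end = fileobj.seek(0, io.SEEK_END)
    # Some file-like objects return None from seek(); fall back to tell().
    if end is None:
        end = fileobj.tell()
    # Put the stream back where the caller left it.
    fileobj.seek(start, io.SEEK_SET)
    return end


reader = io.BytesIO(b"hello")
reader.seek(2)                    # pretend the caller had already read 2 bytes
size = stream_size(reader)
position_after = reader.tell()
print(size, position_after)
```

With a reader that follows the protocol, as `io.BytesIO` does, the helper reports the 5 bytes of the `b'hello'` payload and leaves the position untouched. Against the 0.10.0 `BufferReader` behaviour described in the report, the `seek(0, 2)` call is presumably where the `ArrowIOError` would surface.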