[ 
https://issues.apache.org/jira/browse/ARROW-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377480#comment-16377480
 ] 

ASF GitHub Bot commented on ARROW-1996:
---------------------------------------

pitrou commented on a change in pull request #1667: ARROW-1996: [Python] Fix 
repeated deserialization from a stream
URL: https://github.com/apache/arrow/pull/1667#discussion_r170713174
 
 

 ##########
 File path: cpp/src/arrow/python/arrow_to_python.cc
 ##########
 @@ -259,7 +260,10 @@ Status ReadSerializedObject(io::RandomAccessFile* src, SerializedPyObject* out)
   RETURN_NOT_OK(reader->ReadNext(&out->batch));
 
   RETURN_NOT_OK(src->Tell(&offset));
+
   offset += 4;  // Skip the end-of-stream message
+  offset = ipc::PaddedLength(offset, ipc::kArrowIpcAlignment);
 
 Review comment:
   Do we want to abstract this away in a helper?
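
   For illustration only, such a helper might look roughly like the sketch
   below. It is not code from the pull request: `NextObjectOffset` and
   `kEosMarkerSize` are hypothetical names, and `kAlignment` and
   `PaddedLength` are local stand-ins for `ipc::kArrowIpcAlignment` and
   `ipc::PaddedLength`, with the 8-byte alignment being an assumption.

```cpp
#include <cstdint>

// Stand-in for ipc::kArrowIpcAlignment (assumed to be 8 bytes here).
constexpr int64_t kAlignment = 8;

// Stand-in for ipc::PaddedLength: round nbytes up to a multiple of alignment.
constexpr int64_t PaddedLength(int64_t nbytes, int64_t alignment = kAlignment) {
  return ((nbytes + alignment - 1) / alignment) * alignment;
}

// Hypothetical helper: given the stream position right after the last record
// batch, return the position where the next serialized object is expected.
constexpr int64_t NextObjectOffset(int64_t offset_after_batch) {
  // Size of the end-of-stream message skipped in the diff above.
  constexpr int64_t kEosMarkerSize = 4;
  return PaddedLength(offset_after_batch + kEosMarkerSize);
}
```

   With something like this, the two offset adjustments in
   ReadSerializedObject would collapse into a single call, and the skip
   and padding logic would live in one place.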

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] pyarrow.read_serialized cannot read concatenated records
> -----------------------------------------------------------------
>
>                 Key: ARROW-1996
>                 URL: https://issues.apache.org/jira/browse/ARROW-1996
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: Linux
>            Reporter: Richard Shin
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> The following code
> {quote}import pyarrow as pa
> f = pa.OSFile('arrow_test', 'w')
> pa.serialize_to(12, f)
> pa.serialize_to(23, f)
> f.close()
> f = pa.OSFile('arrow_test', 'r')
> print(pa.read_serialized(f).deserialize())
> print(pa.read_serialized(f).deserialize())
> f.close()
> {quote}
> gives the following result:
> {quote}$ python pyarrow_test.py
>  First: 12
>  Traceback (most recent call last):
>  File "pyarrow_test.py", line 10, in <module>
>  print('Second: {}'.format(pa.read_serialized(f).deserialize()))
>  File "pyarrow/serialization.pxi", line 347, in pyarrow.lib.read_serialized (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:79159)
>  File "pyarrow/error.pxi", line 77, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:8270)
>  pyarrow.lib.ArrowInvalid: Expected schema message in stream, was null or length 0
> {quote}
> I would have expected read_serialized to successfully read the second value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
