[ https://issues.apache.org/jira/browse/ARROW-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519819#comment-17519819 ]
Micah Kornfield commented on ARROW-16160:
-----------------------------------------

It appears that on the master branch we get: "pyarrow.lib.ArrowInvalid: Tried reading schema message, was null or length 0". That error message could be improved here.

> [C++] IPC Stream Reader doesn't check if extra fields are present for
> RecordBatches
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-16160
>                 URL: https://issues.apache.org/jira/browse/ARROW-16160
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 6.0.1
>            Reporter: Micah Kornfield
>            Priority: Major
>
> I looked through recent commits and I don't think this issue has been patched since:
>
> {code:title=test.py|borderStyle=solid}
> import pyarrow as pa
>
> # rb1 and rb2 are not defined in the original report; based on the output
> # below, rb2 is assumed to carry one extra field relative to rb1.
> rb1 = pa.record_batch([pa.array([1])], names=["c1"])
> rb2 = pa.record_batch([pa.array([1]), pa.array([2])], names=["c1", "c2"])
>
> with pa.output_stream("/tmp/f1") as sink:
>     with pa.RecordBatchStreamWriter(sink, rb1.schema) as writer:
>         writer.write(rb1)
>         end_rb1 = sink.tell()
>
> with pa.output_stream("/tmp/f2") as sink:
>     with pa.RecordBatchStreamWriter(sink, rb2.schema) as writer:
>         writer.write(rb2)
>         start_rb2_only = sink.tell()
>         writer.write(rb2)
>         end_rb2 = sink.tell()
>
> # Stitch together rb1.schema, rb1, and rb2 without its schema.
> with pa.output_stream("/tmp/f3") as sink:
>     with pa.input_stream("/tmp/f1") as inp:
>         sink.write(inp.read(end_rb1))
>     with pa.input_stream("/tmp/f2") as inp:
>         inp.seek(start_rb2_only)
>         sink.write(inp.read(end_rb2 - start_rb2_only))
>
> with pa.ipc.open_stream("/tmp/f3") as source:
>     print(source.read_all())
> {code}
>
> Yields:
>
> {code}
> pyarrow.Table
> c1: int64
> ----
> c1: [[1],[1]]
> {code}
>
> I would expect this to error because the second stitched-in record batch has
> more fields than necessary, but it appears to load just fine.
> Is this intended behavior?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)