[ https://issues.apache.org/jira/browse/ARROW-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519819#comment-17519819 ]
Micah Kornfield commented on ARROW-16160:
-----------------------------------------

It appears that on the master branch we get: "pyarrow.lib.ArrowInvalid: Tried reading schema message, was null or length 0". That error message could be improved here.

> [C++] IPC Stream Reader doesn't check if extra fields are present for
> RecordBatches
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-16160
>                 URL: https://issues.apache.org/jira/browse/ARROW-16160
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 6.0.1
>            Reporter: Micah Kornfield
>            Priority: Major
>
> I looked through recent commits and I don't think this issue has been patched since:
>
> {code:title=test.py|borderStyle=solid}
> import pyarrow as pa
>
> # rb1 and rb2 are not defined in the original report; based on the output
> # below, rb2 is assumed to carry one extra field relative to rb1.
> rb1 = pa.record_batch([pa.array([1])], names=["c1"])
> rb2 = pa.record_batch([pa.array([1]), pa.array([2])], names=["c1", "c2"])
>
> with pa.output_stream("/tmp/f1") as sink:
>     with pa.RecordBatchStreamWriter(sink, rb1.schema) as writer:
>         writer.write(rb1)
>         end_rb1 = sink.tell()
>
> with pa.output_stream("/tmp/f2") as sink:
>     with pa.RecordBatchStreamWriter(sink, rb2.schema) as writer:
>         writer.write(rb2)
>         start_rb2_only = sink.tell()
>         writer.write(rb2)
>         end_rb2 = sink.tell()
>
> # Stitch together rb1.schema, rb1, and rb2 without its schema.
> with pa.output_stream("/tmp/f3") as sink:
>     with pa.input_stream("/tmp/f1") as inp:
>         sink.write(inp.read(end_rb1))
>     with pa.input_stream("/tmp/f2") as inp:
>         inp.seek(start_rb2_only)
>         sink.write(inp.read(end_rb2 - start_rb2_only))
>
> with pa.ipc.open_stream("/tmp/f3") as source:
>     print(source.read_all())
> {code}
>
> Yields:
>
> {code}
> pyarrow.Table
> c1: int64
> ----
> c1: [[1],[1]]
> {code}
>
> I would expect this to error because the second stitched-in record batch has
> more fields than necessary, but it appears to load just fine.
> Is this intended behavior?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)