[ https://issues.apache.org/jira/browse/ARROW-14263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Ashby updated ARROW-14263:
--------------------------------
Description:
I'm running into various exceptions (often: "Invalid flatbuffers message") when attempting to deserialize RecordBatches in Python that were generated in C++. The same batch can be deserialized back within C++.

*Example (C++)* (status checks omitted, but they are checked in the real code)*:*
{code:java}
const auto stream = arrow::io::BufferOutputStream::Create();
{
    const auto writer = arrow::ipc::MakeStreamWriter(*stream, schema);
    sdk::MaybeThrowError(writer);
    const auto writeRes = (*writer)->WriteRecordBatch(batch);
    sdk::MaybeThrowError((*writer)->Close());
}
auto buffer = (*stream)->Finish();

std::ofstream ofs("record-batch-large.arrow"); // we'll read this in Python
ofs.write(reinterpret_cast<const char*>((*buffer)->data()), (*buffer)->size());
ofs.close();

auto backAgain = DeserializeRecordBatch((*buffer)); // all good
{code}

*Then in Python*:
{code:python}
import pyarrow as pa

with open("record-batch-large.arrow", "rb") as f:
    data = f.read()

reader = pa.RecordBatchStreamReader(data)  # throws here - "Invalid flatbuffers message"
{code}

Please see the attached .arrow file (produced above).

Any ideas?

> "Invalid flatbuffers message" thrown with some serialized RecordBatch's
> -----------------------------------------------------------------------
>
>                 Key: ARROW-14263
>                 URL: https://issues.apache.org/jira/browse/ARROW-14263
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 5.0.0
>         Environment: pyarrow==5.0.0
> C++ = 5.0.0
> Windows 10 Pro x64
>            Reporter: Bryan Ashby
>            Priority: Major
>         Attachments: record-batch-large.arrow
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
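Not part of the original report, but a possibly useful diagnostic: in the Arrow IPC streaming format (v0.15+), each encapsulated message begins with the 4-byte continuation marker 0xFFFFFFFF followed by a little-endian int32 metadata length, so the very first bytes of a valid stream are easy to sanity-check with the standard library alone. Since the environment is Windows, one plausible cause is that the `std::ofstream` is opened in text mode (no `std::ios::binary`), which rewrites 0x0A bytes as 0x0D 0x0A and corrupts the stream; a failed prefix check would point at mangled bytes rather than a pyarrow bug. The helper name `check_stream_prefix` below is hypothetical, a sketch only:

```python
import struct

def check_stream_prefix(data: bytes) -> str:
    """Sanity-check the first bytes of an Arrow IPC *stream* (format >= 0.15).

    A valid stream starts with the 4-byte continuation marker 0xFFFFFFFF,
    followed by a little-endian int32 metadata length for the schema message.
    """
    if len(data) < 8:
        return "too short to be a valid stream"
    marker, meta_len = struct.unpack("<Ii", data[:8])
    if marker != 0xFFFFFFFF:
        return "bad continuation marker (bytes likely corrupted, or not stream format)"
    if meta_len <= 0 or meta_len > len(data):
        return "implausible metadata length"
    return "prefix looks valid"

# A well-formed prefix passes the check:
good = b"\xff\xff\xff\xff" + struct.pack("<i", 16) + b"\x00" * 16
print(check_stream_prefix(good))  # -> prefix looks valid

# The Arrow *file* format instead starts with the "ARROW1" magic; feeding it
# to a stream reader (or checker) fails at the continuation marker:
print(check_stream_prefix(b"ARROW1\x00\x00"))
```

If the attached file starts with "ARROW1" rather than FF FF FF FF, the fix would be reading it with `pa.ipc.open_file` instead of `RecordBatchStreamReader`; if it starts with neither, the bytes were likely corrupted on write.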