[ https://issues.apache.org/jira/browse/ARROW-12100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou reassigned ARROW-12100:
--------------------------------------

    Assignee: Antoine Pitrou

> [C#] Cannot round-trip record batch with PyArrow
> ------------------------------------------------
>
>                 Key: ARROW-12100
>                 URL: https://issues.apache.org/jira/browse/ARROW-12100
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C#, C++, Python
>    Affects Versions: 3.0.0
>            Reporter: Tanguy Fautre
>            Assignee: Antoine Pitrou
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: ArrowSharedMemory_20210326.zip, ArrowSharedMemory_20210326_2.zip, ArrowSharedMemory_20210329.zip
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Has anyone ever tried to round-trip a record batch between Arrow C# and PyArrow? I can't get PyArrow to read the data correctly.
> For context, I'm trying to exchange Arrow data frames between a C# process and a Python process through shared memory (local TCP/IP would be an alternative). Ideally, I wouldn't have to serialise the data at all and could share the Arrow in-memory representation directly, but I'm not sure that is even possible with Apache Arrow. The full source code is attached.
> *C#*
> {code:c#}
> using (var stream = sharedMemory.CreateStream(0, 0, MemoryMappedFileAccess.ReadWrite))
> {
>     var recordBatch = /* ... */
>     using (var writer = new ArrowFileWriter(stream, recordBatch.Schema, leaveOpen: true))
>     {
>         writer.WriteRecordBatch(recordBatch);
>         writer.WriteEnd();
>     }
> }
> {code}
> *Python*
> {code:python}
> shmem = open_shared_memory(args)
> address = get_shared_memory_address(shmem)
> buf = pa.foreign_buffer(address, args.sharedMemorySize)
> stream = pa.input_stream(buf)
> reader = pa.ipc.open_stream(stream)
> {code}
> Unfortunately, it fails with the following error: {{pyarrow.lib.ArrowInvalid: Expected to read 1330795073 metadata bytes, but only read 1230}}.
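The C# snippet writes with {{ArrowFileWriter}}, which produces the Arrow IPC _file_ format (an "ARROW1" magic header, the stream messages, then a footer and a trailing magic), while the Python snippet opens the buffer with {{pa.ipc.open_stream}}, which expects the IPC _stream_ format. The following is a minimal PyArrow-only sketch of that mismatch, assuming a zero-filled {{bytearray}} larger than the payload is a fair stand-in for the shared-memory region; {{REGION_SIZE}} and {{write_into_region}} are hypothetical names, and the sketch only illustrates the format mismatch, not every issue addressed under this ticket. It also shows why the stream format fits a fixed-size region more naturally: {{open_file}} locates its footer from the end of its input, so it needs the exact number of bytes written, whereas {{open_stream}} reads from the start and stops at the end-of-stream marker.

{code:python}
import pyarrow as pa

# Hypothetical stand-in for the shared-memory region: a fixed-size,
# zero-filled block that is larger than the data written into it.
REGION_SIZE = 1 << 16

batch = pa.record_batch(
    [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])], names=["x", "y"]
)


def write_into_region(new_writer):
    """Serialise `batch` with the given IPC writer factory and copy the
    bytes into the start of a fresh fixed-size region (hypothetical helper)."""
    sink = pa.BufferOutputStream()
    with new_writer(sink, batch.schema) as writer:
        writer.write_batch(batch)
    payload = sink.getvalue()
    region = bytearray(REGION_SIZE)
    region[: payload.size] = payload.to_pybytes()
    return pa.py_buffer(region), payload.size


# 1. IPC file format (what ArrowFileWriter produces) starts with b"ARROW1".
file_region, file_size = write_into_region(pa.ipc.new_file)
assert file_region.to_pybytes()[:6] == b"ARROW1"

# Opening it as a stream fails, like the error reported in this ticket.
try:
    pa.ipc.open_stream(file_region)
    stream_read_failed = False
except pa.ArrowInvalid:
    stream_read_failed = True

# open_file reads the footer from the end of its input, so it must be given
# exactly the bytes that were written, not the whole region.
file_table = pa.ipc.open_file(file_region.slice(0, file_size)).read_all()

# 2. IPC stream format is read sequentially from the start; the reader stops
# at the end-of-stream marker and never looks at the zero padding after it.
stream_region, _ = write_into_region(pa.ipc.new_stream)
stream_table = pa.ipc.open_stream(stream_region).read_all()
{code}

On the C# side, the counterpart of {{pa.ipc.new_stream}} would be {{ArrowStreamWriter}} in place of {{ArrowFileWriter}}. Keeping the file format instead means the writer has to tell the reader how many bytes it wrote, so that the Python side can slice the foreign buffer before calling {{pa.ipc.open_file}}.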
> I can see that the memory content starts with {{ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00}}. It seems that, with the API calls above, PyArrow interprets "ARRO" as the metadata length.
> I assume I'm using the API incorrectly. Has anyone got a working example?
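The reading of those bytes can be checked with plain arithmetic. Per the Arrow IPC format, a stream message begins with the 0xFFFFFFFF continuation marker followed by a little-endian int32 metadata length, and the file format prefixes the stream with "ARROW1" plus two padding bytes. A short standard-library sketch, taking the 20 bytes quoted above as its only input, shows that "ARRO" decoded as a little-endian int32 is exactly the number in the error message, and that the real first message starts at offset 8.

{code:python}
import struct

# First bytes of the shared-memory buffer, as quoted in the report:
# 6-byte magic "ARROW1", 2 padding bytes, then the first IPC message.
head = b"ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00"

# A stream reader expects the first 4 bytes to be the continuation marker
# (or, in the older format, a length). "ARRO" is not the marker, so it
# ends up decoded as a little-endian int32 metadata length.
(bogus_length,) = struct.unpack_from("<i", head, 0)
print(bogus_length)  # → 1330795073

# In the file format, the first message starts after the 8-byte magic + padding:
# the 0xFFFFFFFF continuation marker, then the real metadata length.
marker, metadata_length = struct.unpack_from("<Ii", head, 8)
print(hex(marker), metadata_length)  # → 0xffffffff 264
{code}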