Hi all,

OmniSci (formerly MapD) has been a long-time user of Arrow for IPC
serialization and memory sharing of query results, primarily through our
Python connector. We recently upgraded from Arrow 0.13 to Arrow 0.16.
This required us to change our Arrow conversion routines to handle the
new DictionaryMemo for serializing dictionaries. For CPU, this was
fairly easy as I was able to just write the record batch stream using
`arrow::ipc::WriteRecordBatchStream` (and read it using
`RecordBatchStreamReader` on the client). For GPU/CUDA, however, I did
not see a way to serialize the dictionary alongside the CUDA data and
wrap that in a single "object" (the semantics of which probably need
to be broken down, which I will do in a second). So I came up with
our own approach:
https://github.com/omnisci/omniscidb/blob/4ab6622bd0ee15e478bff4263f083ab761fc965c/QueryEngine/ArrowResultSetConverter.cpp#L219

Essentially, I assemble a RecordBatch with the dictionaries I want to
serialize and call WriteRecordBatchStream to serialize into a CPU IPC
stream, which I copy to CPU shared memory. I then serialize the GPU
record batch using SerializeRecordBatch into a CudaBuffer. The
CudaBuffer is exported for IPC sharing, and I send both memory handles
(CPU and GPU) over to the client. The client then has to read the
RecordBatch containing the dictionaries and place the dictionaries
into a DictionaryMemo, which is used to read the record batches from
GPU. The process of building the DictionaryMemo on the client is here:
https://github.com/omnisci/omniscidb/blob/master/Tests/ArrowIpcIntegrationTest.cpp#L380

This seems to work OK, at least for C++, but I am interested in making
it more compact and possibly contributing some or all of it to mainline
Arrow. Therefore, I have two questions:
1) Does this look like a reasonable way to go about handling a
serialized RecordBatch in CUDA (that is, separate the dictionaries and
return two objects, or a single object holding two handles)?
2) Is this something that the Arrow community would be interested in
seeing contributed in whatever form we agree upon for (1)?
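
To make the "single object holding two handles" option in (1) a bit
more concrete, here is a rough sketch of what such an object could
carry. All type, field, and function names are hypothetical (they are
not Arrow or OmniSciDB APIs), and the 64-byte array is only a stand-in
for `cudaIpcMemHandle_t` so that the snippet compiles without the CUDA
headers:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical names throughout; nothing here is Arrow or OmniSciDB API.

// Stand-in for cudaIpcMemHandle_t (an opaque 64-byte blob in CUDA).
// A real implementation would include <cuda_runtime_api.h> instead.
constexpr std::size_t kCudaIpcHandleSize = 64;
using GpuIpcHandle = std::array<unsigned char, kCudaIpcHandleSize>;

// CPU side: the IPC stream holding the dictionaries, in shared memory.
struct CpuDictionaryHandle {
  std::string shm_key;    // identifier of the shared memory segment
  std::int64_t size = 0;  // size in bytes of the serialized IPC stream
};

// GPU side: the serialized record batch in an exported CUDA buffer.
struct GpuRecordBatchHandle {
  GpuIpcHandle ipc_handle{};  // exported CUDA IPC memory handle
  std::int64_t size = 0;      // size in bytes of the serialized batch
  int device_id = 0;          // device that owns the buffer
};

// The single object returned to the client. The client would read
// `dictionaries` first, build its DictionaryMemo, and only then read
// the record batch behind `record_batch`.
struct GpuArrowResult {
  CpuDictionaryHandle dictionaries;
  GpuRecordBatchHandle record_batch;

  // Assumption: an empty CPU stream means there are no
  // dictionary-encoded columns to load.
  bool has_dictionaries() const { return dictionaries.size > 0; }
};

inline GpuArrowResult MakeGpuArrowResult(std::string shm_key,
                                         std::int64_t dict_size,
                                         const GpuIpcHandle& gpu_handle,
                                         std::int64_t batch_size,
                                         int device_id) {
  assert(dict_size >= 0 && batch_size >= 0);
  GpuArrowResult result;
  result.dictionaries.shm_key = std::move(shm_key);
  result.dictionaries.size = dict_size;
  result.record_batch.ipc_handle = gpu_handle;
  result.record_batch.size = batch_size;
  result.record_batch.device_id = device_id;
  return result;
}
```

The alternative in (1), returning two separate objects, would amount
to handing back `CpuDictionaryHandle` and `GpuRecordBatchHandle`
individually and leaving the pairing to the caller.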

Thanks,
Alex
