Hi Wes,

Thanks for the reply. I searched JIRA but it didn't look like this had been filed or that anyone was working on it, so I filed https://issues.apache.org/jira/browse/ARROW-8927. I have a branch and things look pretty good; I was able to duplicate TestCudaArrowIpc_BasicWriteRead but with a record batch that contains dictionaries.
Alex

On Thu, Apr 16, 2020 at 12:51 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Alex,
>
> I haven't looked at the details of your code, but having APIs that
> "collapse" the process of writing a single record batch along with its
> dictionaries as a sequence of end-to-end IPC messages (and then having
> a function to reverse that process to reconstruct the record batch),
> and making that work for writing to GPU memory (using the new device
> API), seems reasonable to me. There's a bit of refactoring that would
> need to take place to be able to reuse certain code paths relating to
> dictionary batch handling. Note also that we're due to implement delta
> dictionaries and dictionary replacements, so we might want to take all
> of these needs into account to reduce the amount of code churn that
> takes place.
>
> - Wes
>
> On Thu, Apr 16, 2020 at 1:44 PM Alex Baden <alex.ba...@omnisci.com> wrote:
> >
> > Hi all,
> >
> > OmniSci (formerly MapD) has been a long-time user of Arrow for IPC
> > serialization and memory sharing of query results, primarily through
> > our Python connector. We recently upgraded from Arrow 0.13 to Arrow
> > 0.16. This required us to change our Arrow conversion routines to
> > handle the new DictionaryMemo for serializing dictionaries. For CPU,
> > this was fairly easy, as I was able to just write the record batch
> > stream using `arrow::ipc::WriteRecordBatchStream` (and read it using
> > `RecordBatchStreamReader` on the client). For GPU/CUDA, however, I did
> > not see a way to serialize the dictionary alongside the CUDA data and
> > wrap that in a single "object" (the semantics of which probably need
> > to be broken down, which I will do in a second).
> > So, I came up with our own:
> > https://github.com/omnisci/omniscidb/blob/4ab6622bd0ee15e478bff4263f083ab761fc965c/QueryEngine/ArrowResultSetConverter.cpp#L219
> >
> > Essentially, I assemble a RecordBatch with the dictionaries I want to
> > serialize and call WriteRecordBatchStream to serialize it into a CPU
> > IPC stream, which I copy to CPU shared memory. I then serialize the
> > GPU record batch using SerializeRecordBatch into a CudaBuffer. The
> > CudaBuffer is exported for IPC sharing, and I send both memory handles
> > (CPU and GPU) over to the client. The client then has to read the
> > RecordBatch containing the dictionaries and place the dictionaries
> > into a DictionaryMemo, which is used to read the record batches from
> > GPU. The process of building the DictionaryMemo on the client is here:
> > https://github.com/omnisci/omniscidb/blob/master/Tests/ArrowIpcIntegrationTest.cpp#L380
> >
> > This seems to work OK, at least for C++, but I am interested in making
> > it more compact and possibly contributing some or all of it to
> > mainline Arrow. Therefore, I have two questions:
> > 1) Does this look like a reasonable way to go about handling a
> > serialized RecordBatch in CUDA (that is, separate the dictionaries and
> > return two objects, or a single object holding two handles)?
> > 2) Is this something the Arrow community would be interested in
> > seeing contributed in whatever form we agree upon for (1)?
> >
> > Thanks,
> > Alex