hey Paul,

Take a look at how dictionaries work in the IPC protocol

https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#serialization-and-interprocess-communication-ipc

Dictionaries are sent as separate messages. When a field is tagged as
dictionary encoded in the schema, the IPC reader must keep track of
the dictionaries it's seen come across the protocol and then set them
in the reconstructed record batches when a record batch comes through.
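To make the reader side concrete, here is a minimal sketch of that bookkeeping. It uses only the standard library and assumes a heavily simplified model: a dictionary is just a Vec<String>, and a dictionary-encoded column is a Vec<u8> of indices. The DictionaryMemo, DictionaryColumn, insert and attach names are hypothetical and are not the arrow crate's API. The reader remembers each dictionary by id as it arrives and attaches a shared handle to it when a record batch that references that id comes through.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified dictionary-encoded column in a reconstructed
/// record batch: the batch's indices plus a shared handle to the dictionary.
struct DictionaryColumn {
    indices: Vec<u8>,
    values: Arc<Vec<String>>,
}

impl DictionaryColumn {
    /// Look up the logical string value at `row`.
    fn get(&self, row: usize) -> &str {
        &self.values[self.indices[row] as usize]
    }
}

/// Hypothetical reader-side memo of every dictionary seen so far, keyed by
/// the dictionary id declared on the schema field.
#[derive(Default)]
struct DictionaryMemo {
    dictionaries: HashMap<i64, Arc<Vec<String>>>,
}

impl DictionaryMemo {
    /// Called when a dictionary message for `id` arrives on the stream.
    fn insert(&mut self, id: i64, values: Vec<String>) {
        self.dictionaries.insert(id, Arc::new(values));
    }

    /// Called when a record batch arrives: attach the dictionary for `id`
    /// to the column's indices, failing if it has not been received yet
    /// or if an index points past the end of the dictionary.
    fn attach(&self, id: i64, indices: Vec<u8>) -> Result<DictionaryColumn, String> {
        let values = self
            .dictionaries
            .get(&id)
            .ok_or_else(|| format!("no dictionary received for id {}", id))?;
        if let Some(&bad) = indices.iter().find(|&&i| i as usize >= values.len()) {
            return Err(format!("index {} out of range for dictionary {}", bad, id));
        }
        Ok(DictionaryColumn { indices, values: Arc::clone(values) })
    }
}

fn main() {
    let mut memo = DictionaryMemo::default();
    // The dictionary message for id 0 arrives before any batch that uses it...
    memo.insert(0, vec!["cpu".to_string(), "mem".to_string()]);
    // ...then a record batch whose column carries only UInt8 indices.
    let col = memo.attach(0, vec![1, 0, 1]).unwrap();
    let decoded: Vec<&str> = (0..col.indices.len()).map(|r| col.get(r)).collect();
    println!("{:?}", decoded); // ["mem", "cpu", "mem"]
}
```

Sharing the dictionary through an Arc mirrors the idea that many record batches can reference the same dictionary without copying it.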

Note that the protocol now supports dictionary deltas (dictionaries
can be appended to by subsequent messages for the same dictionary id)
and replacements (new dictionary for an id).
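The difference between a delta and a replacement can be sketched in a few lines. This assumes a simplified DictionaryBatch struct whose id and is_delta fields mirror the id and isDelta fields of the DictionaryBatch table in Message.fbs, with a plain Vec<String> standing in for the record batch that carries the dictionary values. The apply function is hypothetical, not the arrow crate's API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified DictionaryBatch message: `id` and `is_delta`
/// mirror `id` and `isDelta` in Message.fbs; `values` stands in for the
/// record batch carrying the dictionary data.
struct DictionaryBatch {
    id: i64,
    is_delta: bool,
    values: Vec<String>,
}

/// Apply one dictionary message to the reader's dictionary state.
fn apply(dicts: &mut HashMap<i64, Vec<String>>, batch: DictionaryBatch) -> Result<(), String> {
    if batch.is_delta {
        // Delta: append after the existing values so that indices sent in
        // earlier record batches keep pointing at the same strings.
        let existing = dicts
            .get_mut(&batch.id)
            .ok_or_else(|| format!("delta for unknown dictionary id {}", batch.id))?;
        existing.extend(batch.values);
    } else {
        // Non-delta: either the first dictionary for this id or a full
        // replacement that discards the previous values.
        dicts.insert(batch.id, batch.values);
    }
    Ok(())
}

fn main() {
    let mut dicts: HashMap<i64, Vec<String>> = HashMap::new();
    apply(&mut dicts, DictionaryBatch { id: 0, is_delta: false, values: vec!["cpu".into()] }).unwrap();
    // Delta: "mem" gets index 1; index 0 still means "cpu".
    apply(&mut dicts, DictionaryBatch { id: 0, is_delta: true, values: vec!["mem".into()] }).unwrap();
    println!("{:?}", dicts[&0]); // ["cpu", "mem"]
    // Replacement: the old values are gone entirely.
    apply(&mut dicts, DictionaryBatch { id: 0, is_delta: false, values: vec!["disk".into()] }).unwrap();
    println!("{:?}", dicts[&0]); // ["disk"]
}
```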

I don't know the status of dictionary handling in the Rust IPC
implementation, but it would be a good idea to take the above details
into account.

Finally, note that Rust is not participating in either the regular IPC
or the Flight integration tests. Joining those tests is an important
milestone toward being able to depend on the Rust library in production.

Thanks
Wes

On Tue, Apr 7, 2020 at 10:36 AM Paul Dix <p...@influxdata.com> wrote:
>
> Hello,
> I'm trying to build a Rust based Flight server and I'd like to use
> Dictionary encoding for a number of string columns in my data. I've seen
> that StringDictionary was recently added to Rust here:
> https://github.com/apache/arrow/commit/c7a7d2dcc46ed06593b994cb54c5eaf9ccd1d21d#diff-72812e30873455dcee2ce2d1ee26e4ab.
>
> However, that doesn't seem to reach down into Flight. When I attempt to
> send a schema through flight that has a Dictionary<UInt8, Utf8> it throws
> an error when attempting to convert from the Rust type to the Flatbuffer
> field type. I figured I'd take a swing at adding that to convert.rs here:
> https://github.com/apache/arrow/blob/master/rust/arrow/src/ipc/convert.rs#L319
>
> However, when I look at the definitions in Schema.fbs and the related
> generated Rust file, Dictionary isn't a type there. Should I be sending
> this down as some other composed type? And if so, how does this look at the
> client side of things? In my test I'm connecting to the Flight server via
> PyArrow and working with it in Pandas so I'm hoping that it will be able to
> consume Dictionary fields.
>
> Separately, the Rust field type doesn't have a spot for the dictionary ID,
> which I assume I'll need to send down so it can be consumed on the client.
> Would appreciate any thoughts on that. A little push in the right direction
> and I'll be happy to submit a PR to help push the Rust Flight
> implementation farther along.
>
> Thanks,
> Paul
