As another item for consideration -- in C++ at least, the dictionary id is dealt with as an internal detail of the IPC message production process. When serializing the Schema, ids are assigned to each dictionary-encoded field in the DictionaryMemo object, see
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/dictionary.h

When record batches are reconstructed, the dictionary corresponding to an id at the time of reconstruction is set in the Array's internal data -- that's the "dictionary" member of the ArrayData object (https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L231).

On Tue, Apr 7, 2020 at 1:22 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hey Paul,
>
> Take a look at how dictionaries work in the IPC protocol
>
> https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#serialization-and-interprocess-communication-ipc
>
> Dictionaries are sent as separate messages. When a field is tagged as
> dictionary encoded in the schema, the IPC reader must keep track of
> the dictionaries it's seen come across the protocol and then set them
> in the reconstructed record batches when a record batch comes through.
>
> Note that the protocol now supports dictionary deltas (dictionaries
> can be appended to by subsequent messages for the same dictionary id)
> and replacements (new dictionary for an id).
>
> I don't know what the status of handling dictionaries in the Rust IPC,
> but it would be a good idea to take time to take into account the
> above details.
>
> Finally, note that Rust is not participating in either the regular IPC
> nor Flight integration tests. This is an important milestone to being
> able to depend on the Rust library in production.
>
> Thanks
> Wes
>
> On Tue, Apr 7, 2020 at 10:36 AM Paul Dix <p...@influxdata.com> wrote:
> >
> > Hello,
> > I'm trying to build a Rust based Flight server and I'd like to use
> > Dictionary encoding for a number of string columns in my data. I've seen
> > that StringDictionary was recently added to Rust here:
> > https://github.com/apache/arrow/commit/c7a7d2dcc46ed06593b994cb54c5eaf9ccd1d21d#diff-72812e30873455dcee2ce2d1ee26e4ab.
> >
> > However, that doesn't seem to reach down into Flight. When I attempt to
> > send a schema through flight that has a Dictionary<UInt8, Utf8> it throws
> > an error when attempting to convert from the Rust type to the Flatbuffer
> > field type. I figured I'd take a swing at adding that to convert.rs here:
> > https://github.com/apache/arrow/blob/master/rust/arrow/src/ipc/convert.rs#L319
> >
> > However, when I look at the definitions in Schema.fbs and the related
> > generated Rust file, Dictionary isn't a type there. Should I be sending
> > this down as some other composed type? And if so, how does this look at the
> > client side of things? In my test I'm connecting to the Flight server via
> > PyArrow and working with it in Pandas so I'm hoping that it will be able to
> > consume Dictionary fields.
> >
> > Separately, the Rust field type doesn't have a spot for the dictionary ID,
> > which I assume I'll need to send down so it can be consumed on the client.
> > Would appreciate any thoughts on that. A little push in the right direction
> > and I'll be happy to submit a PR to help push the Rust Flight
> > implementation farther along.
> >
> > Thanks,
> > Paul
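
For illustration, the reader-side bookkeeping described in this thread (dictionaries arrive as separate messages keyed by id, may later be extended by a delta or swapped out by a replacement, and are attached when a record batch is reconstructed) can be sketched in plain Python. This is a toy model under stated assumptions, not Arrow's API: `DictionaryTracker`, `on_dictionary_message`, and `decode` are hypothetical names, string values stand in for arbitrary dictionary arrays, and the `is_delta` argument only mirrors the delta/replacement distinction mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DictionaryTracker:
    """Hypothetical reader-side memo: dictionary id -> current values."""

    dictionaries: Dict[int, List[str]] = field(default_factory=dict)

    def on_dictionary_message(
        self, dict_id: int, values: List[str], is_delta: bool = False
    ) -> None:
        """Record a dictionary message seen on the stream."""
        if is_delta:
            # A delta appends to a dictionary that must already be known.
            if dict_id not in self.dictionaries:
                raise KeyError(f"delta for unknown dictionary id {dict_id}")
            self.dictionaries[dict_id] = self.dictionaries[dict_id] + list(values)
        else:
            # A non-delta message installs or replaces the dictionary.
            self.dictionaries[dict_id] = list(values)

    def decode(self, dict_id: int, indices: List[int]) -> List[str]:
        """Resolve a batch's indices against the dictionary current right now."""
        values = self.dictionaries[dict_id]
        return [values[i] for i in indices]


tracker = DictionaryTracker()

# Initial dictionary for id 0, then a batch of indices that refers to it.
tracker.on_dictionary_message(0, ["cpu", "mem"])
first = tracker.decode(0, [0, 1, 0])

# A delta extends the same id; later batches may use the new index 2.
tracker.on_dictionary_message(0, ["disk"], is_delta=True)
second = tracker.decode(0, [2, 0])

# A replacement installs a fresh dictionary under the same id.
tracker.on_dictionary_message(0, ["net"])
third = tracker.decode(0, [0])
```

The point of keying everything by id is that a batch carries only indices, so the writer and reader have to agree on which dictionary is current for that id at the moment the batch is reconstructed.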