Glad to see someone is interested in dictionary deltas!

The JavaScript implementation does handle deltas, but only on the read side: we only have an Arrow reader at the moment, and a reader can handle deltas pretty trivially (here's the relevant line in the JS IPC reader: https://github.com/apache/arrow/blob/master/js/src/ipc/reader/vector.ts#L56). I haven't put any thought into what the writer API for deltas should look like. Paul Taylor has been working on a JS writer, so he may have some thoughts, but I'm not sure.
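
To illustrate why the read side is simple, here is a minimal sketch of the idea. It is not the code from the Arrow JS reader. The DictionaryBatch interface and the applyDictionaryBatch and decode helpers are hypothetical names made up for this example, and plain string arrays stand in for Arrow vectors. The only thing taken from the format is the isDelta flag on a dictionary batch: when it is set, the reader appends the new values to the dictionary it already holds for that id instead of replacing it.

```typescript
// Hypothetical message shape for illustration; not the Arrow JS library's types.
interface DictionaryBatch {
  id: number;        // dictionary id referenced by the encoded field
  isDelta: boolean;  // true: append to the existing dictionary; false: replace it
  values: string[];  // dictionary entries carried by this batch
}

// Apply one dictionary batch to the reader's table of dictionaries.
function applyDictionaryBatch(
  dictionaries: Map<number, string[]>,
  batch: DictionaryBatch,
): void {
  const existing = dictionaries.get(batch.id);
  if (batch.isDelta && existing !== undefined) {
    // Delta: append, so indices sent in earlier record batches stay valid.
    dictionaries.set(batch.id, existing.concat(batch.values));
  } else {
    // Non-delta batch: install or replace the dictionary.
    // Assumption of this sketch: a delta with no prior dictionary is
    // treated the same way.
    dictionaries.set(batch.id, batch.values.slice());
  }
}

// Resolve the indices of a dictionary-encoded column against the current dictionary.
function decode(
  dictionaries: Map<number, string[]>,
  id: number,
  indices: number[],
): string[] {
  const dict = dictionaries.get(id);
  if (dict === undefined) {
    throw new Error(`no dictionary with id ${id}`);
  }
  return indices.map((i) => dict[i]);
}

// Usage: a record batch, then a delta, then a record batch using the new entry.
const dictionaries = new Map<number, string[]>();
applyDictionaryBatch(dictionaries, { id: 0, isDelta: false, values: ["a", "b"] });
const first = decode(dictionaries, 0, [0, 1, 0]);
applyDictionaryBatch(dictionaries, { id: 0, isDelta: true, values: ["c"] });
const second = decode(dictionaries, 0, [2, 0]);
```

The harder design question is on the write side, where the writer has to decide when to emit a delta and has to track which values it has already sent.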

If you're only interested in deltas so that you don't have to collect every distinct value before you can start sending data, you could also consider using the file format (https://github.com/apache/arrow/blob/master/format/IPC.md#file-format). When using the file format, it's perfectly fine to send your dictionary batches at the end of the message, after the record batches, since the format is intended for random access. So if it's OK for your reader not to know the dictionary values until it has received all the data, that may work for you.

Brian


On 02/05/2018 04:10 PM, Wes McKinney wrote:
hi Dimitri,

No one is working on it yet in C++, nor have we worked on any API
design sketches. I think there may be some work in JavaScript.

Please feel free to open some JIRAs and propose APIs / behavior or
work on an implementation.

Thanks,
Wes

On Mon, Feb 5, 2018 at 11:37 AM, Dimitri Vorona <alen...@googlemail.com> wrote:
Hi,

ARROW-1727 added format support for delta dictionaries. It makes it
possible to interleave record batches which contain dictionary-encoded
fields with delta dictionary batches which add new dictionary entries.

As far as I can see, there is no implementation of this feature in C++
yet. Is anyone working on it right now? Are there any ideas about what
the API should look like?

Cheers,
Dimitri.
