On 27/08/2019 at 22:31, Wes McKinney wrote:
> So the current situation we have right now in C++ is that if we tried
> to create an IPC stream from a sequence of record batches that don't
> all have the same dictionary, we'd run into two scenarios:
> 
> * Batches whose dictionary is either a prefix of a previously observed
> dictionary, or has the previously observed dictionary as a prefix. For
> example, suppose that the dictionary sent for an id was ['A', 'B', 'C']
> and then there's a subsequent batch with ['A', 'B', 'C', 'D', 'E']. In
> such a case we could compute and send a delta batch.
> 
> * Batches whose dictionary is a permutation of the prior values,
> possibly with new unique values added.
> 
> In this latter case, without the option of replacing an existing ID in
> the stream, we would have to do a unification / permutation of indices
> and then also possibly send a delta batch. We should probably have
> code at some point that deals with both cases, but in the meantime I
> would like to allow dictionaries to be redefined in this case. Seems
> like we might need a vote to formalize this?
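
The two cases above can be illustrated with a minimal pure-Python sketch. It does not use the Arrow C++ or pyarrow APIs; the function names `classify_dictionary`, `delta_values` and `unify` are hypothetical, and it assumes dictionaries are plain lists of unique values and indices are plain integer lists.

```python
from typing import Dict, List, Tuple


def classify_dictionary(prior: List[str], new: List[str]) -> str:
    """Classify how a batch's dictionary relates to the one already sent."""
    if new[:len(prior)] == prior:
        # The prior dictionary is a prefix of the new one: only the tail
        # (if any) would have to be sent, e.g. as a delta.
        return "delta" if len(new) > len(prior) else "same"
    if prior[:len(new)] == new:
        # The new dictionary is a prefix of the prior one: the batch's
        # indices are already valid against the prior dictionary.
        return "reuse"
    # Permutation and/or new values: indices must be remapped.
    return "unify"


def delta_values(prior: List[str], new: List[str]) -> List[str]:
    """Values to send as a delta when the prior dictionary is a prefix of new."""
    assert new[:len(prior)] == prior, "prior must be a prefix of new"
    return new[len(prior):]


def unify(prior: List[str], new: List[str],
          indices: List[int]) -> Tuple[List[str], List[int]]:
    """Extend prior with unseen values of new and remap indices into it.

    Returns (delta, remapped): the values appended to the prior dictionary
    and the batch's indices rewritten against prior + delta.
    """
    position: Dict[str, int] = {v: i for i, v in enumerate(prior)}
    delta: List[str] = []
    for v in new:
        if v not in position:
            position[v] = len(prior) + len(delta)
            delta.append(v)
    remapped = [position[new[i]] for i in indices]
    return delta, remapped
```

For example, `unify(['A', 'B', 'C'], ['C', 'A', 'D'], [0, 1, 2, 0])` returns `(['D'], [2, 0, 3, 2])`: 'D' goes out as a delta and the indices are rewritten against ['A', 'B', 'C', 'D']. Allowing redefinition would avoid this remapping step at the cost of resending the whole dictionary.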

Wouldn't the stream format then deviate from the file format? In the
file format, IIUC, dictionaries can appear after the respective record
batches, so there's no way to tell whether the original or the
redefined version of a dictionary is being referred to.
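
The ambiguity can be shown with a toy model. It is not the Arrow IPC or file layout and uses no Arrow API; the message tuples and the functions `decode_stream` and `file_candidates` are hypothetical. In a stream, message order lets a reader bind each batch to the latest dictionary seen so far. If dictionaries are instead looked up by id apart from batch order, two definitions for the same id cannot be told apart.

```python
from typing import Dict, List, Tuple

# Toy message: ("dict", dict_id, values) or ("batch", dict_id, indices).
Message = Tuple[str, int, list]


def decode_stream(messages: List[Message]) -> List[List[str]]:
    """Decode batches by resolving each id to the latest definition seen so far."""
    current: Dict[int, List[str]] = {}
    decoded: List[List[str]] = []
    for kind, dict_id, payload in messages:
        if kind == "dict":
            # A redefinition simply replaces the earlier dictionary.
            current[dict_id] = payload
        else:
            values = current[dict_id]
            decoded.append([values[i] for i in payload])
    return decoded


def file_candidates(messages: List[Message], dict_id: int) -> List[List[str]]:
    """Toy file view: every definition of dict_id, with batch order discarded."""
    return [payload for kind, i, payload in messages
            if kind == "dict" and i == dict_id]
```

With `[("dict", 0, ['A', 'B']), ("batch", 0, [0, 1]), ("dict", 0, ['X', 'Y']), ("batch", 0, [0, 1])]`, the stream decoding yields `[['A', 'B'], ['X', 'Y']]`, while the order-free view returns two candidates for id 0 and gives no way to pick one per batch.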

Regards

Antoine.
