Ah, interesting. So to make sure I understand correctly, the C++ write
implementation will scan all "batches" and unify all dictionary values
before writing out the schema + dictionary messages? But only when writing
the file format? In the streaming case, it would still write
replacement/delta dictionary messages as needed.
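
To make the question concrete, here is a minimal sketch of the two behaviors as I understand them. It is a toy pure-Python model, not the actual C++ or pyarrow code: a "batch" is just a (dictionary values, indices) pair, and the names unify_dictionaries and stream_messages are made up for illustration.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a dictionary-encoded batch: (dictionary values, indices).
Batch = Tuple[List[str], List[int]]


def unify_dictionaries(batches: List[Batch]) -> Tuple[List[str], List[List[int]]]:
    """File-format case (as assumed here): scan every batch, build one
    unified dictionary, and remap each batch's indices onto it."""
    unified: List[str] = []
    position: Dict[str, int] = {}
    remapped: List[List[int]] = []
    for values, indices in batches:
        # Map each local dictionary slot to its slot in the unified dictionary.
        mapping = []
        for value in values:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            mapping.append(position[value])
        remapped.append([mapping[i] for i in indices])
    return unified, remapped


def stream_messages(batches: List[Batch]) -> List[Tuple[str, list]]:
    """Stream case (as assumed here): emit a dictionary message only when the
    dictionary changes, as a delta if it extends the previous one and as a
    replacement otherwise."""
    messages: List[Tuple[str, list]] = []
    current = None
    for values, indices in batches:
        if current is None:
            messages.append(("dictionary", list(values)))
        elif values != current:
            if values[: len(current)] == current:
                messages.append(("delta", values[len(current):]))
            else:
                messages.append(("replacement", list(values)))
        current = list(values)
        messages.append(("batch", list(indices)))
    return messages


batches = [(["a", "b"], [0, 1, 0]), (["b", "c"], [0, 1])]
unified, remapped = unify_dictionaries(batches)
messages = stream_messages(batches)
```

In this model the file path ends up with a single dictionary and rewritten indices, while the stream path leaves the indices alone and sends a replacement dictionary before the second batch.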

-Jacob

On Thu, Mar 18, 2021 at 9:10 AM Neal Richardson <neal.p.richard...@gmail.com>
wrote:

> Somewhat related issue: https://issues.apache.org/jira/browse/ARROW-10406
>
> On Wed, Mar 17, 2021 at 11:22 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > BTW, this nuance always felt a little strange to me, but fixing it
> > would have required adding additional information to the file format
> > to disambiguate when exactly a dictionary was intended to be replaced.
> >
> > On Wed, Mar 17, 2021 at 11:19 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > Hi Jacob,
> > > There is nuance. The file format does not support dictionary
> > > replacement; the specification [1] explains why that is currently the
> > > case. Only the "stream format" supports replacement (i.e. no magic
> > > number, only a schema followed by one or more dictionary/record-batch
> > > messages).
> > >
> > > -Micah
> > >
> > > [1] https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
> > >
> > > On Wed, Mar 17, 2021 at 11:04 PM Jacob Quinn <quinn.jac...@gmail.com>
> > > wrote:
> > >
> > >> Had an issue come up here:
> > >> https://github.com/JuliaData/Arrow.jl/issues/129#issuecomment-777350450
> > >> From the implementation status page, it says C++ supports replacement
> > >> dictionaries and that python tracks the C++ implementation. Is this
> > >> just a pyarrow issue where it specifically doesn't support replacement
> > >> dictionaries? Or it's not "hooked in" properly?
> > >>
> > >> -Jacob
> > >>
> > >
> >
>
