Ah, interesting. So, to make sure I understand correctly: when writing the file format, the C++ write implementation will scan all "batches" and unify all dictionary values before writing out the schema + dictionary messages? And in the streaming case, it would still write replacement/delta dictionary messages as needed?
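
To make my mental model concrete, here is a rough sketch in plain Python of what I mean by "unify all dictionary values". This is only an illustration of the idea, not the C++ implementation: the `Batch` representation and the `unify_dictionaries` helper are hypothetical names I made up, and no Arrow API is used.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a dictionary-encoded column in one record batch:
# (dictionary values, indices into that dictionary). Not an Arrow type.
Batch = Tuple[Sequence[str], Sequence[int]]


def unify_dictionaries(batches: List[Batch]) -> Tuple[List[str], List[List[int]]]:
    """Merge per-batch dictionaries into one and remap each batch's indices."""
    unified: List[str] = []
    position: Dict[str, int] = {}  # value -> index in the unified dictionary
    remapped: List[List[int]] = []

    for values, indices in batches:
        # Translate this batch's local dictionary positions into positions in
        # the unified dictionary, appending values not seen in earlier batches.
        local_to_unified = []
        for value in values:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            local_to_unified.append(position[value])
        remapped.append([local_to_unified[i] for i in indices])

    return unified, remapped


batches = [
    (["a", "b"], [0, 1, 0]),  # decodes to a, b, a
    (["c", "a"], [0, 1, 1]),  # decodes to c, a, a
]
unified, remapped = unify_dictionaries(batches)
print(unified)   # ['a', 'b', 'c']
print(remapped)  # [[0, 1, 0], [2, 0, 0]]
```

With a single unified dictionary like this, each dictionary would only need to be written once, so no replacement would be needed in the file format; in the streaming case the writer cannot see later batches up front, which is where I assume replacement/delta messages come in.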

-Jacob

On Thu, Mar 18, 2021 at 9:10 AM Neal Richardson <neal.p.richard...@gmail.com> wrote:

> Somewhat related issue: https://issues.apache.org/jira/browse/ARROW-10406
>
> On Wed, Mar 17, 2021 at 11:22 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > BTW, this nuance always felt a little strange to me, but would have
> > required adding additional information to the file format, to disambiguate
> > when exactly a dictionary was intended to be replaced.
> >
> > On Wed, Mar 17, 2021 at 11:19 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > Hi Jacob,
> > > There is nuance. The file format does not support dictionary replacement;
> > > the specification [1] explains why that is currently the case. Only the
> > > "stream format" supports replacement (i.e. no magic number, only schema
> > > followed by one or more dictionary/record-batch messages).
> > >
> > > -Micah
> > >
> > > [1] https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
> > >
> > > On Wed, Mar 17, 2021 at 11:04 PM Jacob Quinn <quinn.jac...@gmail.com>
> > > wrote:
> > >
> > >> Had an issue come up here:
> > >> https://github.com/JuliaData/Arrow.jl/issues/129#issuecomment-777350450.
> > >> From the implementation status page, it says C++ supports replacement
> > >> dictionaries and that Python tracks the C++ implementation. Is this just a
> > >> pyarrow issue where it specifically doesn't support replacement
> > >> dictionaries? Or it's not "hooked in" properly?
> > >>
> > >> -Jacob