It's a bit more configurable, but basically yes.  See the IPC write options:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/options.h#L73

Regards

Antoine.


On 18/03/2021 at 16:37, Jacob Quinn wrote:
Ah, interesting. So to make sure I understand correctly, the C++ write
implementation will scan all "batches" and unify all dictionary values
before writing out the schema + dictionary messages? But only when writing
the file format? In the streaming case, it would still write
replacement/delta dictionary messages as needed.
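
To make the "unify all dictionary values" idea concrete, here is a minimal pure-Python sketch. It is not the Arrow C++ code; the function name `unify_dictionaries` and the (values, indices) batch layout are made up for illustration, and nulls are ignored. It builds one combined dictionary across all batches and rewrites each batch's indices to point into it, so a single dictionary message would be enough.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical layout: each batch is (dictionary values, indices into those values).
Batch = Tuple[List[Hashable], List[int]]


def unify_dictionaries(
    batches: Sequence[Batch],
) -> Tuple[List[Hashable], List[List[int]]]:
    """Merge per-batch dictionaries into one and remap every batch's indices."""
    unified: List[Hashable] = []         # combined dictionary, in first-seen order
    position: Dict[Hashable, int] = {}   # value -> slot in the combined dictionary
    remapped: List[List[int]] = []

    for values, indices in batches:
        # Translate each local dictionary slot to its slot in the combined one.
        local_to_global: List[int] = []
        for value in values:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            local_to_global.append(position[value])
        remapped.append([local_to_global[i] for i in indices])

    return unified, remapped


batches = [(["a", "b"], [0, 1, 0]), (["b", "c"], [1, 0, 1])]
unified, remapped = unify_dictionaries(batches)
print(unified)   # ['a', 'b', 'c']
print(remapped)  # [[0, 1, 0], [2, 1, 2]]
```

In the streaming case no such pass over all batches is needed, since a later dictionary message can replace or extend an earlier one.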

-Jacob

On Thu, Mar 18, 2021 at 9:10 AM Neal Richardson <neal.p.richard...@gmail.com>
wrote:

Somewhat related issue: https://issues.apache.org/jira/browse/ARROW-10406

On Wed, Mar 17, 2021 at 11:22 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

BTW, this nuance always felt a little strange to me, but changing it would
have required adding information to the file format to disambiguate exactly
when a dictionary was intended to be replaced.

On Wed, Mar 17, 2021 at 11:19 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

Hi Jacob,
There is nuance.  The file format does not support dictionary replacement;
the specification [1] explains why that is currently the case.  Only the
"stream format" supports replacement (i.e. no magic number, only a schema
followed by one or more dictionary/record-batch messages).
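
As a rough way to tell the two formats apart, the magic number mentioned above can be checked directly. This is only a heuristic sketch, not a validator: the helper name `looks_like_ipc_file` is made up, and it relies on the file format starting and ending with the bytes "ARROW1", as described in the specification [1].

```python
ARROW_MAGIC = b"ARROW1"  # magic number that opens and closes the IPC file format


def looks_like_ipc_file(data: bytes) -> bool:
    """Heuristic: True if the buffer both starts and ends with the magic number.

    Stream-format data has no magic number, so it should return False here.
    This does not parse or validate the footer or any messages.
    """
    return (
        len(data) >= 2 * len(ARROW_MAGIC)
        and data.startswith(ARROW_MAGIC)
        and data.endswith(ARROW_MAGIC)
    )


# Made-up byte strings, just to exercise the check.
print(looks_like_ipc_file(b"ARROW1\x00\x00" + b"...body..." + b"ARROW1"))  # True
print(looks_like_ipc_file(b"\xff\xff\xff\xff" + b"...stream..."))          # False
```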

-Micah

[1] https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format

On Wed, Mar 17, 2021 at 11:04 PM Jacob Quinn <quinn.jac...@gmail.com>
wrote:

Had an issue come up here:

https://github.com/JuliaData/Arrow.jl/issues/129#issuecomment-777350450

The implementation status page says C++ supports replacement dictionaries
and that Python tracks the C++ implementation. Is this just a pyarrow issue
where it specifically doesn't support replacement dictionaries? Or is it not
"hooked in" properly?

-Jacob




