Thanks for all the answers. The assumptions about union types in C++ code are fixed in https://github.com/apache/arrow/pull/5892
Regards

Antoine.


On 25/11/2019 at 16:41, Wes McKinney wrote:
> On Mon, Nov 25, 2019 at 9:25 AM Antoine Pitrou <solip...@pitrou.net> wrote:
>>
>> On Mon, 25 Nov 2019 09:12:21 -0600
>> Wes McKinney <wesmck...@gmail.com> wrote:
>>> On Mon, Nov 25, 2019 at 8:52 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> The spec has the following language about union type ids:
>>>> """
>>>> Types buffer: A buffer of 8-bit signed integers. Each type in the union
>>>> has a corresponding type id whose values are found in this buffer. A
>>>> union with more than 127 possible types can be modeled as a union of
>>>> unions.
>>>> """
>>>> https://arrow.apache.org/docs/format/Columnar.html#union-layout
>>>>
>>>> However, in several places the C++ code assumes type ids are unsigned.
>>>> Java doesn't seem to implement type ids (and there is no integration
>>>> task for union types).
>>>>
>>>> In the flatbuffers description, the type ids array is modeled as an
>>>> array of signed 32-bit integers.
>>>>
>>>> Moreover, according to the language above, type ids should be restricted
>>>> to the [0, 127] interval? Which one should it be?
>>>
>>> The (optional) type ids in the metadata provide a correspondence
>>> between the union types / children and the values found in the types
>>> buffer (data). As stated in the spec, the types buffer are 8-bit
>>> signed integers. As I recall the reason that we used [ Int ] in the
>>> metadata was that the Int type is thought to be easier for languages
>>> to work with in general when serializing/deserializing the metadata.
>>
>> Ok, but is there a reason the C++ code uses `std::vector<uint8_t>` for
>> the type codes?
>
> Oversight on my part. Suggest we change to int8_t
>
>> Regards
>>
>> Antoine.
>>
>>
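
For illustration only, here is a minimal sketch of what treating type codes as `int8_t` could look like. It is not taken from the PR above: the helper names `TypeCodesInRange` and `BuildChildIndexTable` are hypothetical and are not Arrow C++ APIs, and the sketch assumes the [0, 127] reading of the spec raised in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; these are not Arrow C++ APIs.

// Returns true if every type code lies in [0, 127], i.e. is a
// non-negative 8-bit signed integer (assumed reading of the spec).
bool TypeCodesInRange(const std::vector<std::int8_t>& type_codes) {
  for (std::int8_t code : type_codes) {
    if (code < 0) return false;
  }
  return true;
}

// Builds a 128-entry table mapping a type id to the index of the
// corresponding union child, or -1 if the type id is unused.
std::vector<int> BuildChildIndexTable(
    const std::vector<std::int8_t>& type_codes) {
  assert(TypeCodesInRange(type_codes));
  std::vector<int> table(128, -1);
  for (std::size_t i = 0; i < type_codes.size(); ++i) {
    table[static_cast<std::size_t>(type_codes[i])] = static_cast<int>(i);
  }
  return table;
}
```

With signed storage, a byte such as 0x80 in the types buffer reads as a negative value and fails the range check, whereas `uint8_t` storage would silently accept it as 128.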