Thanks for all the answers.  The assumptions about union types in C++
code are fixed in https://github.com/apache/arrow/pull/5892
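
For illustration, here is a minimal sketch of why signedness matters when
mapping type ids to union children. It assumes that valid type ids lie in
[0, 127], as the spec language quoted below suggests. `BuildChildIndex` is a
hypothetical helper written for this message, not a function from the Arrow
C++ API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): builds a lookup table from
// type id to child index. Returns false if any id is negative or repeated.
bool BuildChildIndex(const std::vector<int8_t>& type_codes,
                     std::vector<int>* child_index) {
  // One slot per possible non-negative int8_t value; -1 marks an unused id.
  child_index->assign(128, -1);
  for (std::size_t i = 0; i < type_codes.size(); ++i) {
    const int8_t code = type_codes[i];
    // With uint8_t storage this check could never fire: a negative id
    // would silently appear as a value in [128, 255].
    if (code < 0) return false;
    const std::size_t slot = static_cast<std::size_t>(code);
    if ((*child_index)[slot] != -1) return false;  // duplicate type id
    (*child_index)[slot] = static_cast<int>(i);
  }
  return true;
}
```

Storing the codes as `int8_t` keeps them in the same domain as the types
buffer, so a negative id is detected rather than reinterpreted.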

Regards

Antoine.


On 25/11/2019 at 16:41, Wes McKinney wrote:
> On Mon, Nov 25, 2019 at 9:25 AM Antoine Pitrou <solip...@pitrou.net> wrote:
>>
>> On Mon, 25 Nov 2019 09:12:21 -0600
>> Wes McKinney <wesmck...@gmail.com> wrote:
>>> On Mon, Nov 25, 2019 at 8:52 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> The spec has the following language about union type ids:
>>>> """
>>>> Types buffer: A buffer of 8-bit signed integers. Each type in the union
>>>> has a corresponding type id whose values are found in this buffer. A
>>>> union with more than 127 possible types can be modeled as a union of 
>>>> unions.
>>>> """
>>>> https://arrow.apache.org/docs/format/Columnar.html#union-layout
>>>>
>>>> However, in several places the C++ code assumes type ids are unsigned.
>>>> Java doesn't seem to implement type ids (and there is no integration
>>>> task for union types).
>>>>
>>>> In the flatbuffers description, the type ids array is modeled as an
>>>> array of signed 32-bit integers.
>>>>
>>>> Moreover, according to the language above, type ids should be restricted
>>>> to the [0, 127] interval?  Which one should it be?
>>>
>>> The (optional) type ids in the metadata provide a correspondence
>>> between the union types / children and the values found in the types
>>> buffer (data). As stated in the spec, the types buffer holds 8-bit
>>> signed integers. As I recall the reason that we used [ Int ] in the
>>> metadata was that the Int type is thought to be easier for languages
>>> to work with in general when serializing/deserializing the metadata.
>>
>> Ok, but is there a reason the C++ code uses `std::vector<uint8_t>` for
>> the type codes?
> 
> Oversight on my part. I suggest we change to int8_t.
> 
>> Regards
>>
>> Antoine.
>>
>>
