Thanks a lot for the confirmation, Micah.

It still seems possible to remove the typeid without loss of generality
(not that I am advocating for it): all fields are declared as children of
the union field, so it is possible to declare fields in the metadata that
the union does not currently contain, and map them to empty ArrayData
children (in dense unions).

A minor annoyance is that the typeid does not allow zero-copy over the C
data interface: we either need to initialize an intermediary hashmap to
make the lookup fast, or pay the cost of a linear search (over the number
of fields) through the typeids for each type id.
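
To make the two options concrete, here is a minimal Python sketch using the
generated_union data from my original message quoted below. The names
(field_linear, field_mapped, id_to_index) are hypothetical and only for
illustration; this is not how any Arrow implementation does it.

```python
# Data from the generated_union integration test quoted in this thread.
types = [5, 5, 5, 5, 7, 7, 7, 7, 5, 5, 7]
type_ids = [5, 7]
fields = ["int32", "utf8"]


def field_linear(i):
    """Resolve the field of slot i by searching type_ids linearly."""
    # list.index scans type_ids, so this costs O(number of fields) per slot.
    return fields[type_ids.index(types[i])]


# Alternative: build the type id -> field index map once per array.
# This is the intermediary structure that has to be allocated on import.
id_to_index = {type_id: index for index, type_id in enumerate(type_ids)}


def field_mapped(i):
    """Resolve the field of slot i with a constant-time map lookup."""
    return fields[id_to_index[types[i]]]
```

Both functions return the same field for every slot; they differ only in
whether the cost is paid once up front (building the map) or on each lookup.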

Best,
Jorge




On Fri, Aug 13, 2021 at 7:07 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Jorge,
> I think your analysis is correct. Some historical context on why there is
> an indirection is covered on the original JIRA:
> https://issues.apache.org/jira/browse/ARROW-257
>
> Some other discussions:
>
> https://lists.apache.org/x/thread.html/75028183d54cb4f6ff588b043fe126f10b2cba8e373673fad6ba889d@%3Cdev.arrow.apache.org%3E
>
> https://lists.apache.org/x/thread.html/b219ef51dda71bef83dcdec94e68e2881d49f751b29a8c1251f653d5@%3Cdev.arrow.apache.org%3E
>
> -Micah
>
> On Fri, Aug 13, 2021 at 10:57 AM Keith Kraus <keith.j.kr...@gmail.com>
> wrote:
>
> > How would using the typeid directly work with arbitrary Extension types?
> >
> > -Keith
> >
> > On Fri, Aug 13, 2021 at 12:49 PM Jorge Cardoso Leitão <
> > jorgecarlei...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > In the UnionArray, there is a level of indirection between types
> > > (buffer of i8s) -> typeId (i8) -> field. For example, the
> > > generated_union part of our integration tests has the data:
> > >
> > > types: [5, 5, 5, 5, 7, 7, 7, 7, 5, 5, 7] (len = 11)
> > > typeids: [5, 7]
> > > fields: [int32, utf8]
> > >
> > > My understanding is that, to get the field of item 4, we read types[4]
> > > (7), look for its index in typeids (1), take the field at index 1
> > > (utf8), and then read the value (4 or other, depending on sparseness).
> > >
> > > Does someone know the rationale for the intermediate typeid? I.e.,
> > > couldn't the types contain the index of the field directly
> > > [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1] (replace 5 by 0, 7 by 1, and not
> > > use typeids)?
> > >
> > > Best,
> > > Jorge
> > >
> >
>
