Hi Paul,
TL;DR: I think the typeIds field you referenced is not the offset for
dense vectors mentioned by the spec.  I believe (but lack the historical
context) that it is an outgrowth of the Java implementation that might be
useful in other contexts.

The requirement on the typeIds field you referenced is that its length is
less than 127; the bit-width of each ID is immaterial.  Also, the typeIds
field and unions aren't fully supported yet.  There is an open PR [1] which
got stalled on performance and long-term direction concerns.

I haven't fully validated this, but my rough understanding is that the Java
implementation assumes only one array/vector of each type is in a union.
Roughly, each logical type + Schema.fbs enum parameterization has its own
type with its own type ID (I think the number is still less than 127 but might
grow larger).  The implementation makes use of this fact to do some
optimizations.  So when a union (I think only Sparse is supported in Java)
serializes itself it records each of the type IDs [2] so it can easily map
back to them.
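
To make the distinction concrete, here is a rough sketch in plain Python (no
Arrow library involved) of how I understand the two pieces fit together.  The
names build_child_lookup and resolve_children and the IDs 5 and 9 are made up
for illustration, and the range check is my assumption based on the per-slot
buffer holding 8-bit signed integers; this is not taken from any
implementation.

```python
from array import array


def build_child_lookup(type_ids):
    """Map each type ID in the union's typeIds metadata list to a child index.

    The list position is the child index; the value is the type ID recorded
    for that child when the union was serialized.
    """
    lookup = {}
    for child_index, type_id in enumerate(type_ids):
        # Assumption: an ID must fit in a non-negative signed 8-bit value,
        # because that is what the per-slot buffer stores.
        if not 0 <= type_id <= 127:
            raise ValueError(f"type ID {type_id} does not fit in int8")
        lookup[type_id] = child_index
    return lookup


def resolve_children(types_buffer, type_ids):
    """Return, for each slot of the union, the index of the child it points to."""
    lookup = build_child_lookup(type_ids)
    return [lookup[type_id] for type_id in types_buffer]


# Hypothetical union with two children whose type IDs are 5 and 9.
type_ids = [5, 9]
# Per-slot buffer of 8-bit signed integers ("b" is the signed char typecode).
types_buffer = array("b", [5, 9, 9, 5])
print(resolve_children(types_buffer, type_ids))
```

The point is only that the metadata list can be declared with any integer
width, as long as every value in it can also appear in the 8-bit per-slot
buffer.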

[1] https://github.com/apache/arrow/pull/987
[2]
https://github.com/apache/arrow/blob/73d379f4631cd3013371f60876a52615171e6c3b/java/vector/src/main/codegen/templates/UnionVector.java#L329

On Wed, Mar 20, 2019 at 1:08 AM Paul Taylor <[email protected]> wrote:

> I noticed the DenseUnion docs[1] say the typeIds buffer is 8-bit
> signed integers, but in the flatbuffer schema[2] it's typed as int (and
> flatc generates a function that returns an Int32Array).
>
> How are the other implementations treating this buffer, and should we
> update the docs or the flatbuffers schema?
>
> Thanks,
>
> Paul
>
> 1. https://arrow.apache.org/docs/format/Layout.html#dense-union-type
>
> 2.
>
> https://github.com/apache/arrow/blob/50bc9f49671afb56594910f49b9bf34e080a70e7/format/Schema.fbs#L92
>
>
