Per my comments on the PR, I also prefer this approach. I believe it will avoid the potential for inconsistent validity and simplify construction of union data in most cases.
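To make the proposal concrete, here is a rough sketch of a dense union whose validity comes only from its children. This is plain Python, not the Arrow API: the `DenseUnion` class and its field names are hypothetical, and it assumes each type id is simply the index of the child array.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DenseUnion:
    """Hypothetical model of a dense union with no top-level validity bitmap."""

    type_ids: List[int]                  # per slot: which child holds the value
    offsets: List[int]                   # per slot: index into that child
    children: List[List[Optional[Any]]]  # child arrays; None marks a null

    def __len__(self) -> int:
        return len(self.type_ids)

    def value(self, i: int) -> Optional[Any]:
        # Assumption: the type id doubles as the child index.
        return self.children[self.type_ids[i]][self.offsets[i]]

    def is_valid(self, i: int) -> bool:
        # Validity is read from the selected child slot only.
        return self.value(i) is not None


ints = [1, None, 3]
strs = ["a", "b"]
union = DenseUnion(
    type_ids=[0, 1, 0, 1, 0],
    offsets=[0, 0, 1, 1, 2],
    children=[ints, strs],
)
values = [union.value(i) for i in range(len(union))]
validity = [union.is_valid(i) for i in range(len(union))]
```

With only one place to record a null, the union and its children cannot disagree about whether a slot is valid, and building the union needs nothing beyond the type ids, the offsets, and the children.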
On Wed, Jun 24, 2020, 7:58 AM Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> As discussed on the recent GitHub PR [1], as a means of reconciling
> the long-standing cross-implementation incompatibilities with Union
> types, it's been proposed to remove the top-level validity bitmap from
> the Union data layout and let validity be determined exclusively by
> the child arrays of the union. So the only additional data needed to
> form a union are the type ids (and for the dense union, the offsets).
>
> I do not think this change meaningfully alters the semantics of Union
> types and I think it also simplifies their construction, so I would be
> in favor of making it for 1.0.0.
>
> I can create a PR with the relevant alterations but wanted to raise
> the issue now so if there is consensus about doing this, that we can
> act quickly to implement it.
>
> Thanks,
> Wes
>
> [1]: https://github.com/apache/arrow/pull/7290