This is an interesting question. IMO, to support repeated values, we also need to design a "coherency protocol", to avoid the scenario where once a value is witten, the change is propagated to another slot unexpectedly.
Best, Liya Fan On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com> wrote: > Hmm, I also thought the intention was monotonically increasing. I can't > think of a strong reason one way or another. If the argument about code to > do random access is the same in all cases, is there any benefit to forcing > any order at all? Memory prefetching? > > On Thu, Nov 21, 2019 at 11:48 AM Wes McKinney <wesmck...@gmail.com> wrote: > > > hi Antoine, > > > > It's a good question. > > > > The intent when we wrote the specification was to be strictly > > monotonic, but there seems nothing especially harmful about relaxing > > the constraint to allow for repeated values or even non-monotonicity > > (strict or otherwise). For example, if we had the union > > > > ['a', 'a', 'a', 0, 1, 'b', 'b'] > > > > then this could be represented as > > > > type_ids: [0, 0, 0, 1, 1, 0, 0] > > offsets: [0, 0, 0, 0, 1, 1, 1] > > child[0]: ['a', 'b'] > > child[1]: [0, 1] > > > > or > > > > type_ids: [0, 0, 0, 1, 1, 0, 0] > > offsets: [1, 1, 1, 0, 1, 0, 0] > > child[0]: ['b', 'a'] > > child[1]: [0, 1] > > > > What do others think? Either way some clarification in the > > specification would be useful. Because the code used to do random > > access is the same in all cases, I feel weakly supportive of removing > > constraints on the offsets. > > > > - Wes > > > > On Thu, Nov 21, 2019 at 9:04 AM Antoine Pitrou <anto...@python.org> > wrote: > > > > > > > > > Hello, > > > > > > I'd like some clarification on the spec and intent for dense arrays. > > > > > > Currently, it is specified that offsets of a dense union are "in order > / > > > increasing" (*). However, it is not obvious whether repeated values > are > > > allowed or not. > > > > > > I suspect the intent is to avoid having people exploit unions as some > > > kind of poor man's dictionaries. Also, perhaps some optimizations are > > > possible if monotonic or strictly monotonic indices are assumed? But I > > > don't know the history behind the union type. > > > > > > Regards > > > > > > Antoine. > > > > > > > > > (*) https://arrow.apache.org/docs/format/Columnar.html#dense-union > > >