This is an interesting question.
IMO, to support repeated values, we also need to design a "coherency
protocol", to avoid the scenario where once a value is witten, the change
is propagated to another slot unexpectedly.

Best,
Liya Fan

On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hmm, I also thought the intention was monotonically increasing. I can't
> think of a strong reason one way or another. If the argument about code to
> do random access is the same in all cases, is there any benefit to forcing
> any order at all?  Memory prefetching?
>
> On Thu, Nov 21, 2019 at 11:48 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Antoine,
> >
> > It's a good question.
> >
> > The intent when we wrote the specification was to be strictly
> > monotonic, but there seems nothing especially harmful about relaxing
> > the constraint to allow for repeated values or even non-monotonicity
> > (strict or otherwise). For example, if we had the union
> >
> > ['a', 'a', 'a', 0, 1, 'b', 'b']
> >
> > then this could be represented as
> >
> > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > offsets: [0, 0, 0, 0, 1, 1, 1]
> > child[0]: ['a', 'b']
> > child[1]: [0, 1]
> >
> > or
> >
> > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > offsets: [1, 1, 1, 0, 1, 0, 0]
> > child[0]: ['b', 'a']
> > child[1]: [0, 1]
> >
> > What do others think? Either way some clarification in the
> > specification would be useful. Because the code used to do random
> > access is the same in all cases, I feel weakly supportive of removing
> > constraints on the offsets.
> >
> > - Wes
> >
> > On Thu, Nov 21, 2019 at 9:04 AM Antoine Pitrou <anto...@python.org>
> wrote:
> > >
> > >
> > > Hello,
> > >
> > > I'd like some clarification on the spec and intent for dense arrays.
> > >
> > > Currently, it is specified that offsets of a dense union are "in order
> /
> > > increasing" (*).  However, it is not obvious whether repeated values
> are
> > > allowed or not.
> > >
> > > I suspect the intent is to avoid having people exploit unions as some
> > > kind of poor man's dictionaries.  Also, perhaps some optimizations are
> > > possible if monotonic or strictly monotonic indices are assumed?  But I
> > > don't know the history behind the union type.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > (*) https://arrow.apache.org/docs/format/Columnar.html#dense-union
> >
>

Reply via email to