hi Suhail -- well, unnesting produces an array of a different length.
I would think that unnesting would mainly occur in the context of
analytics, e.g.

list_values.flatten().unique()
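
To make the length difference concrete, here is a rough sketch in plain Python that models the List layout for the array from this thread. It is not pyarrow code: the three lists stand in for the child, offsets, and validity buffers, and the helper names flatten and unique are assumptions chosen to mirror the call above.

```python
# Plain-Python model of the Arrow List layout for [[0, 1], [0], None, None].
# These lists stand in for Arrow buffers; they are not pyarrow objects.
child_values = [0, 1, 0]                 # the "inner" values
offsets = [0, 2, 3, 3, 3]                # list i spans child_values[offsets[i]:offsets[i + 1]]
validity = [True, True, False, False]    # False marks a null list


def flatten(child_values):
    """Model of flatten(): return the child values unchanged."""
    return list(child_values)


def unique(values):
    """Distinct values, in order of first appearance."""
    return list(dict.fromkeys(values))


flat = flatten(child_values)
distinct = unique(flat)

# Four list slots go in, but only three flattened values come out.
print(len(validity), len(flat))
print(distinct)
```

For an analytics question such as "which distinct values occur anywhere in the column", the lost alignment with the parent rows does not matter, which is why flatten() fits that case.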

We definitely would like to have APIs that help with doing analytics
on nested data. I had hoped to get to work on the DataFrames API in
C++ this year, but there have been other more pressing projects and
issues related to maintaining and scaling up the Arrow community, so it
looks more like a project for 2020.

- Wes

On Thu, Sep 26, 2019 at 6:09 AM Suhail Razzak <suhail.raz...@gmail.com> wrote:
>
> Thanks Wes, makes sense. I appreciate that there are use cases where both
> could be applicable.
>
> In my example, the most applicable I can think of is unnesting a ListArray
> column for a DataFrame (in the future C++ DataFrames API?) similar to the
> tidyr unnest function. I don't believe the current implementation would
> be able to align the flattened ListArray with the rest of the columns. I'll
> see if there's something I can do on this end.
>
> On Wed, Sep 25, 2019 at 6:27 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Suhail,
> >
> > This follows the columnar format closely. The List layout is composed
> > from a child array providing the "inner" values, which are given the
> > List<T> interpretation by adding an offsets buffer, and a validity
> > buffer to distinguish null from 0-length list values. So flatten()
> > here just returns the child array, which has only 3 values in the
> > example you gave.
> >
> > A function could be written to insert "null" for List values that are
> > null, but someone would have to write it and give it a name =)
> >
> > - Wes
> >
> > On Wed, Sep 25, 2019 at 5:15 PM Suhail Razzak <suhail.raz...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I'm working through a certain use case where I'm unnesting ListArrays, but
> > > I noticed something peculiar - null ListValues are not retained in the
> > > unnested array.
> > >
> > > E.g.
> > > In [0]: arr = pa.array([[0, 1], [0], None, None])
> > > In [1]: arr.flatten()
> > > Out [1]: [0, 1, 0]
> > >
> > > While I would have expected [0, 1, 0, null, null].
> > >
> > > I should note that this works if the None is encapsulated in a list. So I'm
> > > guessing this is expected logic and if so, what's the reasoning for that?
> > >
> > > Thanks,
> > > Suhail
> >
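
The null-preserving function discussed in the thread could look roughly like the sketch below. It is a plain-Python model of the List layout, not an existing Arrow or pyarrow API. The name unnest_keep_nulls, the parent_index output, and the sample "key" column are all hypothetical, introduced only for illustration. Returning a parent index alongside the values is one possible way to realign the unnested column with the other columns, in the spirit of tidyr's unnest.

```python
def unnest_keep_nulls(child_values, offsets, validity):
    """Unnest a modeled List array, emitting one None per null list.

    Returns (parent_index, values), where parent_index[k] is the row of
    the original list column that values[k] came from.
    Hypothetical helper: not part of Arrow or pyarrow.
    """
    parent_index = []
    values = []
    for row, is_valid in enumerate(validity):
        if not is_valid:
            # Null list: keep the row by emitting a single null value.
            parent_index.append(row)
            values.append(None)
            continue
        start, end = offsets[row], offsets[row + 1]
        # Valid list: one output row per element (an empty list emits nothing).
        for value in child_values[start:end]:
            parent_index.append(row)
            values.append(value)
    return parent_index, values


# Modeled layout of [[0, 1], [0], None, None] from the thread.
child_values = [0, 1, 0]
offsets = [0, 2, 3, 3, 3]
validity = [True, True, False, False]

parent_index, values = unnest_keep_nulls(child_values, offsets, validity)

# Repeat a sibling column so it lines up with the unnested values.
key = ["a", "b", "c", "d"]  # hypothetical second column
key_unnested = [key[row] for row in parent_index]

print(values)
print(key_unnested)
```

In this sketch a valid but empty list contributes no rows, while a null list contributes exactly one null. Whether empty lists should also be kept is a separate design choice that any real implementation would need to settle.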
