hi Suhail -- well, unnesting produces an array of a different length. I
would think that unnesting would mainly occur in the context of
analytics, e.g.

list_values.flatten().unique()

We definitely would like to have APIs that help with doing analytics on
nested data. I had hoped to get to work on the DataFrames API in C++
this year, but there have been other more pressing projects and issues
related to maintaining and scaling up the Arrow community, so it looks
more likely to be a project for 2020.

- Wes

On Thu, Sep 26, 2019 at 6:09 AM Suhail Razzak <suhail.raz...@gmail.com> wrote:
>
> Thanks Wes, makes sense. I appreciate that there are use cases where both
> could be applicable.
>
> In my example, the most applicable I can think of is unnesting a ListArray
> column for a DataFrame (in the future C++ DataFrames API?) similar to the
> tidyr unnest function. I don't believe the current implementation would
> be able to align the flattened ListArray with the rest of the columns. I'll
> see if there's something I can do on this end.
>
> On Wed, Sep 25, 2019 at 6:27 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Suhail,
> >
> > This follows the columnar format closely. The List layout is composed
> > from a child array providing the "inner" values, which are given the
> > List<T> interpretation by adding an offsets buffer, and a validity
> > buffer to distinguish null from 0-length list values. So flatten()
> > here just returns the child array, which has only 3 values in the
> > example you gave.
> >
> > A function could be written to insert "null" for List values that are
> > null, but someone would have to write it and give it a name =)
> >
> > - Wes
> >
> > On Wed, Sep 25, 2019 at 5:15 PM Suhail Razzak <suhail.raz...@gmail.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm working through a certain use case where I'm unnesting ListArrays,
> > > but I noticed something peculiar - null ListValues are not retained in
> > > the unnested array.
> > >
> > > E.g.
> > >
> > > In [0]: arr = pa.array([[0, 1], [0], None, None])
> > > In [1]: arr.flatten()
> > > Out [1]: [0, 1, 0]
> > >
> > > While I would have expected [0, 1, 0, null, null].
> > >
> > > I should note that this works if the None is encapsulated in a list. So
> > > I'm guessing this is expected logic and if so, what's the reasoning for
> > > that?
> > >
> > > Thanks,
> > > Suhail
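
For readers following the thread, here is a rough pure-Python sketch of the
List layout Wes describes (child values, offsets, validity) and of the kind
of null-inserting function he mentions. It is a toy model only and does not
use or mirror the pyarrow API: the names ListLayout, build, flatten,
flatten_keep_nulls and row_indices are all made up for illustration, and the
null-preserving behaviour is an assumption about what such a function might
do, not something Arrow provided at the time of the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListLayout:
    """Toy stand-in for the Arrow List layout (hypothetical, not pyarrow)."""
    values: List[Optional[int]]  # child array holding the "inner" values
    offsets: List[int]           # slot i spans values[offsets[i]:offsets[i + 1]]
    validity: List[bool]         # False marks a null list slot


def build(rows: List[Optional[List[Optional[int]]]]) -> ListLayout:
    """Pack Python lists into child values, offsets and validity."""
    values: List[Optional[int]] = []
    offsets = [0]
    validity = []
    for row in rows:
        validity.append(row is not None)
        if row is not None:
            values.extend(row)
        # A null slot adds no child values, so its offset simply repeats.
        offsets.append(len(values))
    return ListLayout(values, offsets, validity)


def flatten(arr: ListLayout) -> List[Optional[int]]:
    """Return the child values only, as flatten() is described in the thread."""
    return list(arr.values)


def flatten_keep_nulls(arr: ListLayout) -> List[Optional[int]]:
    """Hypothetical variant: emit one null for every null list slot."""
    out: List[Optional[int]] = []
    for i, valid in enumerate(arr.validity):
        if valid:
            out.extend(arr.values[arr.offsets[i]:arr.offsets[i + 1]])
        else:
            out.append(None)
    return out


def row_indices(arr: ListLayout) -> List[int]:
    """Source row of each element produced by flatten_keep_nulls."""
    out: List[int] = []
    for i, valid in enumerate(arr.validity):
        count = arr.offsets[i + 1] - arr.offsets[i] if valid else 1
        out.extend([i] * count)
    return out


arr = build([[0, 1], [0], None, None])
print(flatten(arr))             # child array only
print(flatten_keep_nulls(arr))  # nulls kept as placeholders
print(row_indices(arr))         # usable to repeat the other columns' rows
```

In this model the two null slots contribute nothing to the child values,
which is why a plain flatten sees only three elements. The row indices are
one possible way to line the unnested column up with the rest of a table.
Note that in this sketch an empty (but non-null) list still produces no
output rows.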