Re: Join operation on attributes from arrow structs

Francois Saint-Jacques Thu, 02 Apr 2020 05:12:51 -0700

They're mapped to a StructType/StructArray, which is also a columnar
representation, e.g. one buffer per field in the sub-object. If you
have varying/incompatible types, a field will be promoted to a
UnionType.


François

On Thu, Apr 2, 2020 at 12:54 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> Hi Hasara,
> There is currently no functionality in C++/Python to do this (
> https://issues.apache.org/jira/browse/ARROW-4630 is the issue tracking
> this).
>
> > Also how nested attributes in json format are mapped into buffers once
> > converted in arrow format?
>
> I'm not sure I understand this question?
>
> Thanks,
> Micah
>
> On Sun, Mar 22, 2020 at 10:09 PM Hasara Maithree <
> hasaramaithreedesi...@gmail.com> wrote:
>
> > Hi all,
> >
> > Assume I have a json file named 'my_data.json' as below.
> >
> > {"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}}
> > {"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}}
> >
> > If I need to do a join operation based on attribute d, can I do it
> > directly from arrow structs (or are there any efficient alternatives)?
> > Also, how are nested attributes in JSON format mapped into buffers once
> > converted to Arrow format? (example taken from the documentation)
> >
> > >>> from pyarrow import json
> > >>> table = json.read_json("my_data.json")
> > >>> table
> > pyarrow.Table
> > a: list<item: int64>
> >   child 0, item: int64
> > b: struct<c: bool, d: timestamp[s]>
> >   child 0, c: bool
> >   child 1, d: timestamp[s]
> > >>> table.to_pandas()
> >            a                                       b
> > 0     [1, 2]   {'c': True, 'd': 1991-02-03 00:00:00}
> > 1  [3, 4, 5]  {'c': False, 'd': 2019-04-01 00:00:00}
> >
> >
> > Thank You
> >
