Re: Python is there support for extension types in Parquet?

2020-04-24 Thread Bryan Cutler
Thanks for the tips Micah and Wes. The storage type is an int64 list, which
works in a roundtrip for parquet by itself. I'll look into it a bit more to
see what is going on.

On Fri, Apr 24, 2020 at 11:50 AM Wes McKinney wrote:

> Extension types will round-trip correctly through Parquet so long as
> the storage type can be round-tripped (as Micah pointed out, support for
> reading all nested types is not yet available).
>
> Note for reinforcement that Feather V2 is exactly an Arrow IPC file --
> so IPC files could already do this prior to 0.17.0. People seem to
> like the name so I figured there wasn't much reason to discard the
> "brand" which already has a good reputation in the community.
>
> On Fri, Apr 24, 2020 at 1:26 PM Micah Kornfield wrote:
> >
> > Hi Bryan,
> > Extension types aren't explicitly called out, but
> > https://issues.apache.org/jira/browse/ARROW-1644 (and related subtasks)
> > might be a good place to track this.
> >
> > Thanks,
> > Micah
> >
> > On Fri, Apr 24, 2020 at 11:13 AM Bryan Cutler wrote:
> >
> > > I've been trying out IO with Arrow's extension types and I was able to
> > > write a Parquet file, but reading it back causes an error:
> > > "pyarrow.lib.ArrowInvalid: Unsupported nested type: ...". Looking at the
> > > code for the Parquet reader, it checks nested types and only allows a few
> > > specific ones. Is this a known limitation? I couldn't find a JIRA but I'll
> > > make one if it is. Alternatively, I was able to convert my extension array
> > > to/from a Pandas DataFrame and read/write to a Feather file, which is
> > > awesome - nice work!
> > >
> > > Thanks,
> > > Bryan
> > >
>


Re: Python is there support for extension types in Parquet?

2020-04-24 Thread Wes McKinney
Extension types will round-trip correctly through Parquet so long as
the storage type can be round-tripped (as Micah pointed out, support for
reading all nested types is not yet available).

Note for reinforcement that Feather V2 is exactly an Arrow IPC file --
so IPC files could already do this prior to 0.17.0. People seem to
like the name so I figured there wasn't much reason to discard the
"brand" which already has a good reputation in the community.

On Fri, Apr 24, 2020 at 1:26 PM Micah Kornfield wrote:
>
> Hi Bryan,
> Extension types aren't explicitly called out, but
> https://issues.apache.org/jira/browse/ARROW-1644 (and related subtasks)
> might be a good place to track this.
>
> Thanks,
> Micah
>
> On Fri, Apr 24, 2020 at 11:13 AM Bryan Cutler wrote:
>
> > I've been trying out IO with Arrow's extension types and I was able to write
> > a Parquet file, but reading it back causes an error:
> > "pyarrow.lib.ArrowInvalid: Unsupported nested type: ...". Looking at the
> > code for the parquet reader, it checks nested types and only allows a few
> > specific ones. Is this a known limitation? I couldn't find a JIRA but I'll
> > make one if it is. Alternatively, I was able to convert my extension array
> > to/from a Pandas DataFrame and read/write to a Feather file, which is
> > awesome - nice work!
> >
> > Thanks,
> > Bryan
> >


Re: Python is there support for extension types in Parquet?

2020-04-24 Thread Micah Kornfield
Hi Bryan,
Extension types aren't explicitly called out, but
https://issues.apache.org/jira/browse/ARROW-1644 (and related subtasks)
might be a good place to track this.

Thanks,
Micah

On Fri, Apr 24, 2020 at 11:13 AM Bryan Cutler wrote:

> I've been trying out IO with Arrow's extension types and I was able to write
> a Parquet file, but reading it back causes an error:
> "pyarrow.lib.ArrowInvalid: Unsupported nested type: ...". Looking at the
> code for the parquet reader, it checks nested types and only allows a few
> specific ones. Is this a known limitation? I couldn't find a JIRA but I'll
> make one if it is. Alternatively, I was able to convert my extension array
> to/from a Pandas DataFrame and read/write to a Feather file, which is
> awesome - nice work!
>
> Thanks,
> Bryan
>