Re: Python is there support for extension types in Parquet?
Thanks for the tips, Micah and Wes. The storage type is an int64 list, which round-trips through Parquet by itself. I'll look into it a bit more to see what is going on.

On Fri, Apr 24, 2020 at 11:50 AM Wes McKinney wrote:
> Extension types will round-trip correctly through Parquet so long as
> the storage type can be round-tripped (as Micah pointed out, support for
> reading all nested types is not yet available).
Re: Python is there support for extension types in Parquet?
Extension types will round-trip correctly through Parquet so long as the storage type can be round-tripped (as Micah pointed out, support for reading all nested types is not yet available).

Note for reinforcement that Feather V2 is exactly an Arrow IPC file, so IPC files could already do this prior to 0.17.0. People seem to like the name, so I figured there wasn't much reason to discard the "brand", which already has a good reputation in the community.

On Fri, Apr 24, 2020 at 1:26 PM Micah Kornfield wrote:
> Hi Bryan,
> Extension types aren't explicitly called out, but
> https://issues.apache.org/jira/browse/ARROW-1644 (and related subtasks)
> might be a good place to track this.
Re: Python is there support for extension types in Parquet?
Hi Bryan,

Extension types aren't explicitly called out, but https://issues.apache.org/jira/browse/ARROW-1644 (and related subtasks) might be a good place to track this.

Thanks,
Micah

On Fri, Apr 24, 2020 at 11:13 AM Bryan Cutler wrote:
> I've been trying out IO with Arrow's extension types, and I was able to write a
> Parquet file, but reading it back causes an error:
> "pyarrow.lib.ArrowInvalid: Unsupported nested type: ...". Looking at the
> code for the Parquet reader, it checks nested types and only allows a few
> specific ones. Is this a known limitation? I couldn't find a JIRA, but I'll
> make one if it is. Alternatively, I was able to convert my extension array
> to/from a Pandas DataFrame and read/write to a Feather file, which is
> awesome - nice work!
>
> Thanks,
> Bryan