Just want to revive this discussion. A few people recently asked me about
the possibility of supporting extension types in Parquet. It would be
good if we could draft a concrete proposal and move forward.

Unrelated to extension types, there are also some requests for a union
type [1]. I'm not sure whether this is a good time to revive that as well.

[1] https://github.com/apache/parquet-format/pull/44

Best,
Gang

On Thu, Jun 6, 2024 at 11:36 PM Jan Finis <[email protected]> wrote:

> Regarding ordering of extension types: The default order of a type is
> already defined to be logical type specific (see `TypeDefinedOrder` in
> parquet.thrift). Therefore, if we make ExtensionType a logical type, then
> by the current semantics of the Parquet spec, extension types will already
> be defined to come with their own order. The PR that adds ExtensionType
> should add a comment to `TypeDefinedOrder` stating that for an
> ExtensionType, the order is defined by the type itself.
>
> Cheers,
> Jan
>
> On Wed, May 29, 2024 at 09:10, Antoine Pitrou <[email protected]> wrote:
>
> > On Wed, 29 May 2024 10:27:02 +0800
> > Gang Wu <[email protected]> wrote:
> > > I think adding extension type support will make it easier for adding
> > > tensor or vector type, which is [1] trying to target.
> > >
> > > However, the geometry type does not seem to fit easily into the
> > > concept of an extension type. It would be better to explicitly
> > > define geospatial statistics in the spec; otherwise we have to
> > > encode them like plain-encoded min/max values or even use
> > > thrift/protobuf to serialize them as binary data.
> >
> > Let's remember here that PLAIN encoding for numeric scalars (such as
> > double or int64) is really a contiguous sequence of native
> > little-endian numbers, just like e.g. the Parquet footer length.
> > There's no need to explicitly invoke the PLAIN decoder, especially when
> > no def/rep levels are involved.
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
>
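
To illustrate Antoine's point quoted above, here is a minimal sketch of
reading PLAIN-encoded numeric scalars with nothing more than a
little-endian unpack. The helper names and the bounding-box layout
(xmin, ymin, xmax, ymax) are hypothetical and only for illustration;
nothing here reflects an agreed format for geospatial statistics.

```python
import struct


def encode_plain_doubles(values):
    """Pack floats as contiguous little-endian IEEE 754 doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def decode_plain_doubles(buf):
    """Read contiguous little-endian doubles; no def/rep levels involved."""
    if len(buf) % 8 != 0:
        raise ValueError("buffer length must be a multiple of 8 bytes")
    return list(struct.unpack(f"<{len(buf) // 8}d", buf))


def decode_plain_int64s(buf):
    """Read contiguous little-endian signed 64-bit integers."""
    if len(buf) % 8 != 0:
        raise ValueError("buffer length must be a multiple of 8 bytes")
    return list(struct.unpack(f"<{len(buf) // 8}q", buf))


# Hypothetical bounding box stored as four doubles (an assumption,
# not a layout defined by the Parquet spec).
bbox = [-180.0, -90.0, 180.0, 90.0]
raw = encode_plain_doubles(bbox)
print(len(raw), decode_plain_doubles(raw))
```

The same approach applies to any fixed-width numeric type; only the
format character and the element width change.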
