Just want to revive this discussion. A few people recently asked me about the possibility of supporting extension types in Parquet. It would be good if we could draft a concrete proposal and move forward.
Unrelated to extension types, there have also been some requests for a union type [1]. I'm not sure whether this is a good time to revive that as well.

[1] https://github.com/apache/parquet-format/pull/44

Best,
Gang

On Thu, Jun 6, 2024 at 11:36 PM Jan Finis <[email protected]> wrote:

> Regarding ordering of extension types: The default order of a type is
> already defined to be logical type specific (see `TypeDefinedOrder` in
> parquet.thrift). Therefore, if we make ExtensionType a logical type, then
> by the current semantics of the Parquet spec, they will already be defined
> to come with their own order. The PR that adds ExtensionType should add a
> comment to `TypeDefinedOrder` that for an ExtensionType, the order is
> defined by the type itself.
>
> Cheers,
> Jan
>
> On Wed, May 29, 2024 at 09:10, Antoine Pitrou <[email protected]> wrote:
>
> > On Wed, 29 May 2024 10:27:02 +0800
> > Gang Wu <[email protected]> wrote:
> > > I think adding extension type support will make it easier for adding
> > > tensor or vector type, which is [1] trying to target.
> > >
> > > However, the geometry type seems not easy to fit to the imagination
> > > of the extension type. It would be better to explicitly define
> > > geospatial statistics in the spec, otherwise we have to encode them
> > > like plain-encoded min/max values or even use thrift/protobuf to
> > > serialize them as binary data.
> >
> > Let's remember here that PLAIN encoding for numeric scalars (such as
> > double or int64) is really a contiguous sequence of native
> > little-endian numbers, just like e.g. the Parquet footer length.
> > There's no need to explicitly invoke the PLAIN decoder, especially when
> > no def/rep levels are involved.
> >
> > Regards
> >
> > Antoine.
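To illustrate the point quoted above about PLAIN encoding, here is a minimal sketch in Python, not taken from any Parquet implementation. It assumes a hypothetical geospatial statistic stored as four doubles (a bounding box); the name `decode_plain_doubles` and the bounding-box layout are made up for illustration. The only thing relied on is that PLAIN-encoded doubles are consecutive 8-byte little-endian IEEE 754 values, and that the footer length is a 4-byte little-endian integer placed before the trailing "PAR1" magic.

```python
import struct


def decode_plain_doubles(buf: bytes) -> tuple:
    """Read a buffer of PLAIN-encoded doubles (no def/rep levels).

    PLAIN doubles are just consecutive 8-byte little-endian values,
    so no dedicated decoder is needed.
    """
    count = len(buf) // 8
    return struct.unpack(f"<{count}d", buf[: count * 8])


# Hypothetical statistic: a bounding box as (xmin, ymin, xmax, ymax).
bbox = (-180.0, -90.0, 180.0, 90.0)
encoded = struct.pack("<4d", *bbox)  # what a writer would store
decoded = decode_plain_doubles(encoded)

# The footer length is read the same way: a 4-byte little-endian
# unsigned integer immediately before the trailing "PAR1" magic.
tail = struct.pack("<I", 1234) + b"PAR1"
footer_len = struct.unpack("<I", tail[:4])[0]
```

Whether statistics for an extension or geometry type should actually be stored this way is exactly the open question in this thread; the sketch only shows that reading such values would not require a general-purpose PLAIN decoder.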
