On Fri, Oct 31, 2025 at 1:14 PM Micah Kornfield <[email protected]>
wrote:

> Hi Andrew,
> If this is to support new type (point cloud data), is there a reason to
> choose a key value member to the schema over something like the extension
> type proposal [1]
>

In some ways it's no different -- you're providing some data to ride along
with a column.  The extension type has the advantage of providing an
indirection which *might* be useful for the case when you have many columns
of the same type, though this seems a pretty specific use case and adds
additional complexity. However, an extension type's "serialization" field
gives no hint of what its contents mean (JSON is suggested, which could
provide keys, but would also require an additional parsing step).

Allowing the addition of data to the existing SchemaElement is trivially
simple and more flexible. Users could add whatever data they like to
annotate their schema element without introducing anything to the type
system. For example, one could add a description to an integer element
without creating an "Integer with Description" type or provide language
information about a string without creating a type "String in French".
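
As a rough sketch of the idea (not the actual Thrift definition), the
following models a schema element that carries a list of key/value pairs.
The class names mirror Parquet's SchemaElement and KeyValue, but the
key_value_metadata field on SchemaElement, the metadata() helper, and the
"description" and "language" keys are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeyValue:
    key: str
    value: Optional[str] = None


@dataclass
class SchemaElement:
    name: str
    physical_type: str
    # Hypothetical field: per-element annotations that live outside the type system.
    key_value_metadata: List[KeyValue] = field(default_factory=list)

    def metadata(self, key: str) -> Optional[str]:
        """Return the value of the first pair matching key, or None if absent."""
        for kv in self.key_value_metadata:
            if kv.key == key:
                return kv.value
        return None


# The physical types stay plain INT32 / BYTE_ARRAY; only annotations are added.
count = SchemaElement("count", "INT32", [KeyValue("description", "Number of returns")])
title = SchemaElement("title", "BYTE_ARRAY", [KeyValue("language", "fr")])
```

Neither element needs a new type: the annotation is just data attached to
the existing element, and code that doesn't care about it can ignore it.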

The extension type proposal suggests that readers will be modified to
support the extension types.  Adding metadata directly to the SchemaElement
simply allows code *outside* of a Parquet reader to use the information for
its own purpose -- a reader only needs to provide an API to access the
metadata to be useful.

Some examples from point cloud data:

- Integers to which a scale and offset are applied to create a nominal
value (the current integer-based scale/offset are insufficient).
- Units for many types.
- GPS times are stored in several ways -- metadata that optionally carries
an offset allows for proper interpretation.
- Descriptions of bit fields packed into integers.
- Indication that "return" numbers are synthetically generated. (A laser
pulse can create multiple points, each known as a "return").
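
To illustrate the first item, here is a minimal sketch of application code
(outside any Parquet reader) that uses per-column metadata to turn stored
integers into nominal values. The "scale", "offset", and "unit" keys and the
apply_scale_offset() function are hypothetical; the dict stands in for
whatever a reader's metadata-access API would return.

```python
from typing import Dict, Iterable, List


def apply_scale_offset(raw: Iterable[int], metadata: Dict[str, str]) -> List[float]:
    """Convert stored integers to nominal values using column metadata.

    The "scale" and "offset" keys are assumptions for this sketch; if a key
    is missing, the identity value (scale 1, offset 0) is used.
    """
    scale = float(metadata.get("scale", "1"))
    offset = float(metadata.get("offset", "0"))
    return [value * scale + offset for value in raw]


# Metadata as an application might read it from a column's schema element.
x_metadata = {"scale": "0.5", "offset": "1000", "unit": "m"}
x_values = apply_scale_offset([0, 2, 4], x_metadata)
```

The reader itself never interprets the keys; it only hands them to the
caller, which applies whatever convention its domain uses.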

There's certainly nothing that precludes doing both extension types and
adding metadata support for SchemaElements.

-- 
Andrew Bell
[email protected]
