I agree that the bar for adding new types to the Type union in Schema.fbs
should be quite high going forward. Relying more on extension types to add
specializations of built-in types will lower the burden on implementations,
which can simply "propagate forward" this data (by preserving the extra
metadata) even if they don't understand what it does. It would be nice,
therefore, to put us on a path to expanding our set of "official" extension
types (which would include things like JSON or UUID), since some libraries
may choose to implement convenience containers for these for usability.
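
To illustrate the "propagate forward" point, here is a minimal sketch in
plain Python. The Field class and rename_field function are hypothetical
stand-ins, not an Arrow library API, and "example.uuid" is an illustrative
extension name. The two metadata keys are the ones the columnar format
reserves for extension types.

```python
from dataclasses import dataclass, field

# Custom-metadata keys the Arrow columnar format reserves for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (not a real Arrow class)."""
    name: str
    storage_type: str  # the built-in type that carries the data
    metadata: dict = field(default_factory=dict)


def rename_field(f: Field, new_name: str) -> Field:
    """An operation that knows nothing about extension types.

    It copies the custom metadata verbatim, so the extension annotation
    survives even though this code never interprets it.
    """
    return Field(new_name, f.storage_type, dict(f.metadata))


# A UUID-like column stored as 16-byte binary; the extension name is made up.
uuid_field = Field(
    "id",
    "fixed_size_binary[16]",
    {EXT_NAME_KEY: "example.uuid", EXT_META_KEY: ""},
)
renamed = rename_field(uuid_field, "user_id")
```

A consumer that does recognize the extension name can wrap the column in a
convenience container; one that does not still sees valid storage data.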

On Fri, Apr 30, 2021 at 9:22 AM Brian Hulette <bhule...@apache.org> wrote:

> +1 this looks good to me.
>
> My only concern is with criterion #3, "Is the underlying encoding of the
> type already semantically supported by a type?". I think this is a good
> criterion, but it's inconsistent with the current spec. By that criterion
> some existing types (Timestamp, Time, Duration, Date) should be well known
> extension types, right?
>
> Perhaps we should explicitly indicate these types are grandfathered in [1]
> because they existed before extension types, to avoid tension with this
> criterion.
>
> Brian
>
> [1] https://en.wikipedia.org/wiki/Grandfather_clause
>
> On Thu, Apr 29, 2021 at 9:13 PM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > Thanks for writing this.
> >
> > I agree. That is a good decision tree. +1
> >
> > Best,
> > Jorge
> >
> >
> > On Thu, Apr 29, 2021 at 6:08 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > The discussion around adding another interval type to the Schema.fbs
> > > raises the issue of when do we decide to add a new type to the
> > > Schema.fbs vs using other means (primarily extension types [1]).
> > >
> > > A few criteria come to mind that could help decide (feedback welcome):
> > >
> > > 1.  Is the type a new parameterization of an existing type?
> > >     - If Yes, and we believe the parameterization is useful and can be
> > >       done in a forward/backward compatible manner then we would update
> > >       Schema.fbs.
> > >
> > > 2.  Does the type itself have its own specification for processing
> > >     (e.g. JSON, BSON, Thrift, Avro, Protobuf)?
> > >     - If yes, we would NOT add them to Schema.fbs.  I think this would
> > >       potentially yield too many new types.
> > >
> > > 3.  Is the underlying encoding of the type already semantically
> > >     supported by a type? (e.g. if we want to encode physical lengths
> > >     like meters these can be represented by an integer).
> > >     - If yes, we would NOT update the specification.  This seems like
> > >       the exact use-case that extension types are meant for.
> > >
> > > * How does this apply to Interval? *
> > > Interval extends an existing type in the specification and multiple
> > > "packed fields" cannot be easily communicated with the current version
> > > of the specification.  Hence, I feel comfortable making the addition to
> > > Schema.fbs
> > > * What does this mean for other common types? *
> > >
> > > I think as types come up that are very common but we don't want to add
> > > to the Schema.fbs we should invest in formalizing them as "Well Known"
> > > Extension types.  In this scenario, we would update the specification
> > > to include how to specify the extension type metadata (and still
> > > require at least two libraries support the Extension type before
> > > inclusion as "Well Known").
> > >
> > > * Practical implications *
> > >
> > > I think this means the type system in Schema.fbs is mostly closed (i.e.
> > > there is a high bar for adding new types). One potentially useful type
> > > to have would be a "packed struct" that supports something similar to
> > > python struct library [2].  I think this would likely cover many
> > > extension type use-cases.
> > >
> > > Thoughts?
> > >
> > > -Micah
> > >
> > > [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types
> > > [2] https://docs.python.org/3/library/struct.html
> > >
> >
>
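
The three criteria in the original message can be read as a small decision
function. The sketch below is one possible reading, not an agreed-upon
procedure: the function name, the parameter names, the order in which the
criteria are checked, and the fallback outcome are all assumptions.

```python
from enum import Enum


class Outcome(Enum):
    UPDATE_SCHEMA_FBS = "update Schema.fbs"
    KEEP_OUT = "keep out of Schema.fbs (e.g. use an extension type)"
    DISCUSS = "no criterion applies; discuss on the list"


def decide(
    *,
    new_parameterization: bool = False,
    useful_and_compatible: bool = False,
    has_own_spec: bool = False,
    encoding_already_supported: bool = False,
) -> Outcome:
    """Apply criteria 1-3 from the proposal, in the order they were listed."""
    # 1. A useful, forward/backward compatible parameterization of an
    #    existing type: update Schema.fbs.
    if new_parameterization and useful_and_compatible:
        return Outcome.UPDATE_SCHEMA_FBS
    # 2. The type has its own processing specification (JSON, BSON, ...).
    if has_own_spec:
        return Outcome.KEEP_OUT
    # 3. An existing type already supports the underlying encoding
    #    (e.g. meters stored as an integer).
    if encoding_already_supported:
        return Outcome.KEEP_OUT
    # Assumed fallback: the proposal does not say what happens here.
    return Outcome.DISCUSS


interval = decide(new_parameterization=True, useful_and_compatible=True)
json_type = decide(has_own_spec=True)
meters = decide(encoding_already_supported=True)
```

Under this reading, the Interval addition lands in Schema.fbs, while JSON and
a "meters" type do not.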

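For the "packed struct" idea, the Python struct library [2] gives a feel for
what such a type could describe. The sketch below is only an illustration
under assumed names: the three-field layout (int32 months, int32 days, int64
nanoseconds) and the helper functions are hypothetical and are not taken
from Schema.fbs.

```python
import struct

# Hypothetical layout: int32 months, int32 days, int64 nanoseconds,
# little-endian with no padding ("<" disables alignment).
LAYOUT = struct.Struct("<iiq")


def pack_values(rows):
    """Pack a list of (months, days, nanos) tuples into one flat buffer."""
    return b"".join(LAYOUT.pack(*row) for row in rows)


def unpack_values(buf):
    """Recover the tuples from a buffer produced by pack_values."""
    return list(LAYOUT.iter_unpack(buf))


rows = [(1, 15, 500), (0, -2, 3_000_000_000)]
buf = pack_values(rows)          # fixed-width records, one per row
roundtrip = unpack_values(buf)
```

Each record has a fixed width (LAYOUT.size bytes), so a buffer of this shape
could sit in a fixed-size binary column, with the format string carried as
metadata describing the fields.
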