I concur with what both Wes and Micah said.

As for temporal types, they are in widespread use, and their semantics require dedicated treatment for arithmetic and conversion, so it is helpful to define dedicated types for them rather than mere annotations.
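
To illustrate the point about arithmetic and conversion, here is a minimal Python sketch. The `rescale` helper and the unit table are hypothetical names made up for this example, not Arrow APIs; the sketch only shows that integer timestamps in different units cannot be combined without unit-aware handling.

```python
# Hypothetical helper, not an Arrow API: rescale an integer timestamp
# between units, refusing conversions that would lose precision.
TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def rescale(value, from_unit, to_unit):
    numerator = value * TICKS_PER_SECOND[to_unit]
    denominator = TICKS_PER_SECOND[from_unit]
    if numerator % denominator != 0:
        raise ValueError(f"lossy conversion from {from_unit} to {to_unit}")
    return numerator // denominator


# 1500 ms plus 250 us: adding the raw integers would be meaningless,
# so both operands are first brought to the same unit.
total_us = rescale(1500, "ms", "us") + 250
```

A plain integer column with a "unit" annotation would leave every consumer to reimplement this kind of logic, which is the argument for a dedicated type.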

Regards

Antoine.


On 30/04/2021 at 16:40, Wes McKinney wrote:
I agree that the bar for adding new types to the Type union in Schema.fbs
should be quite high going forward. Using extension types increasingly for
adding specializations of built-in types will mean less burden for
implementations to simply "propagate forward" this data (by preserving the
extra metadata) even if they don't understand what it does. It would be
nice, therefore, to put us on a path to expanding our set of "official"
extension types (which would include things like JSON or UUID) since some
libraries may choose to implement convenience containers for these for
usability.
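
The "propagate forward" idea can be sketched in plain Python. This is a toy model, not the API of any Arrow library: the `Field` class, `rename_field` function, and the `example.uuid` name are hypothetical. Only the two metadata keys are taken from the extension type convention in the Arrow columnar format.

```python
# Toy model of a schema field; NOT a real Arrow API. Only the two metadata
# keys below follow the Arrow columnar format's extension type convention.
from dataclasses import dataclass, field

EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"


@dataclass(frozen=True)
class Field:
    name: str
    storage_type: str  # e.g. "fixed_size_binary(16)"
    metadata: dict = field(default_factory=dict)


def rename_field(f: Field, new_name: str) -> Field:
    """Operation by a library that knows nothing about UUIDs: it works on
    the storage type and copies the custom metadata through unchanged."""
    return Field(new_name, f.storage_type, dict(f.metadata))


uuid_field = Field(
    "id",
    "fixed_size_binary(16)",
    {EXT_NAME_KEY: "example.uuid", EXT_METADATA_KEY: ""},  # hypothetical name
)
forwarded = rename_field(uuid_field, "user_id")
```

A downstream consumer that does recognize the extension name can still reconstruct the specialized type from `forwarded.metadata`, even though the intermediate step never interpreted it.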

On Fri, Apr 30, 2021 at 9:22 AM Brian Hulette <bhule...@apache.org> wrote:

+1 this looks good to me.

My only concern is with criterion #3, "Is the underlying encoding of the
type already semantically supported by a type?". I think this is a good
criterion, but it's inconsistent with the current spec. By that criterion,
some existing types (Timestamp, Time, Duration, Date) should be well-known
extension types, right?

Perhaps we should explicitly indicate that these types are grandfathered
in [1] because they existed before extension types, to avoid tension with
this criterion.

Brian

[1] https://en.wikipedia.org/wiki/Grandfather_clause

On Thu, Apr 29, 2021 at 9:13 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

Thanks for writing this.

I agree. That is a good decision tree. +1

Best,
Jorge


On Thu, Apr 29, 2021 at 6:08 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

The discussion around adding another interval type to Schema.fbs raises
the issue of when we decide to add a new type to Schema.fbs versus using
other means (primarily extension types [1]).

A few criteria come to mind that could help decide (feedback welcome):

1.  Is the type a new parameterization of an existing type?
     - If yes, and we believe the parameterization is useful and can be
       done in a forward/backward compatible manner, then we would update
       Schema.fbs.

2.  Does the type itself have its own specification for processing (e.g.
    JSON, BSON, Thrift, Avro, Protobuf)?
     - If yes, we would NOT add it to Schema.fbs.  I think this would
       potentially yield too many new types.

3.  Is the underlying encoding of the type already semantically supported
    by a type (e.g. if we want to encode physical lengths like meters,
    these can be represented by an integer)?
     - If yes, we would NOT update the specification.  This seems like the
       exact use case that extension types are meant for.
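
The three criteria above can be read as a small decision procedure. The following Python sketch is one possible reading, not an agreed policy: the `Proposal` fields, the order in which the questions are checked, and the "needs discussion" fallback are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    # Hypothetical fields mirroring the three questions above.
    parameterizes_existing_type: bool = False
    useful_and_compatible: bool = False
    has_own_processing_spec: bool = False
    encoding_already_supported: bool = False


def decide(p: Proposal) -> str:
    # Criterion 1: a useful, compatible parameterization of an existing type.
    if p.parameterizes_existing_type and p.useful_and_compatible:
        return "update Schema.fbs"
    # Criterion 2: formats such as JSON or Protobuf carry their own spec.
    if p.has_own_processing_spec:
        return "extension type"
    # Criterion 3: e.g. meters stored as an integer.
    if p.encoding_already_supported:
        return "extension type"
    # Assumption: anything not covered goes back to the mailing list.
    return "needs discussion"


json_result = decide(Proposal(has_own_processing_spec=True))
meters_result = decide(Proposal(encoding_already_supported=True))
```

Under this reading, both a JSON type and a "length in meters" type end up as extension types rather than additions to Schema.fbs.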

* How does this apply to Interval? *
Interval extends an existing type in the specification, and multiple
"packed fields" cannot be easily communicated with the current version of
the specification.  Hence, I feel comfortable making the addition to
Schema.fbs.

* What does this mean for other common types? *

I think as types come up that are very common but that we don't want to
add to Schema.fbs, we should invest in formalizing them as "Well Known"
extension types.  In this scenario, we would update the specification to
include how to specify the extension type metadata (and still require that
at least two libraries support the extension type before inclusion as
"Well Known").

* Practical implications *

I think this means the type system in Schema.fbs is mostly closed (i.e.
there is a high bar for adding new types). One potentially useful type to
have would be a "packed struct" that supports something similar to the
Python struct library [2].  I think this would likely cover many extension
type use cases.
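
For readers unfamiliar with the Python struct library, here is a small sketch of what "packed fields" look like there. The layout and the field names (months, days, nanoseconds) are hypothetical and chosen only for illustration; nothing here describes an actual Arrow type.

```python
import struct

# Hypothetical layout: two 32-bit signed ints followed by one 64-bit signed
# int, little-endian with no padding ("<").
LAYOUT = struct.Struct("<iiq")

# Pack three fields into a single fixed-width byte string ...
packed = LAYOUT.pack(14, 3, 5_000_000_000)  # months, days, nanoseconds

# ... and recover them on the other side using the same format string.
months, days, nanos = LAYOUT.unpack(packed)
```

The format string plays the role that type parameters would play for a "packed struct": it fully describes the width and position of each field inside a fixed-size value.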

Thoughts?

-Micah

[1] https://arrow.apache.org/docs/format/Columnar.html#extension-types
[2] https://docs.python.org/3/library/struct.html



