Hi folks,

In two mailing list threads [1] [2] we have discussed adding an
"extension type" mechanism to the Arrow binary/IPC protocol. The idea
is to be able to "annotate" built-in Arrow data types with a type name
and serialized type data/metadata so that users can implement their
own custom columnar data containers that contain application-defined
business logic not built into the Arrow libraries. This is designed
to be unobtrusive: readers who are not aware of an extension type
can interact with the built-in Arrow type opaquely and propagate the
extension metadata unmodified.
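To make the "unobtrusive" property concrete, here is a minimal sketch of how a reader might resolve a field. It assumes a hypothetical registry mapping extension names to factories; `Field`, `Resolved`, `resolve` and the registry are illustrative names, not any Arrow library's API. Only the two reserved key names come from the tenets below. An unknown (or absent) extension name falls back to the built-in storage type, and the metadata stays attached untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Reserved custom_metadata keys from the proposal.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a deserialized Arrow schema field."""
    name: str
    storage_type: str  # built-in Arrow type, e.g. "fixed_size_binary[16]"
    custom_metadata: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Resolved:
    source: Field                    # original field, metadata unmodified
    extension_name: Optional[str]    # None when treated as the built-in type
    extension: Optional[object]      # implementation-defined extension object


# Hypothetical registry: extension name -> factory(storage_type, serialized).
Registry = Dict[str, Callable[[str, str], object]]


def resolve(f: Field, registry: Registry) -> Resolved:
    name = f.custom_metadata.get(EXT_NAME_KEY)
    if name is None or name not in registry:
        # Not an extension type, or one this reader does not know:
        # expose the built-in storage type opaquely.
        return Resolved(f, None, None)
    serialized = f.custom_metadata.get(EXT_METADATA_KEY, "")
    return Resolved(f, name, registry[name](f.storage_type, serialized))
```

Because the unresolved path returns the original field, anything the reader writes back out still carries the extension keys.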

Two examples:

* "uuid" may annotate "fixed size binary of value width 16 bytes"
* "latitude-longitude" may annotate "struct<lat: double, lon: double>"
or similar
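For the "uuid" case, the storage is just 16 bytes per value. The sketch below shows how UUIDs might be packed into and read back from a fixed-size binary values buffer, where values sit back to back with no offsets. It uses only Python's standard `uuid` module. The function names are hypothetical, and the validity bitmap is deliberately left out to keep it short.

```python
import uuid
from typing import List

UUID_BYTE_WIDTH = 16  # value width of the "fixed size binary" storage type


def pack_uuids(values: List[uuid.UUID]) -> bytes:
    # Fixed-size binary has no offsets buffer: value i occupies
    # bytes [16 * i, 16 * (i + 1)) of the values buffer.
    return b"".join(u.bytes for u in values)


def unpack_uuid(values_buffer: bytes, i: int) -> uuid.UUID:
    # Slice out the i-th 16-byte value and reinterpret it as a UUID.
    start = i * UUID_BYTE_WIDTH
    return uuid.UUID(bytes=values_buffer[start:start + UUID_BYTE_WIDTH])
```

A reader unaware of "uuid" would still see a valid fixed-size binary column; only an extension-aware container adds the `uuid.UUID` view.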

An implementation may provide specialized columnar containers with
additional business logic around manipulating such data in-memory as
required for application development.
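As an illustration of such a specialized container, here is a hypothetical wrapper over "struct<lat: double, lon: double>" storage. The class name and the `bounding_box` method are invented for the example; they stand in for whatever application logic a user might attach. Null handling is omitted. The struct's child columns are kept as-is, so the plain storage can still be handed to extension-unaware code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LatLonArray:
    """Hypothetical container over struct<lat: double, lon: double> storage."""
    lat: List[float]  # child column "lat"
    lon: List[float]  # child column "lon"

    def __post_init__(self) -> None:
        # Struct children must all have the same length.
        if len(self.lat) != len(self.lon):
            raise ValueError("struct children must have equal length")

    def __len__(self) -> int:
        return len(self.lat)

    def storage(self) -> Tuple[List[float], List[float]]:
        # What an extension-unaware consumer sees: the plain struct children.
        return self.lat, self.lon

    def bounding_box(self) -> Tuple[float, float, float, float]:
        # Application-defined logic: (min_lat, min_lon, max_lat, max_lon).
        return min(self.lat), min(self.lon), max(self.lat), max(self.lon)
```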

We also have prototype implementations of this mechanism ready to go
in C++ and Java. In [3] I have proposed additions to the
specification text, together with the C++ implementation, based on
the following tenets:

- The custom_metadata Flatbuffers field shall use the colon character
":" as a namespace separator
- "ARROW" is designated as a reserved namespace in custom_metadata,
for example "ARROW:property"
- There may be multiple levels of namespacing, for example:
"ARROW:myorg:property_name"
- Extension type fields "ARROW:extension:name" and
"ARROW:extension:metadata" are reserved in custom_metadata to enable
serialization of extension type information
- The details of implementation and how extension types are exposed to
library users are implementation-dependent
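The key conventions in these tenets can be sketched as a few helpers over a plain string-to-string custom_metadata map. This is only an illustration: the helper names are hypothetical, and the tenets explicitly leave the user-facing API to each implementation. The separator, the reserved "ARROW" namespace and the two extension keys are taken from the list above.

```python
from typing import Dict, List

NAMESPACE_SEP = ":"
RESERVED_NAMESPACE = "ARROW"
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"


def split_key(key: str) -> List[str]:
    """Split a custom_metadata key into namespace components (any depth)."""
    return key.split(NAMESPACE_SEP)


def is_reserved(key: str) -> bool:
    """True if the key's top-level namespace is the reserved "ARROW"."""
    return split_key(key)[0] == RESERVED_NAMESPACE


def annotate(custom_metadata: Dict[str, str], ext_name: str,
             ext_metadata: str) -> Dict[str, str]:
    """Return a copy of custom_metadata with the extension keys set.

    Other entries are carried over unchanged; the input is not mutated.
    """
    out = dict(custom_metadata)
    out[EXTENSION_NAME_KEY] = ext_name
    out[EXTENSION_METADATA_KEY] = ext_metadata
    return out
```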

Please vote to accept these changes (see [3] for the actual changes).
The vote will be open for at least 72 hours.

[ ] +1: Adopt these changes into the Arrow columnar format specification
[ ] +0: . . .
[ ] -1: I disagree because . . .

Here is my vote: +1

[1]: https://lists.apache.org/thread.html/96c3f5fe64f45a4c5ccac0562dbfd356b76cd722aa521100b5988d40@%3Cdev.arrow.apache.org%3E
[2]: https://lists.apache.org/thread.html/f1fc039471a8a9c06f2f9600296a20d4eb3fda379b23685f809118ee@%3Cdev.arrow.apache.org%3E
[3]: https://github.com/apache/arrow/pull/4332
