Hi folks,

In two mailing list threads [1] [2] we have discussed adding an "extension type" mechanism to the Arrow binary/IPC protocol. The idea is to be able to "annotate" built-in Arrow data types with a type name and serialized type data/metadata, so that users can implement their own custom columnar data containers with application-defined business logic that is not built into the Arrow libraries.

This is designed to be unobtrusive: readers that are not aware of an extension type can interact with the underlying built-in Arrow type opaquely and propagate the extension metadata unmodified.

As two examples:

* "uuid" may annotate "fixed size binary of value width 16 bytes"
* "latitude-longitude" may annotate "struct<lat: double, lon: double>" or similar

An implementation may provide specialized columnar containers with additional business logic for manipulating such data in memory, as required for application development.

We also have prototype implementations of this mechanism ready to go in C++ and Java.

I have proposed language additions to the specification [3] and the C++ implementation with the following tenets:

- The custom_metadata Flatbuffers field shall use the colon character ":" as a namespace separator
- "ARROW" is designated as a reserved namespace in custom_metadata, for example "ARROW:property"
- There may be multiple levels of namespacing, for example: "ARROW:myorg:property_name"
- Extension type fields "ARROW:extension:name" and "ARROW:extension:metadata" are reserved in custom_metadata to enable serialization of extension type information
- The details of the implementation, and how extension types are exposed to library users, are implementation-dependent

Please vote to accept these changes (see [3] for the actual changes). The vote will be open for at least 72 hours.

[ ] +1: Adopt these changes into the Arrow columnar format specification
[ ] +0: . . .
[ ] -1: I disagree because . . .

Here is my vote: +1

[1]: https://lists.apache.org/thread.html/96c3f5fe64f45a4c5ccac0562dbfd356b76cd722aa521100b5988d40@%3Cdev.arrow.apache.org%3E
[2]: https://lists.apache.org/thread.html/f1fc039471a8a9c06f2f9600296a20d4eb3fda379b23685f809118ee@%3Cdev.arrow.apache.org%3E
[3]: https://github.com/apache/arrow/pull/4332
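
To make the reserved custom_metadata keys concrete, here is a rough sketch in plain Python. It is illustrative only and is not taken from the C++ or Java prototypes: custom_metadata is modeled as a string-to-string dict, and the helper names (make_field, annotate, read_unaware, resolve_extension, REGISTRY) as well as the storage type strings are hypothetical. Only the key names and the ":" separator come from the tenets listed in this proposal.

```python
# Illustrative only: plain dicts stand in for the Flatbuffers custom_metadata
# key/value pairs. None of these helpers belong to an actual Arrow library.

EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"
RESERVED_NAMESPACE = "ARROW"


def namespace_parts(key):
    """Split a custom_metadata key on the ':' namespace separator."""
    return key.split(":")


def is_reserved(key):
    """True if the key lives under the reserved 'ARROW' namespace."""
    return namespace_parts(key)[0] == RESERVED_NAMESPACE


def make_field(name, storage_type, custom_metadata=None):
    """Hypothetical field record: name, built-in storage type, metadata."""
    return {
        "name": name,
        "type": storage_type,
        "custom_metadata": dict(custom_metadata or {}),
    }


def annotate(field, ext_name, ext_metadata=""):
    """Return a copy of the field annotated with extension type information."""
    md = dict(field["custom_metadata"])
    md[EXT_NAME_KEY] = ext_name
    md[EXT_METADATA_KEY] = ext_metadata
    return make_field(field["name"], field["type"], md)


def read_unaware(field):
    """A reader without extension support: it works with the built-in
    storage type and propagates custom_metadata unmodified."""
    return make_field(field["name"], field["type"], field["custom_metadata"])


# Hypothetical registry of extension types known to an "aware" reader,
# mapping extension name to the storage type it expects.
REGISTRY = {"uuid": "fixed_size_binary[16]"}


def resolve_extension(field):
    """Return the extension name if it is registered and the storage type
    matches; otherwise None, meaning: treat the field as its storage type."""
    ext_name = field["custom_metadata"].get(EXT_NAME_KEY)
    if ext_name is not None and REGISTRY.get(ext_name) == field["type"]:
        return ext_name
    return None


uuid_field = annotate(make_field("id", "fixed_size_binary[16]"), "uuid")
passed_through = read_unaware(uuid_field)
```

In this sketch, a field that passes through an unaware reader still resolves to "uuid" on an aware reader afterwards, because the two reserved keys survive untouched. A reader with no matching registry entry simply falls back to the storage type.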