[DISCUSS] Approach to generic schema representation

Jeremy Leibs Mon, 08 Jul 2024 08:12:22 -0700

I'm looking for any advice folks may have on a generic way to document and
represent expected Arrow schemas as part of an interface definition.


For context, our library provides a cross-language (Python, C++, Rust) SDK
for logging semantic multi-modal data (point clouds, images, geometric
transforms, bounding boxes, etc.). Each of these primitive types has an
associated Arrow schema, but to date we have largely abstracted that from
our users through language-native object types and a bunch of generated
code to "serialize" stuff into the Arrow buffers before transmitting via
our IPC.

We're trying to take steps in the direction of making it easier for
advanced users to write and read data from the store directly using Arrow,
without needing to go in and out of an intermediate object-oriented
representation. However, doing this means documenting for users, for
example: "This is the Arrow schema to use when sending a point cloud with a
color channel".

I would love it if, eventually, the Arrow project had a way of defining a
spec file similar to a .proto or a .fbs, with all libraries supporting
loading of a schema object by directly parsing the spec. Has anyone taken
steps in this direction?

The best alternative I have at the moment is to redundantly define the
schema for each of the 3 languages implicitly by directly providing the
code to construct a datatype instance with the correct schema. But this
feels unfortunately messy and hard to maintain.

Thanks,
Jeremy
