On 30/07/2022 at 01:02, Wes McKinney wrote:
I think either path:

* Canonical extension type
* First-class type in the Type union in Flatbuffers

would be OK. The canonical extension type option is the preferable
path here, I think, because it allows Arrow implementations without
any special handling for JSON to allow the data to pass through as
Binary or String. Implementations like C++ could see the extension
type metadata and construct an instance of arrow::Type::JSON /
JsonArray, etc., but when it gets serialized back to Parquet or Arrow
IPC it looks like binary/string (since JSON can be utf-16/utf-32,
right?) with additional field metadata.

It would be reasonable to restrict JSON to UTF-8, and tell people they need to transcode in the rare cases where some obnoxious software outputs UTF-16-encoded JSON.
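For illustration, here is a minimal sketch of what such a transcoding step could look like on the producer side, using only the Python standard library. The function name `json_to_utf8` is hypothetical, and `json.detect_encoding` is an undocumented helper in CPython's `json` module (the one `json.loads` relies on for bytes input), so treat this as an assumption-laden sketch rather than a recommended API.

```python
import json


def json_to_utf8(raw: bytes) -> bytes:
    """Transcode a JSON document from UTF-8/UTF-16/UTF-32 to UTF-8.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    # Guess the encoding from a BOM or from the null-byte pattern of the
    # first bytes (undocumented CPython helper, assumed available).
    encoding = json.detect_encoding(raw)
    text = raw.decode(encoding)
    # Parse once so that invalid JSON is rejected before it is stored.
    json.loads(text)
    # Re-encode the original text unchanged, so formatting is preserved.
    return text.encode("utf-8")


utf16_doc = '{"name": "caf\u00e9"}'.encode("utf-16")
utf8_doc = json_to_utf8(utf16_doc)
```

The resulting bytes could then be stored in a plain String column, which is all a UTF-8-only JSON extension type would need.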

And I agree a canonical extension type would be massively more useful for JSON than for UUID (which basically doesn't make sense: a UUID is an opaque binary string for all practical purposes).

Regards

Antoine.
