On 30/07/2022 at 01:02, Wes McKinney wrote:
> I think either path:
>
> * Canonical extension type
> * First-class type in the Type union in Flatbuffers
>
> would be OK. The canonical extension type option is the preferable path
> here, I think, because it allows Arrow implementations without any
> special handling for JSON to allow the data to pass through as Binary or
> String. Implementations like C++ could see the extension type metadata
> and construct an instance of arrow::Type::JSON / JsonArray, etc., but
> when it gets serialized back to Parquet or Arrow IPC it looks like
> binary/string (since JSON can be utf-16/utf-32, right?) with additional
> field metadata.
It would be reasonable to restrict JSON to UTF-8, and tell people they need to transcode in the rare cases where some obnoxious software outputs UTF-16-encoded JSON.
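For what it's worth, the transcoding step would be small on the producer side. Here is a minimal sketch in Python using only the standard library; the helper name `to_utf8_json` is made up for illustration and is not part of any Arrow API. It relies on `json.loads` accepting UTF-8, UTF-16 or UTF-32 bytes, and it re-serializes the document rather than re-encoding it byte for byte, so whitespace and number formatting may change.

```python
import json


def to_utf8_json(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode a JSON document to UTF-8 bytes."""
    # json.loads detects UTF-8, UTF-16 and UTF-32 when given bytes input.
    value = json.loads(raw)
    # ensure_ascii=False keeps non-ASCII characters as UTF-8 text
    # instead of \uXXXX escapes.
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


# Example: a UTF-16 document (with BOM) coming from some other software.
utf16_doc = '{"name": "café"}'.encode("utf-16")
utf8_doc = to_utf8_json(utf16_doc)
```

The resulting bytes could then be stored in a plain String column, which is all that a UTF-8-only JSON extension type would need as storage.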
And I agree a canonical extension type would be massively more useful for JSON than for UUID (which basically doesn't make sense: a UUID is an opaque binary string for all practical purposes).
Regards

Antoine.