On 30/09/2022 at 18:57, Kevin Bambrick wrote:
> The issue I am facing is sending a UTF-16 string over the wire.

OK, then you can either transcode the strings to UTF-8 before sending them as String, *or* send them unchanged as Binary (not String).
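
To make the two options concrete, here is a minimal sketch in plain Python (no Arrow API involved). It assumes the strings arrive as little-endian UTF-16 bytes without a BOM; the helper name `utf16le_to_utf8` and the sample values are hypothetical. Arrow's String type holds UTF-8, so the transcoded bytes are what a String column would carry, while the untouched bytes are what a Binary column would carry.

```python
def utf16le_to_utf8(payload: bytes) -> bytes:
    """Transcode UTF-16-LE bytes (assumed, no BOM) into UTF-8 bytes."""
    return payload.decode("utf-16-le").encode("utf-8")


# Hypothetical UTF-16-LE payloads, as they might arrive from a producer.
utf16_values = [text.encode("utf-16-le") for text in ["héllo", "日本語"]]

# Option 1: transcode, then send the result as String (UTF-8).
utf8_values = [utf16le_to_utf8(value) for value in utf16_values]

# Option 2: send the original bytes unchanged as Binary.
# The consumer then has to know, out of band, that they are UTF-16.
binary_values = list(utf16_values)
```

With option 2, nothing in the data itself says which encoding was used, which is the gap an extension type could fill.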

Where do these UTF-16 strings come from?

> What would the difference be between adding a new data type and an
> extension type for UTF-16?

An extension type is for the most part a piece of metadata attached to data represented in an existing data type (such as Binary), which consumers can optionally recognize in order to better interpret the data.

So if one were to make a UTF-16 extension type based on the Binary data type, implementations could either recognize it as Binary or as UTF-16, depending on whether they know about that particular extension type or not.

(in practice, it would make more sense to make a parameterized "encoded text" extension type, instead of making a specific one for UTF-16)
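
As a rough illustration of that fallback behaviour, here is a plain-Python sketch that does not use any Arrow library. The `Field` class is a simplified stand-in for an Arrow field, and the extension name `example.encoded_text` and its JSON parameter layout are hypothetical. The two metadata keys, `ARROW:extension:name` and `ARROW:extension:metadata`, are the ones the columnar format uses to mark extension types.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Field:
    """Simplified stand-in for an Arrow field: storage type plus custom metadata."""
    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


# Hypothetical extension name; not a registered Arrow extension type.
ENCODED_TEXT = "example.encoded_text"


def make_encoded_text_field(name: str, encoding: str) -> Field:
    """Describe a Binary field annotated as parameterized 'encoded text'."""
    return Field(name, "binary", {
        "ARROW:extension:name": ENCODED_TEXT,
        "ARROW:extension:metadata": json.dumps({"encoding": encoding}),
    })


def read_values(fld: Field, values: list, known_extensions: set) -> list:
    """Decode the values if the consumer knows the extension, else return raw bytes."""
    ext = fld.metadata.get("ARROW:extension:name")
    if ext == ENCODED_TEXT and ext in known_extensions:
        params = json.loads(fld.metadata["ARROW:extension:metadata"])
        return [value.decode(params["encoding"]) for value in values]
    return values  # fall back to the storage type (Binary)


fld = make_encoded_text_field("comment", "utf-16-le")
raw = ["héllo".encode("utf-16-le")]
aware = read_values(fld, raw, {ENCODED_TEXT})  # consumer that knows the extension
unaware = read_values(fld, raw, set())         # consumer that does not
```

Here `aware` holds decoded text while `unaware` still holds the original bytes. Carrying the encoding as a parameter is what lets one extension type cover UTF-16 and other encodings alike.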

I recommend reading about the Arrow columnar format and especially this section about extension types:
https://arrow.apache.org/docs/format/Columnar.html#extension-types


Regards

Antoine.
