On 30/09/2022 at 18:57, Kevin Bambrick wrote:
> The issue I am facing is sending a UTF-16 string over the wire.
Ok, then you can just transcode the strings to UTF-8 before sending them as
String, *or* you can send them as Binary (not String).
Where do these UTF-16 strings come from?
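To make the two options concrete, here is a minimal sketch using only the Python standard library. It assumes the incoming data is UTF-16 little-endian without a byte-order mark; the variable names are illustrative, and the actual Arrow array construction is left out.

```python
# Assumption: the source hands us UTF-16 little-endian bytes without a BOM.
utf16_payload = "héllo, wörld".encode("utf-16-le")

# Option 1: transcode to UTF-8, the encoding expected by Arrow's String type.
utf8_payload = utf16_payload.decode("utf-16-le").encode("utf-8")

# Option 2: keep the bytes untouched and send them as Binary.
# The receiver then has to know out of band that they are UTF-16.
binary_payload = utf16_payload
```

With option 1 any Arrow consumer can read the text directly; with option 2 nothing is transcoded, but the encoding is no longer described by the data type itself.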
> What would the difference be between adding a new data type and an
> extension type for UTF-16?
An extension type is for the most part a piece of metadata attached to
data represented in an existing data type (such as Binary), and that
consumers can optionally recognize in order to better interpret the data.
So if one were to make a UTF-16 extension type based on the Binary data
type, implementations could either recognize it as Binary or as UTF-16,
depending on whether they know about that particular extension type or not.
(in practice, it would make more sense to make a parameterized "encoded
text" extension type, instead of making a specific one for UTF-16)
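As a rough illustration of that fallback behaviour, the sketch below models a field with plain Python dicts rather than a real Arrow library. The two `ARROW:extension:*` metadata keys are the ones defined by the columnar format; the extension name `example.encoded_text`, the JSON metadata layout, and the helper functions are hypothetical and only serve the example.

```python
import json

# Metadata keys defined by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name, for illustration only.
ENCODED_TEXT = "example.encoded_text"


def make_field(name, encoding):
    """Describe a Binary field tagged as encoded text (hypothetical layout)."""
    return {
        "name": name,
        "type": "binary",
        "metadata": {
            EXT_NAME_KEY: ENCODED_TEXT,
            # The encoding is the extension type's parameter.
            EXT_META_KEY: json.dumps({"encoding": encoding}),
        },
    }


def read_values(field, values, known_extensions):
    """Decode the values if the extension is recognized, else return raw bytes."""
    meta = field["metadata"]
    if meta.get(EXT_NAME_KEY) in known_extensions:
        encoding = json.loads(meta[EXT_META_KEY])["encoding"]
        return [v.decode(encoding) for v in values]
    # Unknown extension: fall back to the storage type (Binary).
    return list(values)


field = make_field("comment", "utf-16-le")
column = ["héllo".encode("utf-16-le"), "wörld".encode("utf-16-le")]

aware = read_values(field, column, known_extensions={ENCODED_TEXT})
unaware = read_values(field, column, known_extensions=set())
```

Here `aware` holds decoded strings while `unaware` holds the original bytes, which is the "either Binary or UTF-16" behaviour described above. Making the encoding a parameter means the same extension type could also cover other encodings.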
I recommend reading about the Arrow columnar format and especially this
section about extension types:
https://arrow.apache.org/docs/format/Columnar.html#extension-types
Regards
Antoine.