>>>while UTF-8 is great, especially on Windows platforms UTF-16 is more common, 
>>>because the OS uses it heavily internally. Since Win2k it also supports 
>>>surrogates and supplementary characters. So there’s OS support for it. What 
>>>I don’t know is, how universally is UTF-16 (or a subset of it) supported 
>>>across other platforms? Can we assume a certain degree of support on all the 
>>>various platforms that Thrift can run on?

>>>TL;DR: Would it make sense to add UTF-16 as another string format type?

In my opinion, no. The question is based on a mistaken understanding or expectation.

Thrift currently supports a string of bytes as a type, and users who wish to 
exchange character string data are expected to impose some kind of meaning on 
top of that. 

What Thrift needs is a genuine string data type, independent of any particular 
wire encoding, and which fully supports Unicode code points. The encoding on 
the wire could be UTF-8, UTF-16, UTF-32 or variable-length (zigzag) integers 
(code points currently run up to U+10FFFF, which needs 21 bits).
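To make that concrete, here is a rough sketch in Python of the same sequence of 
code points carried in each of those encodings. The helper names 
(encode_varint, encode_code_points) are made up for illustration and are not 
part of Thrift. I have used a plain base-128 varint rather than zigzag, on the 
assumption that code points are never negative, so the zigzag step would add 
nothing.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def encode_code_points(text: str) -> bytes:
    """Hypothetical wire format: one varint per Unicode code point."""
    return b"".join(encode_varint(ord(ch)) for ch in text)


# 'A', U+00E9, and U+1F600 (a supplementary character outside the BMP).
text = "A\u00e9\U0001F600"

encodings = {
    "utf-8": text.encode("utf-8"),
    "utf-16": text.encode("utf-16-le"),
    "utf-32": text.encode("utf-32-le"),
    "varint": encode_code_points(text),
}

# Byte counts differ, but every entry carries the same three code points.
sizes = {name: len(data) for name, data in encodings.items()}
print(sizes)
```

Whichever of these goes over the wire, the receiver ends up with the same 
three code points. The choice only affects size and decoding cost.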

User libraries would of course be free to reformat those Unicode strings into 
any format comfortably supported by the platform. On Windows UTF-16 is 
preferred, but it should never be viewed as something different from the 
underlying Unicode string.
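A small Python sketch of that last point, assuming (purely for illustration) 
that the bytes on the wire happen to be UTF-8. The variable names are mine, 
not anything Thrift defines.

```python
text = "A\U0001F600"                  # two code points, one outside the BMP

wire = text.encode("utf-8")           # bytes as received (assumed UTF-8 here)
decoded = wire.decode("utf-8")        # the abstract Unicode string

# Platform-side reformatting, e.g. before handing the string to a Windows API.
utf16 = decoded.encode("utf-16-le")
utf16_units = len(utf16) // 2         # 16-bit code units, surrogates included

# Converting back gives exactly the same Unicode string.
round_trip = utf16.decode("utf-16-le")
print(len(decoded), utf16_units, round_trip == text)
```

The UTF-16 form has more code units than the string has code points because 
of the surrogate pair, yet it converts back to the identical string.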

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org




