I don't like the idea of adding a new UTF-16 string type to the wire protocol, but I think it would be fine to add a UTF-16 string type to the language bindings. UTF-8 would still be sent over the wire, and then converted from the network buffer into the user's desired string type. A lot of the cost and inconvenience of UTF-8 and UTF-16 is just dealing with all the conversions, and Thrift seems like a reasonable place to remove one of those conversions.
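To make that concrete, here is a rough sketch of the conversion I have in mind, assuming the bytes of one string field have already been read out of the network buffer. The function name is made up for illustration and is not part of any Thrift library; a real binding would do this inside the generated code or the protocol layer.

```python
import struct


def utf8_wire_bytes_to_utf16_units(raw):
    """Convert the UTF-8 bytes of one string field into UTF-16 code units.

    Hypothetical helper, not a Thrift API: 'raw' is assumed to hold exactly
    the UTF-8 payload of a single string as it arrived on the wire.
    """
    # Decoding validates the UTF-8; malformed input raises UnicodeDecodeError.
    text = raw.decode("utf-8")
    # Re-encode as little-endian UTF-16 without a byte order mark.
    encoded = text.encode("utf-16-le")
    # Unpack into 16-bit code units, which is what a UTF-16 string type stores.
    return list(struct.unpack("<%dH" % (len(encoded) // 2), encoded))


# The wire format stays UTF-8; only the binding's in-memory type changes.
wire_payload = "h\u00e9llo \U0001F600".encode("utf-8")
units = utf8_wire_bytes_to_utf16_units(wire_payload)
# U+1F600 lies outside the BMP, so it becomes the surrogate pair 0xD83D, 0xDE00.
```

The point is only that the peer never sees anything but UTF-8, so nothing changes for other languages or for the protocols.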
On Fri, Jan 1, 2016 at 4:01 AM, Jens Geyer <jensge...@hotmail.com> wrote:

> Yes, that was the question. It could eliminate some conversions from and
> to utf8 (speed is a Thrift goal) but I'm not sure if the possible gains are
> worth doing it.
>
> Re keeping it simple: I fully agree, absolutely. But we have 4 integer
> types and there are thoughts to integrate floats as well ...
>
> Happy new year!
> ________________________________
> From: Randy Abernethy
> Sent: 01.01.2016 02:56
> To: dev@thrift.apache.org
> Subject: Re: UTF-16
>
> Hey David,
>
> Apache Thrift has a "string" type in its IDL and that type is a language
> native string in the generated code but is UTF-8 on the wire when using
> binary, compact or JSON protocols by default.
>
> I think Jens is posing the question (correct me if I'm wrong Jens): Should
> we also support UTF-16 string encoding on the wire with binary, compact and
> JSON protocols.
>
> -Randy
>
> On Thu, Dec 31, 2015 at 5:09 PM, David Bennett <da...@yorkage.com> wrote:
>
> > >>>while UTF-8 is great, especially on Windows platforms UTF-16 is more
> > common, because the OS uses it heavily internally. Since Win2k it also
> > supports surrogates and supplementary characters. So there’s OS support for
> > it. What I don’t know is, how universally is UTF-16 (or a subset of it)
> > supported across other platforms? Can we assume a certain degree of support
> > on all the various platforms that Thrift can run on?
> >
> > >>>TL;DR: Would it make sense to add UTF-16 as another string format type?
> >
> > In my opinion, no. This is based on a mistaken understanding or
> > expectation.
> >
> > Thrift currently supports a string of bytes as a type, and users who wish
> > to exchange character string data are expected to impose some kind of
> > meaning on top of that.
> >
> > What Thrift needs is a genuine string data type, independent of any
> > particular transport format, and which fully supports Unicode code points.
> > The transport mechanism could be UTF-8, UTF-16, UTF-32 or variable length
> > (zigzag) integers (currently Unicode requires about 21 bits).
> >
> > User libraries would of course be free to reformat those Unicode strings
> > into any format comfortably supported by the platform. On Windows UTF-16 is
> > preferred, but should never be viewed as something different from the
> > underlying Unicode string.
> >
> > Regards
> > David M Bennett FACS
> >
> > Andl - A New Database Language - andl.org