I don't like the idea of adding a new UTF-16 string type to the wire
protocol, but I think it would be fine to add a UTF-16 string type to the
language bindings.  UTF-8 would be sent over the wire, and then converted
from the network buffer into the user's desired string type.  A lot of the
cost and inconvenience of UTF-8 and UTF-16 is just dealing with all the
conversions, and Thrift seems like a reasonable place to remove one of
those conversions.
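
To make that concrete, here is a rough Python sketch of what I have in
mind. It assumes the binary protocol's string framing (a 4-byte big-endian
length followed by UTF-8 bytes). The function names are made up for
illustration and are not part of any Thrift library. Python has to go
through its native str on the way to UTF-16; a C++ or C# binding could
transcode straight out of the network buffer, which is where the saved
conversion would come from.

```python
import struct
from typing import Tuple


def write_string(text: str) -> bytes:
    """Frame a string as a 4-byte big-endian length plus UTF-8 bytes.

    The wire format stays UTF-8 no matter what the caller uses locally.
    """
    payload = text.encode("utf-8")
    return struct.pack(">i", len(payload)) + payload


def read_string_as_utf16(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Hypothetical binding hook: return UTF-16LE bytes and the next offset.

    The UTF-8 payload is decoded once, at the binding boundary, and handed
    to the caller in the encoding the platform prefers.
    """
    (length,) = struct.unpack_from(">i", buf, offset)
    start = offset + 4
    end = start + length
    if length < 0 or end > len(buf):
        raise ValueError("truncated or malformed string field")
    text = buf[start:end].decode("utf-8")
    return text.encode("utf-16-le"), end


if __name__ == "__main__":
    wire = write_string("h\u00e9llo \U0001F600")
    units, next_offset = read_string_as_utf16(wire)
    print(len(wire), len(units), next_offset)
```

The point is only that the choice of UTF-16 lives in the generated code and
library, so peers written in other languages never see anything but UTF-8.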

On Fri, Jan 1, 2016 at 4:01 AM, Jens Geyer <jensge...@hotmail.com> wrote:

> Yes, that was the question. It could eliminate some conversions to and
> from UTF-8 (speed is a Thrift goal) but I'm not sure if the possible gains
> are worth doing it.
>
> Re keeping it simple: I fully agree, absolutely. But we have 4 integer
> types and there are thoughts to integrate floats as well ...
>
> Happy new year!
> ________________________________
> From: Randy Abernethy
> Sent: 01.01.2016 02:56
> To: dev@thrift.apache.org
> Subject: Re: UTF-16
>
> Hey David,
>
> Apache Thrift has a "string" type in its IDL and that type is a language
> native string in the generated code but is UTF-8 on the wire when using
> binary, compact or JSON protocols by default.
>
> I think Jens is posing the question (correct me if I'm wrong, Jens): Should
> we also support UTF-16 string encoding on the wire with binary, compact and
> JSON protocols?
>
> -Randy
>
> On Thu, Dec 31, 2015 at 5:09 PM, David Bennett <da...@yorkage.com> wrote:
>
> > >>>while UTF-8 is great, especially on Windows platforms UTF-16 is more
> > common, because the OS uses it heavily internally. Since Win2k it also
> > supports surrogates and supplementary characters. So there’s OS support
> > for it. What I don’t know is, how universally is UTF-16 (or a subset of
> > it) supported across other platforms? Can we assume a certain degree of
> > support on all the various platforms that Thrift can run on?
> >
> > >>>TL;DR: Would it make sense to add UTF-16 as another string format
> > type?
> >
> > In my opinion, no. This is based on a mistaken understanding or
> > expectation.
> >
> > Thrift currently supports a string of bytes as a type, and users who wish
> > to exchange character string data are expected to impose some kind of
> > meaning on top of that.
> >
> > What Thrift needs is a genuine string data type, independent of any
> > particular transport format, and which fully supports Unicode code
> > points. The transport mechanism could be UTF-8, UTF-16, UTF-32 or
> > variable length (zigzag) integers (currently Unicode requires about 21
> > bits).
> >
> > User libraries would of course be free to reformat those Unicode strings
> > into any format comfortably supported by the platform. On Windows UTF-16
> > is preferred, but should never be viewed as something different from the
> > underlying Unicode string.
> >
> > Regards
> > David M Bennett FACS
> >
> > Andl - A New Database Language - andl.org
> >
>
