Re: unicode types

Randy Abernethy Wed, 15 Oct 2014 14:50:24 -0700

Thrift is utf-8 everywhere.

The string doc is here: https://thrift.apache.org/docs/types
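
As a rough illustration of what "utf-8 everywhere" means on the wire, here is a minimal Python sketch. It assumes the binary protocol's string framing (a 4-byte big-endian length followed by the UTF-8 bytes) and copies the type IDs by hand from lib/py/src/Thrift.py. The helper names encode_string and decode_string are made up for this example and are not part of the Thrift library.

```python
import struct

# Type IDs copied by hand from the TType constants discussed in this thread.
STRING = 11  # the only string type the IDL exposes (besides binary)
UTF8 = 16    # defined in some language libraries, not usable from the IDL
UTF16 = 17   # likewise


def encode_string(value: str) -> bytes:
    """Hypothetical helper: frame a string as i32 length + UTF-8 payload."""
    payload = value.encode("utf-8")
    return struct.pack("!i", len(payload)) + payload


def decode_string(data: bytes) -> str:
    """Hypothetical helper: reverse of encode_string."""
    (length,) = struct.unpack("!i", data[:4])
    return data[4:4 + length].decode("utf-8")


# The length prefix counts bytes, not characters: "á" takes two bytes in UTF-8.
wire = encode_string("fájl.jpg")
```

Whatever the language binding, a field declared as string is expected to carry UTF-8 bytes like the payload above, which is why the IDL has no separate utf8 type name.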


On Wed, Oct 15, 2014 at 6:53 AM, Peter Neumark <[email protected]> wrote:
> Does thrift officially say anything about the character encoding of string
> fields?
>
> On Tue, Oct 14, 2014 at 9:48 PM, Jens Geyer <[email protected]> wrote:
>
>> Hi Peter,
>>
>>> The thrift wire format has support for unicode fields:
>>
>> I just scanned the code base.
>> - In some language libraries the TType numbers go up to 17, including UTF8 and UTF16.
>> - Other languages define only the values that are actually used, up to 15.
>>
>> I don't know what the intention is/was behind these two additional values.
>> Maybe someone else can chime in here.
>>
>> Have fun,
>> JensG
>>
>>
>>
>> -----Original Message----- From: Peter Neumark
>> Sent: Tuesday, October 14, 2014 9:36 AM
>> To: [email protected]
>> Subject: Re: unicode types
>>
>> I'd prefer not to use "string" types in my thrift files, since that doesn't
>> say anything about the encoding.
>> Instead, I'd like the following:
>>
>> struct JpegData {
>>    1: optional binary exif,
>>    2: *utf8* filename,
>>
>>    3: i32 bytes
>> }
>>
>> The thrift wire format has support for unicode fields:
>> https://github.com/apache/thrift/blob/master/lib/py/src/Thrift.py#L39
>>
>> But the IDL doesn't let me use them directly for some reason.
>> My question is, are there plans to support the example thrift struct
>> definition above?
>>
>> Thanks,
>> Peter
>>
>> On Tue, Oct 14, 2014 at 9:13 AM, Jens Geyer <[email protected]> wrote:
>>
>>> Hi Peter,
>>>
>>> They need to be interoperable between all platforms and languages. Somewhere in
>>> the docs UTF8 is mentioned, IIRC. Is that what you are asking about?
>>>
>>> ________________________________
>>> From: Peter Neumark
>>> Sent: 13.10.2014 22:59
>>> To: [email protected]
>>> Subject: unicode types
>>>
>>> Hi all,
>>>
>>> Looking at the wire format's type IDs, it's clear that thrift supports
>>> several thrift encodings in its wire format, yet the IDL does not allow
>>> one to speak of string encoding (string/binary are the only type names in
>>> the IDL).
>>>
>>> Is this a design decision (where each language implementation can choose
>>> the appropriate Unicode type id for encoding strings), or is there some
>>> historical reason for not exposing string encoding options in the thrift
>>> IDL?
>>>
>>> Thanks,
>>> Peter
>>>
>>> --
>>>
>>> *Peter Neumark*
>>> DevOps guy @Prezi <http://prezi.com>
