[ 
https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478541#comment-13478541
 ] 

Nathan Beyer commented on THRIFT-1727:
--------------------------------------

[~xb] In reviewing the patch and comments in depth, I think there's some 
confusion on the purpose of the 'convert' methods. These methods should only be 
used in the context of Thrift 'string' fields. They should not be used for 
Thrift 'binary' fields. The 'convert_to_utf8_buffer' has a 'string' parameter 
that is supposed to be a Ruby String of characters, not a Ruby String of bytes. 
The premise is that 'convert_to_utf8_buffer' is supporting Thrift 'string' 
fields, which on the wire are a sequence of UTF-8 bytes. Since a Ruby String 
can have a variety of encodings, it must be transcoded to UTF-8 and then forced 
into a BINARY encoding to act as a byte buffer.
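
To make the intent concrete, here is a minimal sketch of the transcode-then-force 
step for a Thrift 'string' field. The helper name 'to_utf8_buffer' is 
hypothetical and is not the library's actual implementation; it only 
illustrates the behaviour I would expect from 'convert_to_utf8_buffer'.

```ruby
# Hypothetical helper, for illustration only (not the Thrift library's code).
def to_utf8_buffer(string)
  # Transcode the characters to UTF-8, then relabel the resulting bytes as
  # BINARY (ASCII-8BIT) so the String acts as a plain byte buffer.
  string.encode(Encoding::UTF_8).force_encoding(Encoding::BINARY)
end

# "café" as ISO-8859-1 characters: 4 characters, 4 bytes.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)

buffer = to_utf8_buffer(latin1)
buffer.encoding    # Encoding::ASCII_8BIT (alias of Encoding::BINARY)
buffer.bytes.to_a  # [99, 97, 102, 195, 169], i.e. the UTF-8 bytes of "café"
```

The input here is a String of characters, so transcoding changes the bytes but 
preserves the text, which is what a 'string' field needs on the wire.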

Are there places where 'convert_to_utf8_buffer' is used for things other than 
Thrift 'string' fields?
                
> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
>                 Key: THRIFT-1727
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1727
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.9
>         Environment: JRuby 1.6.8 using "--1.9" command line parameter.
>            Reporter: XB
>
> When setting a binary field of a Thrift object with some binary data (e.g. a 
> string whose encoding is "ASCII-8BIT") and then serializing this object, the 
> binary data is re-encoded. That is, it is encoded as if it were not a 
> sequence of bytes but a sequence of characters, encoded using the ISO-8859-1 
> encoding. This assumed ISO-8859-1 sequence of characters is then converted 
> into UTF-8 (by BinaryProtocol or CompactProtocol). This basically means that 
> all bytes whose values are between 0x80 (inclusive) and 0x100 (exclusive) are 
> converted into multi-byte sequences. This leads to data corruption.
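
The effect described in the report can be sketched in plain Ruby, without 
Thrift. This assumes the faulty path behaves as if the binary bytes were 
ISO-8859-1 characters being transcoded to UTF-8; the variable names are 
illustrative only.

```ruby
# Four raw bytes, tagged ASCII-8BIT, as a Thrift 'binary' field would hold them.
raw = [0x00, 0x7F, 0x80, 0xFF].pack("C*")

# Assumption: the bytes are (wrongly) interpreted as ISO-8859-1 characters
# and then transcoded to UTF-8.
reencoded = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

raw.bytesize          # 4
reencoded.bytesize    # 6
reencoded.bytes.to_a  # [0, 127, 194, 128, 195, 191]
```

Bytes below 0x80 pass through unchanged, while 0x80 and 0xFF each become a 
two-byte UTF-8 sequence, so the payload no longer matches what was set.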

