[ 
https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477537#comment-13477537
 ] 

Nathan Beyer commented on THRIFT-1727:
--------------------------------------

[~xb] Can you provide some test cases to demonstrate the behavior you see as 
incorrect? Where are you seeing the assumed ISO-8859-1 encoding? Within the 
classes, the known encodings should be BINARY (ASCII-8BIT) and UTF-8. Any other 
encoding should only come from externally passed strings.
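
As a rough illustration of the two encodings named here, this is a minimal 
plain-Ruby sketch, not Thrift code. The helper name `binary_string?` is 
hypothetical and exists only for this example; it shows how a caller could 
check which encoding a string carries before passing it in.

```ruby
# Hypothetical helper (not part of Thrift): true when the string is tagged
# as raw bytes. Encoding::BINARY is Ruby's alias for Encoding::ASCII_8BIT.
def binary_string?(str)
  str.encoding == Encoding::BINARY
end

# Raw bytes, explicitly tagged as BINARY (ASCII-8BIT).
raw = "\xFF\x00\x80".dup.force_encoding(Encoding::BINARY)

# Text, explicitly tagged as UTF-8 ("e" with an acute accent is 2 bytes).
text = "h\u00e9llo".encode(Encoding::UTF_8)

# force_encoding only changes the tag; the underlying bytes stay the same.
retagged = raw.dup.force_encoding(Encoding::UTF_8)

puts binary_string?(raw)            # raw bytes carry the BINARY tag
puts binary_string?(text)           # UTF-8 text does not
puts raw.bytes.to_a == retagged.bytes.to_a
```

Retagging with `force_encoding` never rewrites bytes, whereas `encode` 
transcodes them, which is the distinction that matters for this issue.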

I asked a bunch of questions on THRIFT-1023 to try to ascertain what APIs were 
working with byte buffers and what APIs were working with strings, but didn't 
get much feedback. Before going any further, I think we need to clearly 
document the APIs to delineate what is what. Your description and comments 
don't align with my reading of the code. If you can provide more explicit 
details, I would appreciate it.
                
> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
>                 Key: THRIFT-1727
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1727
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.9
>         Environment: JRuby 1.6.8 using "--1.9" command line parameter.
>            Reporter: XB
>
> When setting a binary field of a Thrift object with some binary data (e.g. a 
> string whose encoding is "ASCII-8BIT") and then serializing this object, the 
> binary data is re-encoded. That is, it is encoded as if it were not a 
> sequence of bytes but a sequence of characters, encoded using the ISO-8859-1 
> encoding. This assumed ISO-8859-1 sequence of characters is then converted 
> into UTF-8 (by BinaryProtocol or CompactProtocol). This basically means that 
> all bytes whose values are between 0x80 (inclusive) and 0x100 (exclusive) are 
> converted into multi-byte sequences. This leads to data corruption.
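
To make the reported transformation concrete, here is a minimal plain-Ruby 
sketch of what the description claims happens. It does not call Thrift at 
all, and `misread_as_latin1` is a hypothetical name used only here. It shows 
what transcoding from ISO-8859-1 to UTF-8 does to raw bytes, not where (or 
whether) the library performs that step.

```ruby
# Hypothetical helper: treat raw bytes as ISO-8859-1 characters and
# transcode them to UTF-8, as the issue description says happens.
def misread_as_latin1(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Four raw bytes; Array#pack("C*") returns an ASCII-8BIT string.
original  = [0x00, 0x7F, 0x80, 0xFF].pack("C*")
corrupted = misread_as_latin1(original)

p original.bytes.to_a   # [0, 127, 128, 255]
p corrupted.bytes.to_a  # [0, 127, 194, 128, 195, 191]
```

Bytes below 0x80 pass through unchanged, while 0x80 and 0xFF each become a 
two-byte UTF-8 sequence, so the payload grows from 4 to 6 bytes.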

