[ 
https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495713#comment-13495713
 ] 

XB commented on THRIFT-1727:
----------------------------

bq. Please let me know if there is access to this information, as it could be 
used to avoid transcoding the data and forcing the encoding to BINARY.

Well, it is easy to know when to force the encoding to BINARY: when the
encoding is not already BINARY. So simply check for
{noformat}
string.encoding != Encoding::BINARY
{noformat}
and force the encoding only in that case.
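
As a minimal sketch of that check (the helper name as_binary is hypothetical 
and not part of the Thrift Ruby library; I am also assuming the caller's 
string should be left untouched, hence the dup), something like this should 
relabel the bytes without transcoding them:

```ruby
# Hypothetical helper, not part of the Thrift Ruby library: returns the
# string's bytes tagged as BINARY, touching the encoding only when needed.
def as_binary(string)
  if string.encoding != Encoding::BINARY
    # force_encoding only relabels the existing bytes, it does not
    # transcode them. dup (an assumption of this sketch) keeps the
    # caller's string unmodified.
    string.dup.force_encoding(Encoding::BINARY)
  else
    # Already BINARY: hand the same object back, no copy and no relabeling.
    string
  end
end

text = "h\u00E9llo"               # UTF-8 string containing a 2-byte character
raw  = [0xFF, 0x80].pack("C*")    # pack("C*") yields a BINARY string

tagged = as_binary(text)
puts tagged.encoding              # encoding after relabeling
puts tagged.bytes.to_a == text.bytes.to_a  # byte sequence is unchanged
puts as_binary(raw).equal?(raw)   # BINARY input is returned as-is
```

The point is that the byte sequence is identical before and after, so bytes 
in the range 0x80 to 0xFF are never expanded into multi-byte sequences.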

This is not precisely the same as enforcing that "binary" fields get only 
BINARY-encoded strings and "string" fields get only non-BINARY-encoded 
strings. The difference comes from a peculiarity of the Thrift specification, 
which treats these two types of strings as compatible. But because the two 
kinds of strings are also compatible at the Ruby level, the check actually 
fits quite nicely.
                
> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
>                 Key: THRIFT-1727
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1727
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.9
>         Environment: JRuby 1.6.8 using "--1.9" command line parameter.
>            Reporter: XB
>
> When setting a binary field of a Thrift object with some binary data (e.g. a 
> string whose encoding is "ASCII-8BIT") and then serializing this object, the 
> binary data is re-encoded. That is, it is encoded as if it were not a 
> sequence of bytes but a sequence of characters, encoded using the ISO-8859-1 
> encoding. This assumed ISO-8859-1 sequence of characters is then converted 
> into UTF-8 (by BinaryProtocol or CompactProtocol). This basically means that 
> all bytes whose values are between 0x80 (inclusive) and 0x100 (exclusive) are 
> converted into multi-byte sequences. This leads to data corruption.

