[ https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495772#comment-13495772 ]
XB edited comment on THRIFT-1727 at 11/13/12 12:11 AM:
-------------------------------------------------------

You are right, the read-path also needs support as the write-path does. However, this looks to be not particularly complicated (see https://issues.apache.org/jira/browse/THRIFT-1726 ):

{noformat}
diff --git a/lib/rb/lib/thrift/struct_union.rb b/lib/rb/lib/thrift/struct_union.rb
index 4e0afcf..7df859c 100644
--- a/lib/rb/lib/thrift/struct_union.rb
+++ b/lib/rb/lib/thrift/struct_union.rb
@@ -100,6 +100,12 @@ module Thrift
           end
         end
         iprot.read_set_end
+      when Types::STRING
+        if field[:binary]
+          value = Bytes.force_binary_encoding(iprot.read_type(field[:type]))
+        else
+          value = iprot.read_type(field[:type])
+        end
       else
         value = iprot.read_type(field[:type])
       end
{noformat}

> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
>                 Key: THRIFT-1727
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1727
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.9
>         Environment: JRuby 1.6.8 using "--1.9" command line parameter.
>            Reporter: XB
>
> When setting a binary field of a Thrift object with some binary data (e.g. a
> string whose encoding is "ASCII-8BIT") and then serializing this object, the
> binary data is re-encoded. That is, it is encoded as if it were not a
> sequence of bytes but a sequence of characters, encoded using the ISO-8859-1
> encoding. This assumed ISO-8859-1 sequence of characters is then converted
> into UTF-8 (by BinaryProtocol or CompactProtocol). This basically means that
> all bytes whose values are between 0x80 (inclusive) and 0x100 (exclusive) are
> converted into multi-byte sequences. This leads to data corruption.
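
The re-encoding described in the report can be reproduced with plain Ruby strings, without Thrift. The following is only a sketch: the four sample bytes and the variable names are made up for illustration, and it uses core `String` methods rather than the protocol classes, so it shows the kind of conversion involved and not the exact code path inside BinaryProtocol or CompactProtocol.

```ruby
# Hypothetical sample payload: four raw bytes, two of them >= 0x80.
raw = [0x00, 0x7F, 0x80, 0xFF].pack("C*")   # pack("C*") yields ASCII-8BIT

# The failure mode from the report: the bytes are treated as ISO-8859-1
# characters and then transcoded to UTF-8.
reencoded = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts raw.bytesize        # => 4
puts reencoded.bytesize  # => 6 (0x80 -> C2 80, 0xFF -> C3 BF)

# Relabelling with force_encoding changes only the encoding tag, not the
# bytes, so a string that arrives with the wrong tag can be marked as binary
# again without loss.
wire     = raw.dup.force_encoding(Encoding::UTF_8)    # wrong tag, same bytes
restored = wire.dup.force_encoding(Encoding::BINARY)

puts restored.bytes == raw.bytes   # => true
```

Relabelling the value returned by `iprot.read_type` as binary, as the patch above does for fields marked `:binary`, is presumably this second operation: it fixes the encoding tag on the read-path without touching the payload.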