[ 
https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495645#comment-13495645
 ] 

Nathan Beyer commented on THRIFT-1727:
--------------------------------------

I believe the core issue is that there is no 'binary' base type. According to 
the Thrift Types (http://thrift.apache.org/docs/types/) document, there is only 
a 'string' base type; 'binary' is a special type that is a specialized form of 
'string'. 

I'm not sure how this manifests in other languages, but in Ruby, when an IDL 
has a 'binary' field, the compiler adds some metadata to the field definitions. 
Here's an example:
{code}
# IDL with a struct that has string and binary types
struct Combo {
  1: string sdata
  2: binary bdata
}

# Generated Ruby code
    class Combo
      include ::Thrift::Struct, ::Thrift::Struct_Union
      SDATA = 1
      BDATA = 2

      FIELDS = {
        SDATA => {:type => ::Thrift::Types::STRING, :name => 'sdata'},
        BDATA => {:type => ::Thrift::Types::STRING, :name => 'bdata', :binary => true}
      }

      def struct_fields; FIELDS; end

      def validate
      end

      ::Thrift::Struct.generate_accessors self
    end
{code}

Unfortunately, this field information is not available in the protocol classes 
when serializing and deserializing. Since 'binary' is not a base type, there is 
no 'write_binary' or 'read_binary'. As such, all that's invoked is 
'write_string' or 'read_string' and these methods don't seem to have enough 
context to get that field definition data. Please let me know if there is 
access to this information, as it could be used to avoid transcoding the data 
and forcing the encoding to BINARY.
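
To make the idea concrete, here is a minimal sketch of what a write path could 
do if the field definition were available to it. The helper 'wire_bytes' and 
the two stand-in field hashes are hypothetical and are not part of the Thrift 
library; the sketch only assumes that the ':binary => true' entry from the 
generated FIELDS hash can be passed along.

```ruby
# Hypothetical helper, not part of the Thrift library: returns the bytes that
# should go on the wire for a STRING-typed field, given its field definition.
def wire_bytes(value, field_info)
  if field_info[:binary]
    # 'binary' field: relabel the bytes as BINARY without converting them.
    value.dup.force_encoding(Encoding::BINARY)
  else
    # 'string' field: transcode to UTF-8, then expose the raw bytes.
    value.encode(Encoding::UTF_8).force_encoding(Encoding::BINARY)
  end
end

# Stand-ins for entries of the generated FIELDS hash (the :type key is omitted).
sdata_info = { :name => 'sdata' }
bdata_info = { :name => 'bdata', :binary => true }

# Four raw bytes, two of them >= 0x80; pack returns an ASCII-8BIT string.
raw = [0x00, 0x7F, 0x80, 0xFF].pack('C*')

# Reproduce the reported corruption: the same bytes treated as ISO-8859-1 text.
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)

corrupted = wire_bytes(as_latin1, sdata_info)
preserved = wire_bytes(raw, bdata_info)

puts corrupted.bytes.inspect
puts preserved.bytes.inspect
```

On the 'string' path the two bytes at or above 0x80 each become a two-byte 
UTF-8 sequence, so four bytes turn into six. On the 'binary' path the four 
bytes come out unchanged.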

How are the other libraries dealing with this special 'binary' type?
                
> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
>                 Key: THRIFT-1727
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1727
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.9
>         Environment: JRuby 1.6.8 using "--1.9" command line parameter.
>            Reporter: XB
>
> When setting a binary field of a Thrift object with some binary data (e.g. a 
> string whose encoding is "ASCII-8BIT") and then serializing this object, the 
> binary data is re-encoded. That is, it is encoded as if it were not a 
> sequence of bytes but a sequence of characters, encoded using the ISO-8859-1 
> encoding. This assumed ISO-8859-1 sequence of characters is then converted 
> into UTF-8 (by BinaryProtocol or CompactProtocol). This basically means that 
> all bytes whose values are between 0x80 (inclusive) and 0x100 (exclusive) are 
> converted into multi-byte sequences. This leads to data corruption.

