[
https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495645#comment-13495645
]
Nathan Beyer commented on THRIFT-1727:
--------------------------------------
I believe the core issue is that there is no 'binary' base type. According to the
Thrift Types (http://thrift.apache.org/docs/types/) document, 'string' is the only
string-like base type; 'binary' is a special type, a specialized form of
'string'.
I'm not sure how this manifests in other languages, but in Ruby, when an IDL
has a 'binary' field, the generator adds some metadata to the field definitions.
Here's an example:
{code}
# IDL with a struct that has string and binary types
struct Combo {
  1: string sdata
  2: binary bdata
}

# Generated Ruby code
class Combo
  include ::Thrift::Struct, ::Thrift::Struct_Union
  SDATA = 1
  BDATA = 2

  FIELDS = {
    SDATA => {:type => ::Thrift::Types::STRING, :name => 'sdata'},
    BDATA => {:type => ::Thrift::Types::STRING, :name => 'bdata', :binary => true}
  }

  def struct_fields; FIELDS; end

  def validate
  end

  ::Thrift::Struct.generate_accessors self
end
{code}
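To make the generated metadata concrete, here is a minimal, self-contained sketch that reads the `:binary => true` flag back out of a FIELDS-style hash. It runs without the thrift gem: `STRING_TYPE`, `COMBO_FIELDS` and `binary_field_names` are hypothetical stand-ins for `::Thrift::Types::STRING`, the generated `Combo::FIELDS`, and the kind of lookup a protocol would need to perform.

```ruby
# Hypothetical stand-in for ::Thrift::Types::STRING so this runs without the
# thrift gem; the value is arbitrary for this sketch.
STRING_TYPE = 11

# Hypothetical copy of the FIELDS hash generated for Combo above.
COMBO_FIELDS = {
  1 => {:type => STRING_TYPE, :name => 'sdata'},
  2 => {:type => STRING_TYPE, :name => 'bdata', :binary => true}
}

# Returns the names of the fields the generator marked with :binary => true.
def binary_field_names(fields)
  fields.values.select { |info| info[:binary] }.map { |info| info[:name] }
end

p binary_field_names(COMBO_FIELDS)  # => ["bdata"]
```

Both fields share the same `:type`, so the `:binary` key is the only thing that distinguishes them.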
Unfortunately, this field information is not available in the protocol classes
when serializing and deserializing. Since 'binary' is not a base type, there is
no 'write_binary' or 'read_binary'. As a result, only 'write_string' or
'read_string' is invoked, and these methods don't seem to have enough context to
reach that field definition data. Please let me know if there is a way to access
this information, as it could be used to avoid transcoding the data and to force
the encoding to BINARY.
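If the protocol could see that metadata, the write path could branch on it. The sketch below is hypothetical: `bytes_for_field` is not a Thrift API and does not reflect how the library is structured today. Binary fields keep their bytes and are only relabelled with `force_encoding(Encoding::BINARY)`, while text fields are transcoded to UTF-8.

```ruby
# Hypothetical helper: prepares a field value for the wire using the
# generator's :binary metadata (which the protocol currently never sees).
def bytes_for_field(value, field_info)
  if field_info[:binary]
    # Binary field: keep the bytes exactly as given, only change the label.
    value.dup.force_encoding(Encoding::BINARY)
  else
    # Text field: transcode the characters to UTF-8.
    value.encode(Encoding::UTF_8)
  end
end

payload = [0xFF, 0x00, 0x80].pack('C*')  # three raw bytes, ASCII-8BIT
out = bytes_for_field(payload, {:name => 'bdata', :binary => true})
p out.bytes                       # => [255, 0, 128]
p out.encoding == Encoding::BINARY  # => true

text = bytes_for_field("caf\u00e9", {:name => 'sdata'})
p text.encoding == Encoding::UTF_8  # => true
```

Because `force_encoding` changes only the label and never the bytes, the binary path cannot grow or alter the payload, which is exactly the property the re-encoding bug breaks.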
How are the other libraries dealing with this special 'binary' type?
> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
> Key: THRIFT-1727
> URL: https://issues.apache.org/jira/browse/THRIFT-1727
> Project: Thrift
> Issue Type: Bug
> Components: Ruby - Library
> Affects Versions: 0.9
> Environment: JRuby 1.6.8 using "--1.9" command line parameter.
> Reporter: XB
>
> When setting a binary field of a Thrift object with some binary data (e.g. a
> string whose encoding is "ASCII-8BIT") and then serializing this object, the
> binary data is re-encoded. That is, it is encoded as if it were not a
> sequence of bytes but a sequence of characters, encoded using the ISO-8859-1
> encoding. This assumed ISO-8859-1 sequence of characters is then converted
> into UTF-8 (by BinaryProtocol or CompactProtocol). This basically means that
> all bytes whose values are between 0x80 (inclusive) and 0x100 (exclusive) are
> converted into multi-byte sequences. This leads to data corruption.
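The corruption described in the report can be reproduced with core Ruby String methods, assuming the library does exactly what the report says: reinterpret the raw bytes as ISO-8859-1, then transcode them to UTF-8. `reencode_as_latin1` is a hypothetical helper, not a Thrift method.

```ruby
# Hypothetical helper mirroring the reported behaviour: treat raw bytes as
# ISO-8859-1 characters, then convert those characters to UTF-8.
def reencode_as_latin1(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

original  = [0x41, 0x7F, 0x80, 0xFF].pack('C*')  # ASCII-8BIT string, 4 bytes
corrupted = reencode_as_latin1(original)

p original.bytesize   # => 4
p corrupted.bytesize  # => 6
p corrupted.bytes     # => [65, 127, 194, 128, 195, 191]
```

Bytes below 0x80 pass through unchanged, while each byte from 0x80 to 0xFF becomes a two-byte UTF-8 sequence (0x80 becomes C2 80, 0xFF becomes C3 BF), so the output grows by one byte per high byte.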