[
https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480601#comment-13480601
]
XB commented on THRIFT-1727:
----------------------------
Here is a simple spec patch:
{noformat}
--- a/lib/rb/spec/binary_protocol_spec.rb
+++ b/lib/rb/spec/binary_protocol_spec.rb
@@ -56,6 +56,11 @@ describe 'BinaryProtocol' do
e.type == Thrift::ProtocolException::BAD_VERSION
end
end
+
+ it "should keep binary strings as is" do
+ SpecNamespace::Foo2.new(:my_binary => "\x01\x23\x45\x67\x89\xAB\xCD\xEF").write(@prot)
+ @trans.available.should == 16
+ end
end
describe Thrift::BinaryProtocolFactory do
{noformat}
I expected it to fail differently than it actually does, but it still fails. (I
wanted to test this under jruby-1.6.8, where I hit the original bug, but
running "rake spec" there crashes the JVM. So I had to run it under MRI
1.9.3 instead, where it fails with:
{noformat}
1) BinaryProtocol Thrift::BinaryProtocol should keep binary strings as is
Failure/Error: SpecNamespace::Foo2.new(:my_binary => "\x01\x23\x45\x67\x89\xAB\xCD\xEF").write(@prot)
Encoding::UndefinedConversionError:
"\x89" from ASCII-8BIT to UTF-8
# ./lib/thrift/bytes.rb:84:in `encode'
# ./lib/thrift/bytes.rb:84:in `convert_to_byte_buffer'
# ./lib/thrift/protocol/binary_protocol.rb:110:in `write_string'
# ./spec/binary_protocol_spec.rb:61:in `write'
# ./spec/binary_protocol_spec.rb:61:in `block (3 levels) in <top (required)>'
{noformat}
)
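The same exception can be reproduced without Thrift. The backtrace shows an `encode` call inside `convert_to_byte_buffer`. The sketch below assumes that call amounts to transcoding the string to UTF-8, which is an inference from the backtrace and not a copy of the library code. The variable names are illustrative only.

```ruby
# Build the 8 bytes from the spec as an ASCII-8BIT (binary) string.
# Array#pack("C*") always yields a binary-encoded string.
data = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF].pack("C*")

error = nil
begin
  # Assumption: this mirrors what the failing `encode` call does.
  # Binary strings have no character mapping for bytes >= 0x80,
  # so the first such byte (0x89) aborts the conversion.
  data.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  error = e
end

puts error.class
puts error.message
```

The first four bytes are plain ASCII and convert without complaint, which is why the error message names "\x89" and not an earlier byte.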
> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
> Key: THRIFT-1727
> URL: https://issues.apache.org/jira/browse/THRIFT-1727
> Project: Thrift
> Issue Type: Bug
> Components: Ruby - Library
> Affects Versions: 0.9
> Environment: JRuby 1.6.8 using "--1.9" command line parameter.
> Reporter: XB
>
> When setting a binary field of a Thrift object with some binary data (e.g. a
> string whose encoding is "ASCII-8BIT") and then serializing this object, the
> binary data is re-encoded. That is, it is encoded as if it were not a
> sequence of bytes but a sequence of characters, encoded using the ISO-8859-1
> encoding. This assumed ISO-8859-1 sequence of characters is then converted
> into UTF-8 (by BinaryProtocol or CompactProtocol). This basically means that
> all bytes whose values are between 0x80 (inclusive) and 0x100 (exclusive) are
> converted into multi-byte sequences. This leads to data corruption.
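The corruption described above can be illustrated in plain Ruby. This is a minimal sketch that imitates the described behaviour by relabelling the bytes as ISO-8859-1 and transcoding to UTF-8. It does not call Thrift, and the variable names are illustrative only.

```ruby
# Eight raw bytes; four of them (0x89, 0xAB, 0xCD, 0xEF) are >= 0x80.
original = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF].pack("C*")

# Treat each byte as an ISO-8859-1 character, then convert to UTF-8.
# Every byte >= 0x80 becomes a two-byte UTF-8 sequence.
corrupted = original.dup
                    .force_encoding(Encoding::ISO_8859_1)
                    .encode(Encoding::UTF_8)

puts original.bytesize   # 8
puts corrupted.bytesize  # 12
puts corrupted.bytes.map { |b| format("%02X", b) }.join(" ")
# 01 23 45 67 C2 89 C2 AB C3 8D C3 AF
```

The four high bytes each grow to two bytes, so the 8-byte payload is written as 12 bytes, and a reader that expects the original binary data gets different content.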