[ https://issues.apache.org/jira/browse/THRIFT-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422008#comment-13422008 ]
Nathan Beyer commented on THRIFT-1023:
--------------------------------------
Now that I have a running Ubuntu 12.04 setup, I did some further testing. With
the latest patch, everything runs fine on Ruby 1.8.7. However, on Ruby 1.9.3,
I'm seeing some segfaults. When I remove the native code, everything seems to
run fine. I suspect the issue is with how the native code manipulates the
String encoding. Instead of using some of the more direct C functions, I'm
going to try rewriting it to call the equivalent Ruby methods from C.
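To illustrate the kind of encoding mismatch I have in mind, here is a minimal
pure-Ruby sketch. It is not the Thrift code: 'append_bytes' and 'wbuf' are
hypothetical stand-ins for a transport write buffer, and I'm assuming the
buffer already holds binary data with a non-ASCII byte (such as a protocol
header) when the UTF-8 value is appended.

```ruby
# Hypothetical stand-in for a transport write; not Thrift's actual code.
def append_bytes(buffer, str)
  # Retag a copy as binary so the caller's string keeps its own encoding.
  bytes = str.dup.force_encoding(Encoding::BINARY)
  buffer << bytes
end

# Binary buffer that already contains a non-ASCII byte (0x80).
wbuf = [0x80010001].pack('N')

# "Thorbjørn", written with an escape so the string is UTF-8 regardless of
# the source file's encoding.
value = "Thorbj\u00F8rn"

error = nil
begin
  # Appending non-ASCII UTF-8 text to non-ASCII binary data raises.
  wbuf.dup << value
rescue Encoding::CompatibilityError => e
  error = e
end

# Appending the same bytes tagged as binary succeeds.
fixed = append_bytes(wbuf.dup, value)
```

The copy made by 'dup' matters here: calling force_encoding directly on the
argument would change the encoding of the caller's string as a side effect.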
For those interested, I removed the native code by commenting out the
'build_ext' task in the Rakefile and then running 'bundle exec rake clean' and
'bundle exec rake gem' to test it. You should see an 'unable to load
thrift_native' message in the console.
> Thrift encoding (UTF-8) issue with Ruby 1.9.2
> ----------------------------------------------
>
> Key: THRIFT-1023
> URL: https://issues.apache.org/jira/browse/THRIFT-1023
> Project: Thrift
> Issue Type: Bug
> Components: Ruby - Library
> Affects Versions: 0.5
> Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0
> Reporter: Vincent Peres
> Assignee: Jake Farrell
> Fix For: 0.9
>
> Attachments:
> THRIFT-1023-refactor-transport-protocol-for-ruby19-v2.patch,
> THRIFT-1023-refactor-transport-protocol-for-ruby19.patch,
> thrift-1023-utf8-encoding-issue.path
>
>
> I came across an encoding issue in the Thrift library, specifically in the
> BufferedTransport class.
> I've written a few tests to give you a concrete example:
> # encoding: utf-8
> require 'spec_helper'
> describe "encoding" do
>   before do
>     transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
>     protocol = Thrift::BinaryProtocol.new(transport)
>     @client = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)
>     transport.open()
>     @table_name = "encoding_test"
>     @column_family = "info:"
>   end
>   it "should create a new table" do
>     column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new{|c| c.name = @column_family}
>     @client.createTable(@table_name, [column]).should be_nil
>   end
>   it "should save standard caracteres" do
>     m = Apache::Hadoop::Hbase::Thrift::Mutation.new
>     m.column = "info:first_name"
>     m.value = "Vincent"
>     m.value.encoding.should == Encoding::UTF_8
>     @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>   end
>   it "should save UTF8 caracteres" do
>     m = Apache::Hadoop::Hbase::Thrift::Mutation.new
>     m.column = "info:first_name"
>     m.value = "Thorbjørn"
>     m.value.encoding.should == Encoding::UTF_8
>     @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>   end
>   it "should destroy the table" do
>     @client.disableTable(@table_name).should be_nil
>     @client.deleteTable(@table_name).should be_nil
>   end
> end
> It fails when it tries to save the UTF-8 string containing the character 'ø'.
> Here is the output:
> 1) encoding should save UTF8 caracteres
>    Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>    incompatible character encodings: ASCII-8BIT and UTF-8
>    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
>    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
>    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
>    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
>    # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
>    # ./lib/thrift/hbase.rb:284:in `mutateRow'
>    # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'
> Let me know if you need any other details. Thank you!