[ https://issues.apache.org/jira/browse/THRIFT-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vasiliy Sablin updated THRIFT-1023: ----------------------------------- Attachment: thrift-1023-utf8-encoding-issue.path Hello, here is my fix, as on github. > Thrift encoding (UTF-8) issue with Ruby 1.9.2 > ---------------------------------------------- > > Key: THRIFT-1023 > URL: https://issues.apache.org/jira/browse/THRIFT-1023 > Project: Thrift > Issue Type: Bug > Components: Ruby - Library > Affects Versions: 0.5 > Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0 > Reporter: Vincent Peres > Assignee: Jake Farrell > Attachments: thrift-1023-utf8-encoding-issue.path > > > I came up with an encoding issue coming from the Thrift library, and > especially the BufferedTransport class. > I've decided to write down few tests to give you a concrete example : > # encoding: utf-8 > require 'spec_helper' > describe "encoding" do > before do > transport = > Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090)) > protocol = Thrift::BinaryProtocol.new(transport) > @client = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol) > transport.open() > @table_name = "encoding_test" > @column_family = "info:" > end > it "should create a new table" do > column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new{|c| c.name= > @column_family} > @client.createTable(@table_name, [column]).should be_nil > end > it "should save standard caracteres" do > m = Apache::Hadoop::Hbase::Thrift::Mutation.new > m.column = "info:first_name" > m.value = "Vincent" > m.value.encoding.should == Encoding::UTF_8 > @client.mutateRow(@table_name, "ID1", [m]).should be_nil > end > it "should save UTF8 caracteres" do > m = Apache::Hadoop::Hbase::Thrift::Mutation.new > m.column = "info:first_name" > m.value = "Thorbjørn" > m.value.encoding.should == Encoding::UTF_8 > @client.mutateRow(@table_name, "ID1", [m]).should be_nil > end > it "should destroy the table" do > @client.disableTable(@table_name).should be_nil > @client.deleteTable(@table_name).should be_nil > end > end > It fails when it tries to save the UTF8 string including the caractere 'ø'. > Here is the output : > 1) encoding should save UTF8 caracteres > Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil > incompatible character encodings: ASCII-8BIT and UTF-8 > > #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in > `write' > > #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in > `write_string' > > #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in > `write' > > #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in > `send_message' > # ./lib/thrift/hbase.rb:289:in `send_mutateRow' > # ./lib/thrift/hbase.rb:284:in `mutateRow' > # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top > (required)>' > Let me know if you need any other details, thank you ! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira