Thrift encoding (UTF-8) issue with Ruby 1.9.2
----------------------------------------------

                 Key: THRIFT-1023
                 URL: https://issues.apache.org/jira/browse/THRIFT-1023
             Project: Thrift
          Issue Type: Bug
          Components: Ruby - Library
    Affects Versions: 0.5
         Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0
            Reporter: Vincent Peres

I ran into an encoding issue in the Thrift library, specifically in the BufferedTransport class. I have written a few tests to give you a concrete example:

    # encoding: utf-8
    require 'spec_helper'

    describe "encoding" do

      before do
        transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
        protocol = Thrift::BinaryProtocol.new(transport)
        @client = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)
        transport.open()

        @table_name = "encoding_test"
        @column_family = "info:"
      end

      it "should create a new table" do
        column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new{|c| c.name= @column_family}
        @client.createTable(@table_name, [column]).should be_nil
      end

      it "should save standard caracteres" do
        m = Apache::Hadoop::Hbase::Thrift::Mutation.new
        m.column = "info:first_name"
        m.value = "Vincent"
        m.value.encoding.should == Encoding::UTF_8
        @client.mutateRow(@table_name, "ID1", [m]).should be_nil
      end

      it "should save UTF8 caracteres" do
        m = Apache::Hadoop::Hbase::Thrift::Mutation.new
        m.column = "info:first_name"
        m.value = "Thorbjørn"
        m.value.encoding.should == Encoding::UTF_8
        @client.mutateRow(@table_name, "ID1", [m]).should be_nil
      end

      it "should destroy the table" do
        @client.disableTable(@table_name).should be_nil
        @client.deleteTable(@table_name).should be_nil
      end

    end

The spec fails when it tries to save the UTF-8 string containing the character 'ø'.

Here is the output:

    1) encoding should save UTF8 caracteres
       Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
       incompatible character encodings: ASCII-8BIT and UTF-8
       #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
       #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
       #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
       #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
       # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
       # ./lib/thrift/hbase.rb:284:in `mutateRow'
       # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'

Let me know if you need any other details. Thank you!
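
The same class of error can be reproduced without Thrift or HBase. The following is a minimal sketch, assuming (this is not confirmed by the report) that the transport's write buffer is a binary string that already holds non-ASCII bytes by the time the UTF-8 value is appended. The names `buffer`, `value` and `binary_value` are hypothetical and are not taken from the Thrift source.

```ruby
# Hypothetical stand-in for a transport write buffer: Array#pack returns an
# ASCII-8BIT (binary) string, here holding one byte outside the ASCII range.
buffer = [0x80, 0x01].pack("C*")

# The \u escape makes this a UTF-8 string regardless of the source encoding.
value = "Thorbj\u00F8rn"

# Appending a non-ASCII UTF-8 string to a non-ASCII binary string is rejected.
error = begin
  buffer << value
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class        # prints Encoding::CompatibilityError

# Possible workaround (an assumption, not the library's fix): relabel a copy
# of the value as binary before writing. The bytes themselves are unchanged.
binary_value = value.dup.force_encoding(Encoding::BINARY)
buffer << binary_value
puts buffer.bytesize    # prints 12 (2 header bytes + 10 UTF-8 bytes)
```

An ASCII-only value such as "Vincent" is compatible with a binary string, which would explain why the first mutateRow spec passes while the second one fails. Relabeling with force_encoding is only one possible workaround; whether it belongs in the caller or in the library is a separate question.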