Thrift encoding (UTF-8) issue with Ruby 1.9.2
---------------------------------------------

                 Key: THRIFT-1023
                 URL: https://issues.apache.org/jira/browse/THRIFT-1023
             Project: Thrift
          Issue Type: Bug
          Components: Ruby - Library
    Affects Versions: 0.5
         Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0
            Reporter: Vincent Peres


I ran into an encoding issue in the Thrift library, specifically in the
BufferedTransport class.
I've written a few tests to give you a concrete example:

# encoding: utf-8
require 'spec_helper'

describe "encoding" do

 before do
   transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
   protocol  = Thrift::BinaryProtocol.new(transport)
   @client   = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)

   transport.open()

   @table_name = "encoding_test"
   @column_family = "info:"
 end

 it "should create a new table" do
   column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new { |c| c.name = @column_family }
   @client.createTable(@table_name, [column]).should be_nil
 end

 it "should save standard caracteres" do
   m        = Apache::Hadoop::Hbase::Thrift::Mutation.new
   m.column = "info:first_name"
   m.value  = "Vincent"

   m.value.encoding.should == Encoding::UTF_8
   @client.mutateRow(@table_name, "ID1", [m]).should be_nil
 end

 it "should save UTF8 caracteres" do
   m        = Apache::Hadoop::Hbase::Thrift::Mutation.new
   m.column = "info:first_name"
   m.value  = "Thorbjørn"

   m.value.encoding.should == Encoding::UTF_8
   @client.mutateRow(@table_name, "ID1", [m]).should be_nil
 end

 it "should destroy the table" do
   @client.disableTable(@table_name).should be_nil
   @client.deleteTable(@table_name).should be_nil
 end
end

The spec fails when it tries to save the UTF-8 string containing the character 'ø'.

Here is the output:

 1) encoding should save UTF8 caracteres
    Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
    incompatible character encodings: ASCII-8BIT and UTF-8
    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
    # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
    # ./lib/thrift/hbase.rb:284:in `mutateRow'
    # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'
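
The same error can be reproduced without Thrift or HBase. The following is only a sketch of what may be happening, under the assumption that the transport's write buffer is a binary (ASCII-8BIT) string that already holds non-ASCII bytes when the UTF-8 value is appended. The helper names (new_write_buffer, naive_append, binary_append) are made up for this illustration and are not part of the Thrift library:

```ruby
# encoding: utf-8

# Hypothetical stand-in for a transport write buffer: a binary (ASCII-8BIT)
# string that already contains a non-ASCII byte (assumption, see above).
def new_write_buffer
  "\x80\x01".dup.force_encoding(Encoding::BINARY)
end

# Appends the value as-is. Returns the exception Ruby raises,
# or nil if the append succeeded.
def naive_append(buffer, value)
  buffer << value
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Possible workaround: append a binary-tagged copy of the value.
# force_encoding only changes the encoding label, not the bytes.
def binary_append(buffer, value)
  buffer << value.dup.force_encoding(Encoding::BINARY)
end

ascii_result = naive_append(new_write_buffer, "Vincent")
utf8_result  = naive_append(new_write_buffer, "Thorbjørn")

puts "ASCII-only value:  #{ascii_result.inspect}"
puts "non-ASCII value:   #{utf8_result.class}"

buffer = binary_append(new_write_buffer, "Thorbjørn")
puts "buffer encoding:   #{buffer.encoding}"
puts "buffer size:       #{buffer.bytesize} bytes"
```

This would explain why "Vincent" passes while "Thorbjørn" fails: Ruby allows appending an ASCII-only string to a binary buffer, but refuses to mix two strings that both contain non-ASCII bytes under different encodings. Whether tagging the value as binary before writing is the right fix for the library itself is an open question.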

Let me know if you need any other details. Thank you!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
