[ 
https://issues.apache.org/jira/browse/THRIFT-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456654#comment-13456654
 ] 

Nathan Beyer edited comment on THRIFT-1023 at 10/1/12 6:16 PM:
---------------------------------------------------------------

Attachment: [^THRIFT-1023-refactor-transport-protocol-for-ruby19-v4.patch]

I've made an additional change to my patch: it now checks for frozen 
objects. I recently learned that Ruby copies and freezes Strings when they 
are used as keys in Hashes. With this tweak, the patch checks whether a 
String is frozen and, if so, calls 'dup' on it.
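
The behavior described above can be sketched in plain Ruby. This is only an illustration, not code from the patch: `mutable_copy` is a hypothetical helper name, and the key and value are borrowed from the test case in the issue description.

```ruby
# Hypothetical helper (not from the patch): return a String that is safe
# to mutate, duplicating it only when the original is frozen.
def mutable_copy(str)
  str.frozen? ? str.dup : str
end

# String.new guarantees an unfrozen String regardless of magic comments.
key   = String.new("info:first_name")
table = { key => "Vincent" }

# The Hash stored its own frozen copy of the key; the original is untouched.
stored_key = table.keys.first
puts stored_key.frozen?
puts key.frozen?

# Calling force_encoding on stored_key itself would raise, because it is
# frozen. Working on a copy avoids that.
buf = mutable_copy(stored_key)
buf.force_encoding(Encoding::BINARY)
puts buf.encoding
```

The helper returns unfrozen Strings unchanged, so callers that already pass mutable Strings do not pay for an extra copy.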

This obviously demonstrates the need for more unit testing, but I'm waiting for 
THRIFT-1644 to get resolved before building out any more unit tests.
                
> Thrift encoding  (UTF-8) issue with Ruby 1.9.2
> ----------------------------------------------
>
>                 Key: THRIFT-1023
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1023
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.5
>         Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0
>            Reporter: Vincent Peres
>            Assignee: Jake Farrell
>             Fix For: 0.9
>
>         Attachments: THRIFT-1023-build-ruby19.patch, 
> THRIFT-1023-refactor-transport-protocol-for-ruby19.patch, 
> THRIFT-1023-refactor-transport-protocol-for-ruby19-v2.patch, 
> THRIFT-1023-refactor-transport-protocol-for-ruby19-v3.patch, 
> THRIFT-1023-refactor-transport-protocol-for-ruby19-v4.patch, 
> THRIFT-1023-refactor-transport-protocol-for-ruby19-v5.patch, 
> THRIFT-1023-refactor-transport-protocol-for-ruby19-v6.patch, 
> thrift-1023-utf8-encoding-issue.path
>
>
> I ran into an encoding issue in the Thrift library, specifically in 
> the BufferedTransport class.
> I've written a few tests to give you a concrete example:
> # encoding: utf-8
> require 'spec_helper'
> describe "encoding" do
>  before do
>    transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
>    protocol  = Thrift::BinaryProtocol.new(transport)
>    @client   = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)
>    transport.open()
>    @table_name = "encoding_test"
>    @column_family = "info:"
>  end
>  it "should create a new table" do
>    column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new{|c| c.name = @column_family}
>    @client.createTable(@table_name, [column]).should be_nil
>  end
>  it "should save standard caracteres" do
>    m        = Apache::Hadoop::Hbase::Thrift::Mutation.new
>    m.column = "info:first_name"
>    m.value  = "Vincent"
>    m.value.encoding.should == Encoding::UTF_8
>    @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>  end
>  it "should save UTF8 caracteres" do
>    m        = Apache::Hadoop::Hbase::Thrift::Mutation.new
>    m.column = "info:first_name"
>    m.value  = "Thorbjørn"
>    m.value.encoding.should == Encoding::UTF_8
>    @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>  end
>  it "should destroy the table" do
>    @client.disableTable(@table_name).should be_nil
>    @client.deleteTable(@table_name).should be_nil
>  end
> end
> It fails when it tries to save the UTF-8 string containing the character 'ø'.
> Here is the output:
>  1) encoding should save UTF8 caracteres
>     Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>     incompatible character encodings: ASCII-8BIT and UTF-8
>     # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
>     # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
>     # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
>     # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
>     # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
>     # ./lib/thrift/hbase.rb:284:in `mutateRow'
>     # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'
> Let me know if you need any other details. Thank you!
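
The failure reported above can be reproduced without HBase or Thrift. The following is a minimal sketch, assuming the write buffer already holds binary data with a byte above 0x7F when the UTF-8 value is appended. `wbuf` and `binary_copy` are hypothetical names, and re-tagging a copy as binary is one possible workaround, not necessarily what the attached patches do.

```ruby
# Stand-in for a transport write buffer: a binary (ASCII-8BIT) String that
# already contains a non-ASCII byte, e.g. from a packed integer.
wbuf = [0x80, 0x01].pack("C*")

# The UTF-8 value from the failing test ("\u00F8" is the character 'ø').
value = "Thorbj\u00F8rn"

# Appending a non-ASCII UTF-8 String to a non-ASCII binary String raises,
# because Ruby cannot pick a single encoding for the result.
error = begin
  wbuf.dup << value
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class

# Hypothetical helper: copy the String and re-tag the copy as binary, so
# the caller's String is left untouched (and frozen input is handled too).
def binary_copy(str)
  str.dup.force_encoding(Encoding::BINARY)
end

wbuf << binary_copy(value)
puts wbuf.bytesize
```

Because `force_encoding` only changes the encoding tag and not the bytes, the UTF-8 byte sequence reaches the wire unchanged.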

