[ 
https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501145#comment-13501145
 ] 

Tophe Vigny commented on AVRO-1206:
-----------------------------------

Hi Doug, 
I can help with the test issue.

I checked out a copy from trunk:
svn checkout http://svn.apache.org/repos/asf/avro/trunk/lang/ruby/
A    ruby/test
A    ruby/test/tool.rb
A    ruby/test/test_protocol.rb
A    ruby/test/sample_ipc_http_server.rb
A    ruby/test/sample_ipc_server.rb
A    ruby/test/test_socket_transport.rb
A    ruby/test/test_io.rb
A    ruby/test/test_help.rb
A    ruby/test/test_datafile.rb
A    ruby/test/random_data.rb
A    ruby/test/sample_ipc_http_client.rb
A    ruby/test/sample_ipc_client.rb
A    ruby/interop
A    ruby/interop/test_interop.rb
A    ruby/Rakefile
A    ruby/.gitignore
A    ruby/Manifest
A    ruby/lib
A    ruby/lib/avro
A    ruby/lib/avro/schema.rb
A    ruby/lib/avro/protocol.rb
A    ruby/lib/avro/io.rb
A    ruby/lib/avro/collect_hash.rb
A    ruby/lib/avro/data_file.rb
A    ruby/lib/avro/ipc.rb
A    ruby/lib/avro.rb
A    ruby/CHANGELOG
 U   ruby
Checked out revision 1411599.

and then tried to apply the patch:
patch < AVRO-1206.patch 
patching file Rakefile
patching file io.rb
Hunk #1 FAILED at 201.
1 out of 1 hunk FAILED -- saving rejects to file io.rb.rej
patching file test_datafile.rb
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 140.
2 out of 2 hunks FAILED -- saving rejects to file test_datafile.rb.rej

What is the matter?
I have merged the test manually and made some code modifications to ensure
that ../lib/avro is loaded.
So, with your test and the original io.rb:
Tophe@info3:~/work/ruby/test$ ruby test_datafile.rb 
Run options: 

# Running tests:

...F

Finished tests in 0.088778s, 45.0561 tests/s, 878.5939 assertions/s.

  1) Failure:
test_utf8(TestDataFile) [test_datafile.rb:155]:
<"家"> expected but was
<"\xE5">.

4 tests, 78 assertions, 1 failures, 0 errors, 0 skips
Only one byte is stored for a multi-byte character. And with the modified io.rb:
Tophe@info3:~/work/ruby/test$ ruby test_datafile.rb 
Run options: 

# Running tests:

....

Finished tests in 0.088450s, 45.2230 tests/s, 881.8492 assertions/s.

4 tests, 78 assertions, 0 failures, 0 errors, 0 skips
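
The single byte "\xE5" in the failing run is the first of the three UTF-8 bytes of "家". A minimal sketch of how that can happen, assuming the length written in front of the string was counted in characters instead of bytes (an assumption; the patch itself is not shown in this message). It needs Ruby >= 1.9, and the variable names are only for illustration:

```ruby
# encoding: utf-8

s = "家"

# On Ruby >= 1.9, length counts characters and bytesize counts bytes.
char_count = s.length     # one character
byte_count = s.bytesize   # three bytes in UTF-8

# View the same data as raw bytes, the way it would sit in a file.
raw = s.dup.force_encoding('BINARY')

# Reading back only char_count bytes keeps just the first byte.
truncated = raw[0, char_count]

# Reading back byte_count bytes keeps the whole character.
complete = raw[0, byte_count].force_encoding('UTF-8')

puts truncated.bytes.to_a.inspect
puts complete
```

Using the byte count for the length keeps the writer and the reader in agreement about where the string ends.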

You need to add

#encoding: utf-8

at the beginning of test_datafile.rb. For the assertion, we can do:
      (rmaj,rmin,rlast) = RUBY_VERSION.split(".").map {|a| a.to_i}
      if rmaj <2 &&  rmin < 9
        assert_equal "家", s
      else
        assert_equal "家", s.force_encoding('UTF-8') 
      end
That test works with Ruby 1.8 and >= 1.9. Because the Ruby 1.9 branch is
encoding-aware, you need to specify the encoding, or we need to compare in binary.
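
As an alternative to checking RUBY_VERSION, the same comparison can be sketched by asking the string whether it supports force_encoding. This is only an illustration: as_utf8 is a hypothetical helper, not part of avro, and the binary string below stands in for what the reader returns.

```ruby
# encoding: utf-8

# Hypothetical helper: return a copy of str tagged as UTF-8 on an
# encoding-aware Ruby (>= 1.9); on Ruby 1.8, strings carry no encoding,
# so the string is returned unchanged.
def as_utf8(str)
  str.respond_to?(:force_encoding) ? str.dup.force_encoding('UTF-8') : str
end

# Stand-in for a value read back from a data file: the three UTF-8 bytes
# of "家", tagged as binary when the runtime supports encodings.
raw = "\xE5\xAE\xB6".dup
raw.force_encoding('BINARY') if raw.respond_to?(:force_encoding)

decoded = as_utf8(raw)

# In a test this would be: assert_equal "家", as_utf8(s)
puts decoded == "家"
```

The helper leaves its argument untouched, so the value under test is not modified by the assertion.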

Is it possible to specify the encoding in the schema, either for all data or
per string type? That could help the reader return strings with the correct
encoding.
It would also be simpler to use, because the reader would not need to know the
encoding.

The problem for you is that you are loading the gem and not ../lib, and you
have made the correction in the gem. (I had the same problem, and I spent some
time on it.)
Try this:
gem uninstall avro (all)

and run the test. It should not run, because there are some require 'avro'
calls throughout the code, and they load the gem, not the source code.
To load the source code, you should do this in test_help.rb:

$LOAD_PATH << '../lib/'
require 'avro'
That way, avro.rb should be loaded from ../lib and not $GEM_HOME/...
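
A slightly more defensive variant of the same idea, as a sketch only: the relative path '../lib/' depends on the directory the test is started from, and << appends to the end of the load path. The helper below (prepend_source_lib is a hypothetical name, not part of avro) resolves lib/ relative to the test directory and puts it at the front of $LOAD_PATH.

```ruby
# Hypothetical helper: put the checkout's lib/ directory at the front of
# the load path so that require 'avro' finds the source tree first.
def prepend_source_lib(test_dir)
  lib_dir = File.expand_path(File.join(test_dir, '..', 'lib'))
  $LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)
  lib_dir
end

# In test_help.rb this would be called with the directory of the file itself.
lib_dir = prepend_source_lib(File.dirname(File.expand_path(__FILE__)))

# require 'avro'   # would now be looked up in lib_dir before other entries
puts lib_dir
```

Because require walks $LOAD_PATH in order, an entry at the front is checked before anything installed elsewhere.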

I can send you a patch if I can apply your patch to the trunk. Tell me if you
need one, and what to do.
By the way, you can remove the FIXME in
def write_string(datum)
    # FIXME utf-8 encode this in 1.9
    write_bytes(datum)
end



                
> utf-8 serialisation problems 
> -----------------------------
>
>                 Key: AVRO-1206
>                 URL: https://issues.apache.org/jira/browse/AVRO-1206
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.2
>         Environment: ruby-1.9.3p194, avro gem 1.7.2.
>            Reporter: Tophe Vigny
>         Attachments: AVRO-1206.patch
>
>
> some serialized utf-8 characters like "家" cannot be read later; avro breaks 
> with 
> /gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:230:in `match_schemas': 
> undefined method `type' for nil:NilClass (NoMethodError)
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:288:in 
> `read_data'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:384:in 
> `read_union'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:317:in 
> `read_data'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:392:in 
> `block in read_record'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:390:in 
> `each'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:390:in 
> `read_record'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:318:in 
> `read_data'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:283:in 
> `read'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:223:in
>  `block in each'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:211:in
>  `loop'
>       from 
> /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:211:in
>  `each'
>       from avr_err_example.rb:42:in `block in <main>'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
