[ 
https://issues.apache.org/jira/browse/AVRO-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047936#comment-14047936
 ] 

Willem van Bergen commented on AVRO-1499:
-----------------------------------------

I think it fails because the library uses `size` instead of `bytesize`. In Ruby 
1.9+, size returns the number of characters, not the number of bytes in a 
string. Which means that in a unicode string, the length of a string that gets 
written is too short.

I will attach a patch that aliases `bytesize` to `size` in Ruby 1.8, and uses 
bytesize.

> Ruby 2+ Writes Invalid avro files using the avro gem
> ----------------------------------------------------
>
>                 Key: AVRO-1499
>                 URL: https://issues.apache.org/jira/browse/AVRO-1499
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.5
>            Reporter: Michael Ries
>            Assignee: Martin Kleppmann
>              Labels: ruby
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1499.patch
>
>
> The rubygem writes corrupted avro files under ruby 2.0.0 and ruby 2.1.1. It 
> appears to work correctly under jruby-1.7.10 and ruby 1.9.3.
> Here is a reproducible:
> ```ruby
> require 'avro'
>  
> data = [
>   {"guid"=>"144045de-eb44-dd1b-d9af-6c8b5d41a96e", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617818, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"51e06057-14d2-7527-81fa-b07dba0a263b", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"Student Loans 
> R' Us", "created_at"=>1386178342, "updated_at"=>1398180286, 
> "deleted_at"=>nil},
>   {"guid"=>"b4d1d99f-4351-d0e7-221c-a3fae08716bc", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617026, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"084638fa-a78d-bbdd-e075-7c9c957a9b46", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617138, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"79287c76-4e8f-0a21-7569-a2bcdc2b2f4d", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617135, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"3bcc26b2-7d3b-6c4d-cb27-4eb1574b3c20", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"Cayman Islands 
> Bank", "created_at"=>1386902345, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"75e1e56c-7611-4030-d002-afa2af70e5a1", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617427, "updated_at"=>1398180288, "deleted_at"=>nil},
> ]
>  
> member_schema = <<-SCHEMA
> {"namespace": "md.data_logs",
>  "type": "record",
>  "name": "Member",
>  "fields": [
>      {"name": "guid", "type": "string"},
>      {"name": "user_guid", "type": "string"},
>      {"name": "name", "type": ["string","null"]},
>      {"name": "created_at", "type":"long"},
>      {"name": "updated_at", "type":"long"},
>      {"name": "deleted_at", "type":["long","null"]}
>  ]
> }
> SCHEMA
> filepath = "./members.avro"
> File.unlink(filepath) if File.exists?(filepath)
>  
> Avro::DataFile.open(filepath, "w", member_schema) do |dw|
>   data.each do |entry|
>     dw << entry
>   end
> end
>  
>  
> entries = []
> Avro::DataFile.open(filepath, "r") do |reader|
>   reader.each do |entry|
>     entries << entry
>   end
> end
>  
> puts "Here is the data I wrote into the file:"
> data.each{|e| p e }
> print "\n\n\n\n"
>  
> puts "Here is the data I read from the file:"
> entries.each{|e| p e }
> ```
> Under ruby 2+ it fails with the message "undefined method 'unpack' for 
> nil:NilClass (NoMethodError)". I have also tested that the rubygem can 
> correctly read avro files written by the java client, but the java client 
> fails to read files written by the ruby client, so the issue is definitely in 
> how the rubygem is trying to write the binary file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to