[ https://issues.apache.org/jira/browse/AVRO-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047936#comment-14047936 ]

Willem van Bergen commented on AVRO-1499:
-----------------------------------------

I think it fails because the library uses `size` instead of `bytesize`. In Ruby 1.9+, `size` returns the number of characters in a string, not the number of bytes. This means that for a Unicode string containing multibyte characters, the length that gets written is too short.

I will attach a patch that aliases `bytesize` to `size` in Ruby 1.8 and uses `bytesize` instead.

> Ruby 2+ Writes Invalid avro files using the avro gem
> ----------------------------------------------------
>
>                 Key: AVRO-1499
>                 URL: https://issues.apache.org/jira/browse/AVRO-1499
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.5
>            Reporter: Michael Ries
>            Assignee: Martin Kleppmann
>              Labels: ruby
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1499.patch
>
>
> The rubygem writes corrupted avro files under ruby 2.0.0 and ruby 2.1.1. It
> appears to work correctly under jruby-1.7.10 and ruby 1.9.3.
> Here is a reproducible example:
> ```ruby
> require 'avro'
>
> data = [
>   {"guid"=>"144045de-eb44-dd1b-d9af-6c8b5d41a96e",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank",
>    "created_at"=>1390617818, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"51e06057-14d2-7527-81fa-b07dba0a263b",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"Student Loans R' Us",
>    "created_at"=>1386178342, "updated_at"=>1398180286, "deleted_at"=>nil},
>   {"guid"=>"b4d1d99f-4351-d0e7-221c-a3fae08716bc",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank",
>    "created_at"=>1390617026, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"084638fa-a78d-bbdd-e075-7c9c957a9b46",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank",
>    "created_at"=>1390617138, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"79287c76-4e8f-0a21-7569-a2bcdc2b2f4d",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank",
>    "created_at"=>1390617135, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"3bcc26b2-7d3b-6c4d-cb27-4eb1574b3c20",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"Cayman Islands Bank",
>    "created_at"=>1386902345, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"75e1e56c-7611-4030-d002-afa2af70e5a1",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank",
>    "created_at"=>1390617427, "updated_at"=>1398180288, "deleted_at"=>nil},
> ]
>
> member_schema = <<-SCHEMA
>   {"namespace": "md.data_logs",
>    "type": "record",
>    "name": "Member",
>    "fields": [
>      {"name": "guid", "type": "string"},
>      {"name": "user_guid", "type": "string"},
>      {"name": "name", "type": ["string","null"]},
>      {"name": "created_at", "type":"long"},
>      {"name": "updated_at", "type":"long"},
>      {"name": "deleted_at", "type":["long","null"]}
>    ]
>   }
> SCHEMA
> filepath = "./members.avro"
> File.unlink(filepath) if File.exists?(filepath)
>
> Avro::DataFile.open(filepath, "w", member_schema) do |dw|
>   data.each do |entry|
>     dw << entry
>   end
> end
>
>
> entries = []
> Avro::DataFile.open(filepath, "r") do |reader|
>   reader.each do |entry|
>     entries << entry
>   end
> end
>
> puts "Here is the data I wrote into the file:"
> data.each{|e| p e }
> print "\n\n\n\n"
>
> puts "Here is the data I read from the file:"
> entries.each{|e| p e }
> ```
> Under ruby 2+ it fails with the message "undefined method 'unpack' for
> nil:NilClass (NoMethodError)". I have also tested that the rubygem can
> correctly read avro files written by the java client, but the java client
> fails to read files written by the ruby client, so the issue is definitely in
> how the rubygem is trying to write the binary file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)