Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem
Avro Java's file writer [1] (in the last several versions) rewinds its buffer if there is an exception during writing, so if there are writes afterwards the file will not be corrupt. However, most tools are not so careful.

[1] DataFileWriter.append()
http://svn.apache.org/repos/asf/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileWriter.java

On 3/23/12 8:27 PM, Russell Jurney russell.jur...@gmail.com wrote:

> Ok, now I have a followup question... how does one recover from an exception writing an Avro? The incomplete record is being written, which is crashing the reader.
>
> On Fri, Mar 23, 2012 at 8:01 PM, Russell Jurney russell.jur...@gmail.com wrote:
>
>> Thanks Scott, looking at the raw data it seems to have been a truncated record due to UTF problems.
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Mar 23, 2012, at 7:59 PM, Scott Carey scottca...@apache.org wrote:
>>
>>> It appears to be reading a union index and failing in there somehow. If it did not have any of the Pig AvroStorage stuff in there I could tell you more.
>>>
>>> What does avro-tools.jar's 'tojson' tool do?
>>> (java -jar avro-tools-1.6.3.jar tojson file | your_favorite_text_reader)
>>>
>>> What version of Avro is the Java stack trace below?
>>>
>>> On 3/23/12 7:01 PM, Russell Jurney russell.jur...@gmail.com wrote:
>>>
>>>> I have a problem record I've written in Avro that crashes anything which tries to read it :(
>>>>
>>>> Can anyone make sense of these errors?
>>>>
>>>> The exception in Pig/AvroStorage is this:
>>>>
>>>> java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>>>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>>>>     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>>>>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>>>     at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>>>     at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>>>     at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
>>>>     ... 7 more
>>>>
>>>> When reading the record in Python:
>>>>
>>>>   File "/me/Collecting-Data/src/python/cat_avro", line 21, in <module>
>>>>     for record in df_reader:
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
>>>>     datum = self.datum_reader.read(self.datum_decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
>>>>     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
>>>>     return self.read_record(writers_schema, readers_schema, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
>>>>     field_val = self.read_data(field.type, readers_field.type, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
>>>>     return self.read_union(writers_schema, readers_schema, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 650, in read_union
>>>>     raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
>>>> avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches
>>>>
>>>> When reading the record in Ruby:
>>>>
>>>> /Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data': Writer's schema and Reader's schema [string,null] do not match. (Avro::IO::SchemaMatchException)
>>>>
>>>> --
>>>> Russell Jurney
>>>> twitter.com/rjurney
>>>> russell.jur...@gmail.com
>>>> datasyndrome.com
>
> --
> Russell Jurney
> twitter.com/rjurney
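
For writers that do not rewind on their own, the same idea can be applied by hand. The following is a minimal sketch of the rewind-on-failure technique described above, not Avro's actual code: `RewindingWriter` and `encode_utf8_string` are hypothetical names, and the toy encoder does not produce the Avro wire format. It only shows that remembering the buffer position before each record, and truncating back to it when encoding fails, keeps later records intact.

```python
import io


class RewindingWriter:
    """Appends encoded records to a buffer, undoing any partial write on failure."""

    def __init__(self, encode):
        # `encode(record, out)` is a hypothetical callback that writes bytes to `out`.
        self._encode = encode
        self._buf = io.BytesIO()

    def append(self, record):
        start = self._buf.tell()  # remember where this record begins
        try:
            self._encode(record, self._buf)
        except Exception:
            # Drop the partially written bytes so later appends stay aligned.
            self._buf.seek(start)
            self._buf.truncate()
            raise

    def getvalue(self):
        return self._buf.getvalue()


def encode_utf8_string(record, out):
    """Toy encoder: a 1-byte marker, then UTF-8 text (NOT the Avro wire format)."""
    out.write(b"\x02")                 # written before the step that may fail
    out.write(record.encode("utf-8"))  # raises UnicodeEncodeError on a lone surrogate


writer = RewindingWriter(encode_utf8_string)
writer.append("ok")
try:
    writer.append("bad \udc80")        # lone surrogate cannot be encoded as UTF-8
except UnicodeEncodeError:
    pass                               # skip the bad record and keep writing
writer.append("fine")
```

Without the seek and truncate in `append`, the stray marker byte from the failed record would stay in the buffer and every later record would be read from the wrong offset.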
Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem
It appears to be reading a union index and failing in there somehow. If it did not have any of the Pig AvroStorage stuff in there I could tell you more.

What does avro-tools.jar's 'tojson' tool do?
(java -jar avro-tools-1.6.3.jar tojson file | your_favorite_text_reader)

What version of Avro is the Java stack trace below?
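
To make the "union index" failure concrete: in Avro's binary encoding a union value is written as a long giving the zero-based branch position, followed by the value for that branch, and longs are zigzag varints. The sketch below is an illustration under assumptions, not the code of any Avro library: `read_long` and `read_union_branch` are hypothetical helpers, and the bytes `0x80 0x01` are simply one sequence that decodes to 64, not bytes taken from the file in this thread. It shows how a reader that lands on the wrong bytes of a [string,null] union ends up with an out-of-range branch index.

```python
import io


def read_long(stream):
    """Decode one Avro long: a little-endian base-128 varint, then zigzag."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a varint")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def read_union_branch(stream, branches):
    """Read the branch index of a union and return the selected branch name."""
    index = read_long(stream)
    if not 0 <= index < len(branches):
        raise ValueError(
            "Can't access branch index %d for union with %d branches"
            % (index, len(branches))
        )
    return branches[index]


union = ["string", "null"]

# A well-formed index: byte 0x02 is zigzag for 1, selecting the "null" branch.
selected = read_union_branch(io.BytesIO(b"\x02"), union)

# A misaligned read: bytes 0x80 0x01 decode to 64, which no 2-branch union has.
try:
    read_union_branch(io.BytesIO(b"\x80\x01"), union)
except ValueError as err:
    message = str(err)
```

A truncated or partially written record shifts every following read, so the decoder can interpret bytes from the middle of some other field as a branch index in this way.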
Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem
Thanks Scott, looking at the raw data it seems to have been a truncated record due to UTF problems.

Russell Jurney http://datasyndrome.com
Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem
Ok, now I have a followup question... how does one recover from an exception writing an Avro? The incomplete record is being written, which is crashing the reader.

--
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com