Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem

2012-03-26 Thread Scott Carey
Avro Java's file writer[1] (in the last several versions) rewinds its buffer if
there is an exception during writing, so if there are writes afterwards the
file will not be corrupt.  However, most tools are not so careful.

[1] DataFileWriter.append()
http://svn.apache.org/repos/asf/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileWriter.java
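The rewind behavior described above can be sketched in a few lines. This is a toy Python illustration of the idea, not a port of DataFileWriter.append(). It remembers the buffer position before serializing a record. If serialization throws, it truncates back to that position, so the block holds only complete records. The one-field record schema, `encode_record` and `RewindingBlockWriter` are hypothetical. The long encoding follows Avro's zigzag varint format.

```python
import io


def write_long(n: int) -> bytes:
    """Avro-style long: zigzag-encode, then emit as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_record(record: dict, buf: io.BytesIO) -> None:
    """Hypothetical encoder for a record with one field, name: ["string", "null"]."""
    value = record["name"]
    if value is None:
        buf.write(write_long(1))          # branch 1 = null, no payload
        return
    buf.write(write_long(0))              # branch 0 = string; written first...
    data = value.encode("utf-8")          # ...so a failure here leaves a partial record
    buf.write(write_long(len(data)))
    buf.write(data)


class RewindingBlockWriter:
    """Toy writer that rewinds its buffer when a record fails to serialize."""

    def __init__(self) -> None:
        self.block = io.BytesIO()
        self.count = 0

    def append(self, record: dict) -> None:
        mark = self.block.tell()
        try:
            encode_record(record, self.block)
        except Exception:
            self.block.seek(mark)
            self.block.truncate()         # discard the partial record's bytes
            raise
        self.count += 1


w = RewindingBlockWriter()
w.append({"name": "ok"})
try:
    w.append({"name": "bad \ud800"})      # lone surrogate: not encodable as UTF-8
except UnicodeEncodeError:
    pass                                  # caller may log and skip the bad record
w.append({"name": None})
print(w.count, w.block.getvalue())        # 2 b'\x00\x04ok\x02'
```

The second record fails after its branch index has already been written. Without the rewind, a stray 0x00 byte would stay in the block, and the reader would fall out of step with the writer.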


On 3/23/12 8:27 PM, Russell Jurney russell.jur...@gmail.com wrote:

 OK, now I have a follow-up question...
 
 How does one recover from an exception while writing an Avro file? The incomplete record
 is being written, which is crashing the reader.
 
 On Fri, Mar 23, 2012 at 8:01 PM, Russell Jurney russell.jur...@gmail.com
 wrote:
 Thanks Scott. Looking at the raw data, it seems to have been a truncated
 record due to UTF problems.
 
 Russell Jurney http://datasyndrome.com
 
 On Mar 23, 2012, at 7:59 PM, Scott Carey scottca...@apache.org wrote:
 
 
 It appears to be failing while reading a union index. If the trace did not
 have any of the Pig AvroStorage stuff in it, I could tell you more.
 
 What does the 'tojson' tool in avro-tools.jar do with this file?
 (java -jar avro-tools-1.6.3.jar tojson file | your_favorite_text_reader)
 What version of Avro produced the Java stack trace below?
 
 
 On 3/23/12 7:01 PM, Russell Jurney russell.jur...@gmail.com wrote:
 
 I have a problem record I've written in Avro that crashes anything which
 tries to read it :(
 
 Can anyone make sense of these errors?
 
 The exception in Pig/AvroStorage is this:
 
 java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
   at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
   at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
   at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
   at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
   at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
   at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
   at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
   at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
   at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
   at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
   ... 7 more
 
 When reading the record in Python:
 
   File "/me/Collecting-Data/src/python/cat_avro", line 21, in <module>
     for record in df_reader:
   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
     datum = self.datum_reader.read(self.datum_decoder)
   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
     return self.read_data(self.writers_schema, self.readers_schema, decoder)
   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
     return self.read_record(writers_schema, readers_schema, decoder)
   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
     field_val = self.read_data(field.type, readers_field.type, decoder)
   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
     return self.read_union(writers_schema, readers_schema, decoder)
   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 650, in read_union
     raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
 avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches
 
 When reading the record in Ruby:
 
 /Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data': Writer's schema  and Reader's schema [string,null] do not match. (Avro::IO::SchemaMatchException)
 
 -- 
 Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
 
 
 

Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem

2012-03-23 Thread Scott Carey

It appears to be failing while reading a union index. If the trace did not
have any of the Pig AvroStorage stuff in it, I could tell you more.

What does the 'tojson' tool in avro-tools.jar do with this file?
(java -jar avro-tools-1.6.3.jar tojson file | your_favorite_text_reader)
What version of Avro produced the Java stack trace in your original message?
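Scott's diagnosis can be made concrete. In Avro's binary encoding, a union value begins with the zero-based index of the branch that was written, encoded as a zigzag variable-length long. The reader uses that index to pick a branch. If the reader has fallen out of step with the writer's bytes (for example, because a record was cut short), it decodes whatever bytes come next as an index.

The following simplified Python decoder is written for illustration and is not code from any Avro library:
- It raises a plain ValueError where the avro library raises SchemaResolutionException.
- The error text mirrors the Python traceback in this thread.
- The stray bytes 0x80 0x01 are made up to show one way an index of 64 can come out.

```python
from typing import Tuple


def read_long(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one Avro-style long (base-128 varint, then zigzag); return (value, next_pos)."""
    z = 0
    shift = 0
    while True:
        b = data[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:          # high bit clear: last byte of the varint
            break
    return (z >> 1) ^ -(z & 1), pos


def read_union_index(data: bytes, pos: int, branches: int) -> Tuple[int, int]:
    """Read a union's branch index and check it against the number of branches."""
    index, pos = read_long(data, pos)
    if not 0 <= index < branches:
        raise ValueError(
            f"Can't access branch index {index} for union with {branches} branches")
    return index, pos


# In step with the writer: 0x00 is branch 0 ("string") of ["string", "null"].
print(read_union_index(b"\x00\x04ok", 0, 2))   # (0, 1)

# Out of step: hypothetical stray bytes 0x80 0x01 decode to the long 64.
try:
    read_union_index(b"\x80\x01", 0, 2)
except ValueError as err:
    print(err)   # Can't access branch index 64 for union with 2 branches
```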






Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem

2012-03-23 Thread Russell Jurney
Thanks Scott. Looking at the raw data, it seems to have been a truncated record
due to UTF problems.
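The thread doesn't say what the UTF problem was, so the following shows only one hypothetical way a record can end up truncated. In Avro's binary encoding, a string is its length in bytes (a zigzag varint long) followed by the UTF-8 bytes. If a writer records the character count instead, any non-ASCII character makes the length too short. The reader then stops early and reads every later value from the wrong offset. In this toy Python sketch, `count_chars=True` simulates that bug. The function names are made up for illustration.

```python
from typing import Tuple


def write_long(n: int) -> bytes:
    """Avro-style long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(data: bytes, pos: int) -> Tuple[int, int]:
    z = shift = 0
    while True:
        b = data[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos


def write_string(s: str, count_chars: bool = False) -> bytes:
    """Length in bytes, then UTF-8 bytes.

    count_chars=True simulates a hypothetical bug: writing len(s), the
    number of characters, instead of the number of encoded bytes.
    """
    data = s.encode("utf-8")
    length = len(s) if count_chars else len(data)
    return write_long(length) + data


def read_string(data: bytes, pos: int) -> Tuple[str, int]:
    length, pos = read_long(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


print(read_string(write_string("caf\u00e9"), 0))    # ('café', 6)

bad = write_string("caf\u00e9", count_chars=True)   # claims 4 bytes, writes 5
try:
    read_string(bad, 0)
except UnicodeDecodeError:
    # The reader stopped inside the two-byte "é"; the leftover byte 0xA9
    # would then be misread as the start of the next value.
    print("truncated string, stream now misaligned")
```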

Russell Jurney http://datasyndrome.com



Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem

2012-03-23 Thread Russell Jurney
OK, now I have a follow-up question...

How does one recover from an exception while writing an Avro file? The incomplete
record is being written, which is crashing the reader.
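Scott's 3/26 reply at the top of this thread covers the writer side. On the reader side, the Avro object container format offers a coarser fallback. Records are grouped into blocks, and each block is written as a record count, the payload size in bytes, the serialized records, and the file's 16-byte sync marker. A reader that fails partway through a block can therefore discard the rest of that block and resume at the next one. It loses only the records that shared a block with the bad one.

This is a reader-side mitigation, not something proposed in the thread. The Python sketch below illustrates it on a hand-built byte string. It assumes no compression codec and skips the file header. The record schema and all function names are hypothetical, and this is not how any Avro library implements it.

```python
from typing import Iterator, List, Optional, Tuple


def write_long(n: int) -> bytes:
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(data: bytes, pos: int) -> Tuple[int, int]:
    z = shift = 0
    while True:
        b = data[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos


def write_block(records: List[bytes], sync: bytes) -> bytes:
    """Block framing: record count, payload byte size, payload, 16-byte sync marker."""
    payload = b"".join(records)
    return write_long(len(records)) + write_long(len(payload)) + payload + sync


def iter_blocks(body: bytes, sync: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (count, payload) per block; the size field lets us jump past bad records."""
    pos = 0
    while pos < len(body):
        count, pos = read_long(body, pos)
        size, pos = read_long(body, pos)
        payload = body[pos:pos + size]
        pos += size
        if body[pos:pos + 16] != sync:
            raise ValueError("sync marker mismatch: block framing itself is damaged")
        pos += 16
        yield count, payload


def decode_name(payload: bytes, pos: int) -> Tuple[Optional[str], int]:
    """Hypothetical record with one field, name: ["string", "null"]."""
    index, pos = read_long(payload, pos)
    if index == 1:
        return None, pos
    if index != 0:
        raise ValueError(f"Can't access branch index {index} for union with 2 branches")
    length, pos = read_long(payload, pos)
    return payload[pos:pos + length].decode("utf-8"), pos + length


def read_all(body: bytes, sync: bytes) -> Tuple[List[Optional[str]], int]:
    """Decode every block; drop (and count) any block containing a bad record."""
    good: List[Optional[str]] = []
    skipped = 0
    for count, payload in iter_blocks(body, sync):
        try:
            pos, values = 0, []
            for _ in range(count):
                value, pos = decode_name(payload, pos)
                values.append(value)
            good.extend(values)
        except (ValueError, UnicodeDecodeError, IndexError):
            skipped += 1   # resume at the next block boundary
    return good, skipped


def enc(s: str) -> bytes:
    data = s.encode("utf-8")
    return write_long(0) + write_long(len(data)) + data


sync = bytes(range(16))   # hypothetical marker; real files use 16 random bytes from the header
body = (write_block([enc("a")], sync)
        + write_block([enc("b"), b"\x80\x01"], sync)   # second record is corrupt
        + write_block([enc("c"), write_long(1)], sync))
print(read_all(body, sync))   # (['a', 'c', None], 1)
```

Note that "b" is lost along with the corrupt record because they share a block. Writing smaller blocks limits that loss at some cost in overhead.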




