tether test
When I check the mapred.tether test example TestWordCountTether in Eclipse, I get the error: java.lang.Error: Unresolved compilation problems: MD5 cannot be resolved to a type
Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem
It appears to be reading a union index and failing in there somehow. If it did not have any of the pig AvroStorage stuff in there I could tell you more.

What does avro-tools.jar's 'tojson' tool do?
(java -jar avro-tools-1.6.3.jar tojson file | your_favorite_text_reader)

What version of Avro is the java stack trace below?

On 3/23/12 7:01 PM, Russell Jurney russell.jur...@gmail.com wrote:

I have a problem record I've written in Avro that crashes anything which tries to read it :( Can anyone make sense of these errors?

The exception in Pig/AvroStorage is this:

java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
    ... 7 more

When reading the record in Python:

  File "/me/Collecting-Data/src/python/cat_avro", line 21, in <module>
    for record in df_reader:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
    return self.read_union(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 650, in read_union
    raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches

When reading the record in Ruby:

/Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data': Writer's schema and Reader's schema [string,null] do not match. (Avro::IO::SchemaMatchException)

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
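
For readers following the "union index" remark: in Avro's binary encoding, a union value starts with a long (a zigzag-encoded varint) giving the zero-based branch position, followed by the value itself. For a [string,null] union the only valid indexes are 0 and 1, so an index of 64 suggests the decoder is reading bytes that were never meant to be a union index. The sketch below is a standalone illustration in Python 3, not code from the Avro library; the helper name read_long and the sample bytes are assumptions made for the example.

```python
def read_long(buf, pos=0):
    """Decode one zigzag varint (Avro's 'long' encoding) from buf at pos.

    Returns (value, next_pos). Hypothetical helper for illustration only.
    """
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift   # low 7 bits carry data
        if not byte & 0x80:             # high bit clear: last byte of the varint
            break
        shift += 7
    # Undo zigzag: even raw values are non-negative, odd ones negative.
    return (raw >> 1) ^ -(raw & 1), pos


branches = ["string", "null"]
for sample in (b"\x00", b"\x02", b"\x80\x01"):
    index, _ = read_long(sample)
    if 0 <= index < len(branches):
        print(index, branches[index])
    else:
        print(index, "out of range for a union with", len(branches), "branches")
```

The first two samples decode to the valid indexes 0 and 1; the third decodes to 64, the same out-of-range value reported in the traces above. Which bytes the readers actually hit in the problem file is not known from this thread.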
Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem
Thanks Scott, looking at the raw data it seems to have been a truncated record due to UTF problems.

Russell Jurney http://datasyndrome.com

On Mar 23, 2012, at 7:59 PM, Scott Carey scottca...@apache.org wrote:

It appears to be reading a union index and failing in there somehow. If it did not have any of the pig AvroStorage stuff in there I could tell you more.

What does avro-tools.jar's 'tojson' tool do?
(java -jar avro-tools-1.6.3.jar tojson file | your_favorite_text_reader)

What version of Avro is the java stack trace below?
Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem
Ok, now I have a followup question... how does one recover from an exception while writing an Avro record? The incomplete record is being written, which is crashing the reader.

On Fri, Mar 23, 2012 at 8:01 PM, Russell Jurney russell.jur...@gmail.com wrote:

Thanks Scott, looking at the raw data it seems to have been a truncated record due to UTF problems.

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
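
One general way to keep a half-written record out of the output, independent of any particular Avro API, is to encode each record into a temporary in-memory buffer and copy it to the real output only once encoding has succeeded. The Python 3 sketch below illustrates that pattern under stated assumptions: encode_record is a hypothetical stand-in for a real datum writer (it uses a toy length-prefixed format, not Avro's), and append_all is likewise an invented name.

```python
import io


def encode_record(record):
    """Hypothetical stand-in for a datum writer; NOT the Avro encoding.

    Encodes a list of strings as length-prefixed UTF-8 and raises
    UnicodeEncodeError on text that cannot be encoded.
    """
    out = io.BytesIO()
    for value in record:
        data = value.encode("utf-8")          # may raise part-way through a record
        out.write(len(data).to_bytes(4, "big"))
        out.write(data)
    return out.getvalue()


def append_all(records, sink, encode=encode_record):
    """Write only fully encoded records to sink.

    Returns a list of (index, exception) for the records that were skipped.
    """
    skipped = []
    for i, record in enumerate(records):
        try:
            payload = encode(record)          # nothing has reached sink yet
        except (UnicodeError, TypeError, ValueError) as exc:
            skipped.append((i, exc))          # record is dropped as a whole
            continue
        sink.write(payload)                   # commit the complete record
    return skipped


sink = io.BytesIO()
records = [["ok"], ["first field", "bad \ud800 field"], ["fine"]]
skipped = append_all(records, sink)
print([i for i, _ in skipped])
```

Here the second record fails on its second field, after the first field has already been encoded, yet none of its bytes reach sink because the failure happens in the temporary buffer. Whether a given Avro writer already behaves this way on a failed append is not established in this thread and should be checked against the version in use.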