tether test

2012-03-23 Thread 伍复慧
When I run the mapred.tether test example TestWordCountTether in Eclipse,
I get this error:
java.lang.Error: Unresolved compilation problems:
  MD5 cannot be resolved to a type


Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem

2012-03-23 Thread Scott Carey

It appears to be reading a union index and failing in there somehow.  If it
did not have any of the pig AvroStorage stuff in there I could tell you
more.

What does avro-tools.jar's 'tojson' tool do?  (java -jar
avro-tools-1.6.3.jar tojson file | your_favorite_text_reader)
What version of Avro produced the Java stack trace below?
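
A minimal Python 3 sketch of how a union index is read, and of how
misplaced bytes can surface as "branch index 64". It assumes Avro's binary
encoding, in which the union branch is written as a zigzag variable-length
integer. The helper names read_long and read_union_branch are illustrative
only and are not the Avro library's API.

```python
import io


def read_long(stream):
    """Decode one zigzag variable-length integer from a binary stream."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift   # low 7 bits carry data
        if not (byte & 0x80):             # high bit clear: last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)    # undo zigzag encoding


def read_union_branch(stream, branch_count):
    """Read a union index and reject values outside the declared branches."""
    index = read_long(stream)
    if not 0 <= index < branch_count:
        raise ValueError(
            "Can't access branch index %d for union with %d branches"
            % (index, branch_count)
        )
    return index


if __name__ == "__main__":
    print(read_union_branch(io.BytesIO(b"\x02"), 2))   # valid second branch
    print(read_long(io.BytesIO(b"\x80\x01")))          # out of range for 2 branches
```

For a two-branch union the only valid index bytes are 0x00 and 0x02. The
two bytes 0x80 0x01 decode to 64, so an index of 64 suggests the decoder
was positioned on bytes that were never written as a union index, for
example after a truncated or misaligned record.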


On 3/23/12 7:01 PM, Russell Jurney russell.jur...@gmail.com wrote:

 I have a problem record I've written in Avro that crashes anything which tries
 to read it :(
 
 Can anyone make sense of these errors?
 
 The exception in Pig/AvroStorage is this:
 
 java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
   at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
   at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
   at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
   at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
   at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
   at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
   at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
   at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
   at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
   at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
   ... 7 more
 
 When reading the record in Python:
 
   File /me/Collecting-Data/src/python/cat_avro, line 21, in <module>
     for record in df_reader:
   File /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py, line 354, in next
     datum = self.datum_reader.read(self.datum_decoder)
   File /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py, line 445, in read
     return self.read_data(self.writers_schema, self.readers_schema, decoder)
   File /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py, line 490, in read_data
     return self.read_record(writers_schema, readers_schema, decoder)
   File /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py, line 690, in read_record
     field_val = self.read_data(field.type, readers_field.type, decoder)
   File /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py, line 488, in read_data
     return self.read_union(writers_schema, readers_schema, decoder)
   File /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py, line 650, in read_union
     raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
 avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches
 
 When reading the record in Ruby:
 
 /Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data':
 Writer's schema  and Reader's schema [string,null] do not match. (Avro::IO::SchemaMatchException)
 
 -- 
 Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com




Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem

2012-03-23 Thread Russell Jurney
Thanks Scott, looking at the raw data it seems to have been a truncated record 
due to UTF problems.

Russell Jurney http://datasyndrome.com

On Mar 23, 2012, at 7:59 PM, Scott Carey scottca...@apache.org wrote:

 
 It appears to be reading a union index and failing in there somehow.  If it 
 did not have any of the pig AvroStorage stuff in there I could tell you more.
 
 What does avro-tools.jar's 'tojson' tool do?  (java -jar
 avro-tools-1.6.3.jar tojson file | your_favorite_text_reader)
 What version of Avro produced the Java stack trace below?
 
 


Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schem

2012-03-23 Thread Russell Jurney
OK, now I have a follow-up question...

How does one recover from an exception while writing an Avro record?  The
incomplete record is still written to the file, which crashes the reader.
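
One possible way to keep a half-written record out of the file, sketched
in Python 3 under stated assumptions: each record is encoded fully in
memory first, and only complete payloads are appended to the output. This
is a generic pattern, not the Avro library's API. encode_string and
append_records are hypothetical names, and the length-prefixed format is a
stand-in for real Avro encoding.

```python
import io


def encode_string(value):
    """Hypothetical encoder: 4-byte length prefix followed by UTF-8 bytes."""
    data = value.encode("utf-8")  # raises UnicodeEncodeError on bad input
    return len(data).to_bytes(4, "big") + data


def append_records(out, records, encode=encode_string):
    """Append only records that encode completely; report the rest."""
    skipped = []
    for position, record in enumerate(records):
        try:
            # Stage the whole record in memory before touching the output.
            payload = encode(record)
        except (UnicodeError, TypeError, AttributeError) as exc:
            skipped.append((position, exc))
            continue
        out.write(payload)  # reached only when encoding succeeded
    return skipped


if __name__ == "__main__":
    out = io.BytesIO()
    # The middle string holds a lone surrogate, which UTF-8 cannot encode.
    bad = append_records(out, ["ok", "bad\ud800", "fine"])
    print(len(bad), len(out.getvalue()))
```

Because the output is written only after encoding succeeds, a failure
leaves the stream at a record boundary and a reader never sees a partial
record. The skipped list gives the caller a place to log or retry the
rejected input.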

On Fri, Mar 23, 2012 at 8:01 PM, Russell Jurney russell.jur...@gmail.com wrote:

 Thanks Scott, looking at the raw data it seems to have been a truncated
 record due to UTF problems.

 Russell Jurney http://datasyndrome.com





-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com