Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python

2012-02-02 Thread Russell Jurney
Correction: when I read the file in Python, I get the error below.  It
looks like a unicode problem?  Can one tell Avro how to handle this?

Traceback (most recent call last):
  File "./cat_avro", line 21, in <module>
    for record in df_reader:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
    return self.read_union(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 654, in read_union
    return self.read_data(selected_writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 458, in read_data
    return self.read_data(writers_schema, s, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 468, in read_data
    return decoder.read_utf8()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 543: invalid start byte


On Thu, Feb 2, 2012 at 2:06 PM, Russell Jurney <russell.jur...@gmail.com> wrote:

 I am writing Avro records in Ruby using the avro ruby gem in 1.8.7.  I
 have problems with loading these files sometimes.  As a result, I am unable
 to write large files that are readable.

 The exception I get is below.  Anyone have an idea what this means?  It
 looks like Avro is having trouble parsing the schema.  The avro files parse
 in Ruby and Python, just not Pig.  Are there more rigorous checks in Java?

 Pig Stack Trace
 ---
 ERROR 2998: Unhandled internal error. org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;

 java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
     at org.apache.avro.Schema.<clinit>(Schema.java:82)
     at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.<clinit>(AvroStorageUtils.java:49)
     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:163)
     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:144)
     at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:269)
     at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
     at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
     at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
     at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
     at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
     at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
     at org.apache.pig.Main.run(Main.java:495)
     at org.apache.pig.Main.main(Main.java:111)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python

2012-02-02 Thread Russell Jurney
A little bit more searching shows this:

http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/

On Thu, Feb 2, 2012 at 2:48 PM, Russell Jurney <russell.jur...@gmail.com> wrote:

 The jars being used are:

 REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
 REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
 REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
 REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
 REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar

 On Thu, Feb 2, 2012 at 2:41 PM, James Baldassari <jbaldass...@gmail.com> wrote:

 Hi Russell,

 I'm not sure about the Python error, but the Java error looks like a
 classpath problem, not a schema parsing issue.  The NoSuchMethodError in
 the stack trace indicates that Avro was trying to invoke a method in the
 Jackson library that wasn't present at run-time.  My guess is that your
 program (or Pig?) either has two incompatible versions of the Jackson
 library on its classpath or maybe Avro's Jackson dependency has been
 excluded and a version that is incompatible with Avro is on the classpath.

 Which version of Avro is being used?  Running 'mvn dependency:tree' in
 Avro trunk I see that it's depending on Jackson 1.8.6.  Can you verify that
 only one version of Jackson is on the classpath and that it's the version
 that is required by whatever version of Avro is on the classpath?

 -James
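
One way to run the check James describes is to scan the effective classpath for Jackson jars and flag any artifact that shows up with more than one version. The following is only a minimal Python 3 sketch under stated assumptions: the function names are made up for illustration, the classpath is passed in as a plain string, and the /old/hadoop/lib entry is a hypothetical stale jar, not a path reported in this thread.

```python
import re
from collections import defaultdict

# Matches file names such as "jackson-core-asl-1.7.3.jar".
JACKSON_JAR = re.compile(r"(jackson-[a-z-]+?)-(\d+(?:\.\d+)*)\.jar$")


def jackson_versions(classpath, sep=":"):
    """Group the Jackson versions on a classpath string by artifact name."""
    versions = defaultdict(set)
    for entry in classpath.split(sep):
        file_name = entry.rsplit("/", 1)[-1]
        match = JACKSON_JAR.match(file_name)
        if match:
            artifact, version = match.groups()
            versions[artifact].add(version)
    return versions


def jackson_conflicts(classpath, sep=":"):
    """Return only the artifacts that appear with more than one version."""
    return {
        artifact: sorted(found)
        for artifact, found in jackson_versions(classpath, sep).items()
        if len(found) > 1
    }


# The first and third entries come from the REGISTER lines in this thread;
# the /old/hadoop/lib entry is a made-up stale jar for illustration only.
example_classpath = ":".join([
    "/me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar",
    "/old/hadoop/lib/jackson-core-asl-1.0.1.jar",
    "/me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar",
])

print(jackson_conflicts(example_classpath))
```

The jars named in REGISTER statements are not necessarily the whole picture, so the string to feed in is whatever classpath the JVM was actually started with, including anything contributed by environment variables.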




Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python

2012-02-02 Thread Russell Jurney
Further examination shows that the problematic emails I am encoding are
formatted in ISO-8859-1, not UTF-8.  That is why I am getting character
problems.  Looks like it is not an Avro problem after all.  Thanks!  :)
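
The byte 0xa0 from the traceback is a no-break space in ISO-8859-1 but cannot start a UTF-8 sequence, which is why the reader fails on it. The files in this thread were written from Ruby; the following is only a Python 3 sketch of the general idea, transcoding text to UTF-8 before it goes into an Avro string field. The helper name, the fallback parameter, and the sample body are assumptions made for illustration.

```python
def to_utf8(raw, fallback_encoding="iso-8859-1"):
    """Return UTF-8 bytes, transcoding from fallback_encoding when raw is not valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: reinterpret the bytes using the fallback encoding.
        return raw.decode(fallback_encoding).encode("utf-8")
    return raw


# Hypothetical message body containing the ISO-8859-1 no-break space (0xa0).
latin1_body = b"Total:\xa0100"
clean = to_utf8(latin1_body)
print(clean.decode("utf-8"))
```

Trying UTF-8 first is a heuristic, since some ISO-8859-1 byte sequences also happen to be valid UTF-8. When the email declares a charset in its headers, decoding with that declared charset is the more reliable choice.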


Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python

2012-02-02 Thread Russell Jurney
Spoke too soon... this now happens no matter which Avro files I load.  As far
as I can tell, nothing has changed regarding jars, etc.  Confused.

I think this happens when Avro is parsing the schema?

Pig Stack Trace
---
ERROR 2998: Unhandled internal error. org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;

java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
    at org.apache.avro.Schema.<clinit>(Schema.java:82)
    at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.<clinit>(AvroStorageUtils.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:163)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:144)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:269)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
    at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
    at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:495)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python

2012-02-02 Thread Russell Jurney
I cleaned up my environment by unsetting HADOOP_HOME and removing some old
Jackson jars from my CLASSPATH, and Pig's AvroStorage works again.

Woot!


abuse of aliases?

2012-02-02 Thread Koert Kuipers
i have many avro files with similar data (same meaning, same type, etc.)
but different names for the fields.
can i create a reader schema that for each field that i am interested in
maps it to all the different possible fields in the files by using aliases,
and then run map-reduce over the files using this schema?
i am talking about tens of aliases per field, and this number will only
grow as more data comes in.
is this an acceptable use of the alias concept, or is it abuse? and is the
alias implementation in avro efficient for such usage?
thanks! koert
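
To make the question concrete: in the Avro specification a reader schema field may carry an "aliases" list, and a writer field whose name matches one of those aliases is treated as if it had the reader's name. The following Python 3 sketch only imitates that renaming rule on plain dictionaries. The record name, field names, and aliases are hypothetical, and the helper functions are illustrative, not part of any Avro library.

```python
import json

# Hypothetical reader schema: one canonical name per field, with the names
# used by the different source files listed as aliases.
reader_schema = json.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "user_id", "type": "long",
     "aliases": ["uid", "userId", "member_id"]},
    {"name": "amount", "type": "double",
     "aliases": ["amt", "total"]}
  ]
}
""")


def alias_map(schema):
    """Map every accepted writer-side name (field name or alias) to the reader's field name."""
    mapping = {}
    for field in schema["fields"]:
        mapping[field["name"]] = field["name"]
        for alias in field.get("aliases", []):
            mapping[alias] = field["name"]
    return mapping


def project(record, mapping):
    """Rename known fields to the reader's names and drop fields the reader does not ask for."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


mapping = alias_map(reader_schema)
print(project({"uid": 7, "amt": 2.5, "extra": "ignored"}, mapping))
```

In this sketch the alias table is built once per schema, so the per-record cost is one dictionary lookup per field regardless of how many aliases exist. That is a property of the sketch only and says nothing about how efficiently Avro's own alias handling behaves with tens of aliases per field.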