[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529334#comment-13529334 ]

Russell Jurney commented on PIG-3015:
-------------------------------------

Strange. I am not able to apply that and get that result. I'll try downloading 
and applying again. Hmmmmmm...

I'm having this problem loading my emails:

grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/avro-1.7.3.jar
grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/json-simple-1.1.jar
grunt> REGISTER /me/Software/pig-trunk/contrib/piggybank/java/piggybank.jar
grunt> 
grunt> rmf /tmp/sent_counts.avro
grunt> 
grunt> messages = LOAD '/me/Data/test_inbox' USING AvroStorage();
2012-12-11 13:01:41,690 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/avro/io/DatumReader
2012-12-11 13:01:41,690 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:510)
        at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1206)
        at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1194)
        at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4766)
        at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3183)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1581)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1554)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:526)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:535)
        at org.apache.pig.Main.main(Main.java:154)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.io.DatumReader
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 27 more

Details also at logfile: /private/tmp/pig_1355258041691.log
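
One thing that can be ruled out quickly with an error like this is that the registered jar simply lacks the class. The following is a minimal sketch, not part of Pig or Avro; the helper name jar_contains_class is hypothetical. It only checks whether a jar has an entry for a given class and says nothing about which classloader Pig uses to resolve it.

```python
import sys
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the fully qualified class_name.

    jar_contains_class is a hypothetical helper for illustration only.
    """
    # A class a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_jar.py <path-to-jar> <fully.qualified.ClassName>
    found = jar_contains_class(sys.argv[1], sys.argv[2])
    print("found" if found else "missing")
```

For the session above, this could be run against /me/Software/pig-trunk/build/ivy/lib/Pig/avro-1.7.3.jar with the class name org.apache.avro.io.DatumReader. If the class is present, the jar itself is probably fine and the classpath seen by the JVM would be the next thing to look at.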

The schema is: 

Avro Schema: {"fields": [{"doc": "", "type": ["null", "string"], "name": 
"message_id"}, {"doc": "", "type": ["null", "string"], "name": "thread_id"}, 
{"type": ["string", "null"], "name": "in_reply_to"}, {"type": ["string", 
"null"], "name": "subject"}, {"type": ["string", "null"], "name": "body"}, 
{"type": ["string", "null"], "name": "date"}, {"doc": "", "type": ["null", 
{"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": 
"real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], 
"type": "record", "name": "from"}], "type": "array"}], "name": "froms"}, 
{"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": 
["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", 
"string"], "name": "address"}], "type": "record", "name": "to"}], "type": 
"array"}], "name": "tos"}, {"doc": "", "type": ["null", {"items": ["null", 
{"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, 
{"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", 
"name": "cc"}], "type": "array"}], "name": "ccs"}, {"doc": "", "type": ["null", 
{"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": 
"real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], 
"type": "record", "name": "bcc"}], "type": "array"}], "name": "bccs"}, {"doc": 
"", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": 
["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", 
"string"], "name": "address"}], "type": "record", "name": "reply_to"}], "type": 
"array"}], "name": "reply_tos"}], "type": "record", "name": "Email"}
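
Since the schema is hard to read in that form, here is a small sketch that summarizes the fields of an Avro record schema using only Python's json module. The schema literal below is an abbreviated excerpt (three of the fields above), and the helper names describe and summarize are hypothetical; this is not how AvroStorage itself reads schemas.

```python
import json

# Abbreviated excerpt of the Email schema above (assumption: only three fields kept).
SCHEMA_JSON = """
{"type": "record", "name": "Email", "fields": [
  {"doc": "", "type": ["null", "string"], "name": "message_id"},
  {"type": ["string", "null"], "name": "subject"},
  {"doc": "", "type": ["null", {"type": "array", "items": ["null",
    {"type": "record", "name": "from", "fields": [
      {"doc": "", "type": ["null", "string"], "name": "real_name"},
      {"doc": "", "type": ["null", "string"], "name": "address"}]}]}],
   "name": "froms"}
]}
"""


def describe(avro_type):
    """Render an Avro type from parsed schema JSON as a short string."""
    if isinstance(avro_type, list):
        # A JSON list is an Avro union, e.g. ["null", "string"]
        return " | ".join(describe(t) for t in avro_type)
    if isinstance(avro_type, dict):
        kind = avro_type["type"]
        if kind == "array":
            return "array<" + describe(avro_type["items"]) + ">"
        if kind == "record":
            return "record " + avro_type["name"]
        return describe(kind)
    # A plain string is a primitive or named type
    return avro_type


def summarize(schema):
    """Return (field name, type description) pairs for a record schema."""
    return [(f["name"], describe(f["type"])) for f in schema["fields"]]


for name, type_desc in summarize(json.loads(SCHEMA_JSON)):
    print(name + ": " + type_desc)
```

Every field in the full schema is a union with "null", and the address lists (froms, tos, ccs, bccs, reply_tos) are nullable arrays of nullable records, which is the shape the loader has to handle.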

And just to get really meta... here is one of my Avro-serialized emails, 
dumped as a Python dict (not strict JSON), and it is one from this list: 

{u'bccs': None,
 u'body': u'\r\n    [ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447110#comment-13447110
 ] \r\n\r\nDmitriy V. Ryaboy commented on 
PIG-2661:\r\n----------------------------------------\r\n\r\nor, you know, 
stick a key in MemCache. #whyishadoopsohard\r\n                \r\n> Pig uses 
an extra job for loading data in Pigmix L9\r\n> 
---------------------------------------------------\r\n>\r\n>                 
Key: PIG-2661\r\n>                 URL: 
https://issues.apache.org/jira/browse/PIG-2661\r\n>             Project: 
Pig\r\n>          Issue Type: Improvement\r\n>    Affects Versions: 0.9.0\r\n>  
          Reporter: Jie Li\r\n>            Assignee: Jie Li\r\n>         
Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch, 
PIG-2661.3.patch, PIG-2661.4.patch, PIG-2661.5.patch, PIG-2661.6.patch, 
PIG-2661.7.patch, PIG-2661.8.patch, PIG-2661.plan.txt\r\n>\r\n>\r\n> See 
https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155\r\n\r\n--\r\nThis
 message is automatically generated by JIRA.\r\nIf you think it was sent 
incorrectly, please contact your JIRA administrators\r\nFor more information on 
JIRA, see: http://www.atlassian.com/software/jira\r\n',
 u'ccs': None,
 u'date': u'2012-09-03T15:22:07',
 u'froms': [{u'address': u'j...@apache.org',
             u'real_name': u'Dmitriy V. Ryaboy (JIRA)'}],
 u'in_reply_to': 
u'52638020.7802.1335237294825.javamail.tom...@hel.zones.apache.org',
 u'message_id': u'762728135.29484.1346646127701.JavaMail.jiratomcat@arcas',
 u'reply_tos': [{u'address': u'dev@pig.apache.org', u'real_name': None}],
 u'subject': u'[jira] [Commented] (PIG-2661) Pig uses an extra job for 
loading\r\n data in Pigmix L9',
 u'thread_id': u'1400097807569590118',
 u'tos': [{u'address': u'pig-...@hadoop.apache.org', u'real_name': None}]}

                
> Rewrite of AvroStorage
> ----------------------
>
>                 Key: PIG-3015
>                 URL: https://issues.apache.org/jira/browse/PIG-3015
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>            Reporter: Joseph Adler
>            Assignee: Joseph Adler
>         Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.

