[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529334#comment-13529334 ]
Russell Jurney commented on PIG-3015:
-------------------------------------

Strange. I am not able to apply that and get that result. I'll try downloading and applying again.

Hmmmmmm... Having this problem loading my emails:

grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/avro-1.7.3.jar
grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/json-simple-1.1.jar
grunt> REGISTER /me/Software/pig-trunk/contrib/piggybank/java/piggybank.jar
grunt>
grunt> rmf /tmp/sent_counts.avro
grunt>
grunt> messages = LOAD '/me/Data/test_inbox' USING AvroStorage();
2012-12-11 13:01:41,690 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/avro/io/DatumReader
2012-12-11 13:01:41,690 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:510)
    at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1206)
    at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1194)
    at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4766)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3183)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1581)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1554)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:526)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:535)
    at org.apache.pig.Main.main(Main.java:154)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.io.DatumReader
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 27 more
Details also at logfile: /private/tmp/pig_1355258041691.log

The schema is:

Avro Schema: {"fields": [{"doc": "", "type": ["null", "string"], "name": "message_id"}, {"doc": "", "type": ["null", "string"], "name": "thread_id"}, {"type": ["string", "null"], "name": "in_reply_to"}, {"type": ["string", "null"], "name": "subject"}, {"type": ["string", "null"], "name": "body"}, {"type": ["string", "null"], "name": "date"}, {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "from"}], "type": "array"}], "name": "froms"}, {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "to"}], "type": "array"}], "name": "tos"}, {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "cc"}], "type": "array"}], "name": "ccs"}, {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "bcc"}], "type": "array"}], "name": "bccs"}, {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "reply_to"}], "type": "array"}], "name": "reply_tos"}], "type": "record", "name": "Email"}

And just to get really meta... here is a JSON output of my Avro serialized emails...
one from this list:

{u'bccs': None, u'body': u'\r\n [ https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447110#comment-13447110 ] \r\n\r\nDmitriy V. Ryaboy commented on PIG-2661:\r\n----------------------------------------\r\n\r\nor, you know, stick a key in MemCache. #whyishadoopsohard\r\n \r\n> Pig uses an extra job for loading data in Pigmix L9\r\n> ---------------------------------------------------\r\n>\r\n> Key: PIG-2661\r\n> URL: https://issues.apache.org/jira/browse/PIG-2661\r\n> Project: Pig\r\n> Issue Type: Improvement\r\n> Affects Versions: 0.9.0\r\n> Reporter: Jie Li\r\n> Assignee: Jie Li\r\n> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch, PIG-2661.3.patch, PIG-2661.4.patch, PIG-2661.5.patch, PIG-2661.6.patch, PIG-2661.7.patch, PIG-2661.8.patch, PIG-2661.plan.txt\r\n>\r\n>\r\n> See https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155\r\n\r\n--\r\nThis message is automatically generated by JIRA.\r\nIf you think it was sent incorrectly, please contact your JIRA administrators\r\nFor more information on JIRA, see: http://www.atlassian.com/software/jira\r\n', u'ccs': None, u'date': u'2012-09-03T15:22:07', u'froms': [{u'address': u'j...@apache.org', u'real_name': u'Dmitriy V. Ryaboy (JIRA)'}], u'in_reply_to': u'52638020.7802.1335237294825.javamail.tom...@hel.zones.apache.org', u'message_id': u'762728135.29484.1346646127701.JavaMail.jiratomcat@arcas', u'reply_tos': [{u'address': u'dev@pig.apache.org', u'real_name': None}], u'subject': u'[jira] [Commented] (PIG-2661) Pig uses an extra job for loading\r\n data in Pigmix L9', u'thread_id': u'1400097807569590118', u'tos': [{u'address': u'pig-...@hadoop.apache.org', u'real_name': None}]}

> Rewrite of AvroStorage
> ----------------------
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old
> versions of Avro, it copies data much more than needed, and it's verbose and
> complicated. (One pet peeve of mine is that old versions of Avro don't
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the
> new implementation is significantly faster, and the code is a lot simpler.
> Rewriting AvroStorage also enabled me to implement support for Trevni (as
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best
> way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira