[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529334#comment-13529334 ]
Russell Jurney commented on PIG-3015:
-------------------------------------
Strange. I am not able to apply that and get that result. I'll try downloading
and applying again. Hmmmmmm...
Having this problem loading my emails:
grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/avro-1.7.3.jar
grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/json-simple-1.1.jar
grunt> REGISTER /me/Software/pig-trunk/contrib/piggybank/java/piggybank.jar
grunt>
grunt> rmf /tmp/sent_counts.avro
grunt>
grunt> messages = LOAD '/me/Data/test_inbox' USING AvroStorage();
2012-12-11 13:01:41,690 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/avro/io/DatumReader
2012-12-11 13:01:41,690 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:510)
	at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1206)
	at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1194)
	at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4766)
	at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3183)
	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
	at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1581)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1554)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:526)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:535)
	at org.apache.pig.Main.main(Main.java:154)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.io.DatumReader
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	... 27 more
Details also at logfile: /private/tmp/pig_1355258041691.log
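A quick first check is whether the jar REGISTERed above really contains the class the JVM could not find. This is a minimal sketch that assumes only Python's standard `zipfile` module (jars are zip archives); `jars_containing` is a hypothetical helper written for this note, and the paths are the three REGISTERed in the transcript. If the Avro jar does list `org/apache/avro/io/DatumReader.class`, the jar itself is probably fine. In that case, since the `Caused by` frames go through `sun.misc.Launcher$AppClassLoader` (the JVM's system class loader), it may be worth checking whether AvroStorage itself was loaded from the launcher classpath rather than from the REGISTERed jars, because a class resolves its dependencies through the loader that defined it.

```python
import zipfile

# The class the trace reports as missing, written as a jar (zip) entry name.
MISSING_CLASS = "org/apache/avro/io/DatumReader.class"


def jars_containing(class_entry, jar_paths):
    """Return the jars from jar_paths whose zip index lists class_entry."""
    hits = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                if class_entry in jar.namelist():
                    hits.append(path)
        except (OSError, zipfile.BadZipFile):
            # Missing or unreadable jar: treat it as not containing the class.
            continue
    return hits


if __name__ == "__main__":
    # Paths copied from the REGISTER statements in the transcript above.
    registered = [
        "/me/Software/pig-trunk/build/ivy/lib/Pig/avro-1.7.3.jar",
        "/me/Software/pig-trunk/build/ivy/lib/Pig/json-simple-1.1.jar",
        "/me/Software/pig-trunk/contrib/piggybank/java/piggybank.jar",
    ]
    print(jars_containing(MISSING_CLASS, registered))
```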
The schema is:
Avro Schema:
{"fields": [
 {"doc": "", "type": ["null", "string"], "name": "message_id"},
 {"doc": "", "type": ["null", "string"], "name": "thread_id"},
 {"type": ["string", "null"], "name": "in_reply_to"},
 {"type": ["string", "null"], "name": "subject"},
 {"type": ["string", "null"], "name": "body"},
 {"type": ["string", "null"], "name": "date"},
 {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "from"}], "type": "array"}], "name": "froms"},
 {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "to"}], "type": "array"}], "name": "tos"},
 {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "cc"}], "type": "array"}], "name": "ccs"},
 {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "bcc"}], "type": "array"}], "name": "bccs"},
 {"doc": "", "type": ["null", {"items": ["null", {"fields": [{"doc": "", "type": ["null", "string"], "name": "real_name"}, {"doc": "", "type": ["null", "string"], "name": "address"}], "type": "record", "name": "reply_to"}], "type": "array"}], "name": "reply_tos"}
], "type": "record", "name": "Email"}
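The five address fields repeat the same nested shape, which makes the schema hard to scan. Below is a minimal Python sketch, standard library only, that rebuilds the schema with that shape factored out and prints a one-line summary per field. `address_list`, `EMAIL_SCHEMA` and `describe` are hypothetical names written for this note, and `describe` only handles the types this schema actually uses (null, string, array, record and unions). Serializing `EMAIL_SCHEMA` with `json.dumps` should give back the same JSON text as the schema above.

```python
import json


def address_list(record_name):
    """Nullable array of nullable {real_name, address} records, as repeated in the schema."""
    fields = [{"doc": "", "type": ["null", "string"], "name": n}
              for n in ("real_name", "address")]
    record = {"fields": fields, "type": "record", "name": record_name}
    return ["null", {"items": ["null", record], "type": "array"}]


EMAIL_SCHEMA = {
    "fields": [
        {"doc": "", "type": ["null", "string"], "name": "message_id"},
        {"doc": "", "type": ["null", "string"], "name": "thread_id"},
        {"type": ["string", "null"], "name": "in_reply_to"},
        {"type": ["string", "null"], "name": "subject"},
        {"type": ["string", "null"], "name": "body"},
        {"type": ["string", "null"], "name": "date"},
    ] + [
        {"doc": "", "type": address_list(rec), "name": field}
        for rec, field in [("from", "froms"), ("to", "tos"), ("cc", "ccs"),
                           ("bcc", "bccs"), ("reply_to", "reply_tos")]
    ],
    "type": "record",
    "name": "Email",
}


def describe(t):
    """Render an Avro type (only the subset used here) as a short readable string."""
    if isinstance(t, str):
        return t
    if isinstance(t, list):  # a union: report a "null" branch as "nullable"
        branches = [b for b in t if b != "null"]
        inner = " | ".join(describe(b) for b in branches)
        return "nullable " + inner if len(branches) < len(t) else inner
    if t["type"] == "array":
        return "array<" + describe(t["items"]) + ">"
    if t["type"] == "record":
        return t["name"] + "{" + ", ".join(f["name"] for f in t["fields"]) + "}"
    raise ValueError("type not handled by this sketch: %r" % (t,))


# Prints e.g. "froms: nullable array<nullable from{real_name, address}>"
for field in EMAIL_SCHEMA["fields"]:
    print(field["name"] + ": " + describe(field["type"]))
```

Every field is a union with `null`. The branch order differs (`["null", "string"]` for the first two fields, `["string", "null"]` for the next four); that has no effect here, but in Avro a union field's default value must match the first branch, so it would matter if defaults were ever added.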
And just to get really meta... here is one of my Avro-serialized emails, printed
as a Python dict... one from this list:
{u'bccs': None,
u'body': u'\r\n [
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447110#comment-13447110
] \r\n\r\nDmitriy V. Ryaboy commented on
PIG-2661:\r\n----------------------------------------\r\n\r\nor, you know,
stick a key in MemCache. #whyishadoopsohard\r\n \r\n> Pig uses
an extra job for loading data in Pigmix L9\r\n>
---------------------------------------------------\r\n>\r\n>
Key: PIG-2661\r\n> URL:
https://issues.apache.org/jira/browse/PIG-2661\r\n> Project:
Pig\r\n> Issue Type: Improvement\r\n> Affects Versions: 0.9.0\r\n>
Reporter: Jie Li\r\n> Assignee: Jie Li\r\n>
Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch,
PIG-2661.3.patch, PIG-2661.4.patch, PIG-2661.5.patch, PIG-2661.6.patch,
PIG-2661.7.patch, PIG-2661.8.patch, PIG-2661.plan.txt\r\n>\r\n>\r\n> See
https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155\r\n\r\n--\r\nThis
message is automatically generated by JIRA.\r\nIf you think it was sent
incorrectly, please contact your JIRA administrators\r\nFor more information on
JIRA, see: http://www.atlassian.com/software/jira\r\n',
u'ccs': None,
u'date': u'2012-09-03T15:22:07',
u'froms': [{u'address': u'[email protected]',
u'real_name': u'Dmitriy V. Ryaboy (JIRA)'}],
u'in_reply_to': u'[email protected]',
u'message_id': u'762728135.29484.1346646127701.JavaMail.jiratomcat@arcas',
u'reply_tos': [{u'address': u'[email protected]', u'real_name': None}],
u'subject': u'[jira] [Commented] (PIG-2661) Pig uses an extra job for loading\r\n data in Pigmix L9',
u'thread_id': u'1400097807569590118',
u'tos': [{u'address': u'[email protected]', u'real_name': None}]}
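The dump also shows how the schema's nullable unions come back once decoded: a null branch is simply `None` (`u'bccs': None`, `u'real_name': None`). A minimal hand-rolled checker like the one below, written for this note and not the Avro library's own validation API, can confirm that a decoded record matches the schema. It covers only null, string, array, record and unions, and it treats a field missing from the dict as null, which is an assumption. `conforms` and `REPLY_TOS` are hypothetical names, and the address in the usage lines is a placeholder.

```python
def conforms(datum, schema):
    """Return True if a decoded Avro datum matches schema (subset of Avro types)."""
    if isinstance(schema, list):
        # Union: the datum must match at least one branch.
        return any(conforms(datum, branch) for branch in schema)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return datum is None
    if kind == "string":
        return isinstance(datum, str)
    if kind == "array":
        return isinstance(datum, list) and all(
            conforms(item, schema["items"]) for item in datum)
    if kind == "record":
        # Assumption: a field missing from the dict is treated as null.
        return isinstance(datum, dict) and all(
            conforms(datum.get(field["name"]), field["type"])
            for field in schema["fields"])
    raise ValueError("type not handled by this sketch: %r" % (kind,))


# Same shape as the reply_tos field in the schema, minus the empty "doc" keys.
REPLY_TOS = ["null", {"type": "array", "items": ["null", {
    "type": "record", "name": "reply_to",
    "fields": [{"name": "real_name", "type": ["null", "string"]},
               {"name": "address", "type": ["null", "string"]}]}]}]

# Mirrors u'reply_tos' from the dump; the address is a placeholder.
print(conforms([{"address": "user@example.com", "real_name": None}], REPLY_TOS))  # True
print(conforms(None, REPLY_TOS))  # True: a null list, as in u'bccs': None
```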
> Rewrite of AvroStorage
> ----------------------
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old
> versions of Avro, it copies data much more than needed, and it's verbose and
> complicated. (One pet peeve of mine is that old versions of Avro don't
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the
> new implementation is significantly faster, and the code is a lot simpler.
> Rewriting AvroStorage also enabled me to implement support for Trevni (as
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best
> way to contribute the changes back to Apache.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira