Created https://jira.mongodb.org/browse/HADOOP-19 and https://issues.apache.org/jira/browse/PIG-2509
On Mon, Feb 6, 2012 at 10:13 AM, Jonathan Coveney <[email protected]> wrote:

> I think this specific issue is worth filing a new bug for, since there
> wasn't a specific ticket on my radar that covered the issue of
> getSchemaFromString not working properly in this use case.
>
> 2012/2/5 Russell Jurney <[email protected]>
>
>> Do I file this, or is it a dupe? I saw lots of existing tickets that
>> look similar.
>>
>> On Sun, Feb 5, 2012 at 1:53 PM, Dmitriy Ryaboy <[email protected]> wrote:
>>
>>> That tuple name has been made optional, but I guess some places still
>>> assume it exists.
>>> + jon.
>>>
>>> On Sun, Feb 5, 2012 at 1:16 AM, Russell Jurney <[email protected]> wrote:
>>>
>>>> This now seems like a bug in Utils.getSchemaFromString.
>>>>
>>>> On Sun, Feb 5, 2012 at 1:02 AM, Russell Jurney <[email protected]> wrote:
>>>>
>>>>> To answer my own question, this is because the schemas differ. The
>>>>> schema in the working case has a named tuple via AvroStorage. Storing
>>>>> to Mongo works when I name the tuple:
>>>>>
>>>>> ...
>>>>> sent_topics = FOREACH froms GENERATE FLATTEN(group) AS (from, to),
>>>>>   pairs.subject AS pairs:bag {column:tuple (subject:chararray)};
>>>>>
>>>>> STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING
>>>>>   MongoStorage();
>>>>>
>>>>> I will stop cross-posting to myself now.
>>>>>
>>>>> On Sun, Feb 5, 2012 at 12:47 AM, Russell Jurney <[email protected]> wrote:
>>>>>
>>>>>> sent_topics = LOAD '/tmp/pair_titles.avro' USING AvroStorage();
>>>>>> STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING
>>>>>>   MongoStorage();
>>>>>>
>>>>>> That works. Why is it the case that MongoStorage only works if the
>>>>>> intermediate processing doesn't happen? Strangeness.
>>>>>>
>>>>>> On Sun, Feb 5, 2012 at 12:31 AM, Russell Jurney <[email protected]> wrote:
>>>>>>
>>>>>>> MongoStorage is failing for me now, on a script that was working
>>>>>>> before. Is anyone else using it? The schema is [from:chararray,
>>>>>>> to:chararray, pairs:{null:(subject:chararray)}], which worked before.
>>>>>>>
>>>>>>> 2012-02-05 00:27:54,991 [Thread-15] INFO
>>>>>>> com.mongodb.hadoop.pig.MongoStorage - Store Location Config:
>>>>>>> Configuration: core-default.xml, core-site.xml, mapred-default.xml,
>>>>>>> mapred-site.xml,
>>>>>>> /tmp/hadoop-rjurney/mapred/local/localRunner/job_local_0001.xml For URI:
>>>>>>> mongodb://localhost/test.pigola
>>>>>>> 2012-02-05 00:27:54,993 [Thread-15] INFO
>>>>>>> com.mongodb.hadoop.pig.MongoStorage - OutputFormat...
>>>>>>> com.mongodb.hadoop.MongoOutputFormat@4eb7cd92
>>>>>>> 2012-02-05 00:27:55,291 [Thread-15] INFO
>>>>>>> com.mongodb.hadoop.pig.MongoStorage - Preparing to write to
>>>>>>> com.mongodb.hadoop.output.MongoRecordWriter@333ec758
>>>>>>> Failed to parse: <line 1, column 35> rule identifier failed predicate:
>>>>>>> {!input.LT(1).getText().equalsIgnoreCase("NULL")}?
>>>>>>>     at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:79)
>>>>>>>     at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:93)
>>>>>>>     at org.apache.pig.impl.util.Utils.parseSchema(Utils.java:175)
>>>>>>>     at org.apache.pig.impl.util.Utils.getSchemaFromString(Utils.java:166)
>>>>>>>     at com.mongodb.hadoop.pig.MongoStorage.prepareToWrite(MongoStorage.java:186)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:125)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:86)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>>>>>> 2012-02-05 00:27:55,320 [Thread-15] INFO
>>>>>>> com.mongodb.hadoop.pig.MongoStorage - Stored Schema: [from:chararray,
>>>>>>> to:chararray, pairs:{null:(subject:chararray)}]
>>>>>>> 2012-02-05 00:27:55,323 [Thread-15] WARN
>>>>>>> org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
>>>>>>> java.io.IOException: java.lang.NullPointerException
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
>>>>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>>>>>> Caused by: java.lang.NullPointerException
>>>>>>>     at com.mongodb.hadoop.pig.MongoStorage.putNext(MongoStorage.java:68)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
>>>>>>>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:462)
>>>>>>>     ... 7 more

--
Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com
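
For reference, a minimal sketch that reproduces the parse failure above without running a job. It assumes the schema string round-tripped through Utils.getSchemaFromString is roughly the one printed in the "Stored Schema" log line; the class name and the exact strings are illustrative, not taken from MongoStorage's source. The first string fails because the bag's inner tuple is unnamed and serializes with a "null" alias, which the parser's identifier rule rejects; naming the tuple, as in the FOREACH workaround above, yields a string that parses.

import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

public class SchemaParseRepro {
    public static void main(String[] args) {
        // Illustrative schema strings -- the exact string MongoStorage builds
        // internally may differ slightly from what the log shows.
        // Unnamed inner tuple: serialized with a "null" alias, which the
        // schema parser rejects as a reserved word.
        String unnamed = "from:chararray,to:chararray,pairs:{null:(subject:chararray)}";
        // Same schema with the inner tuple named (the workaround above).
        String named = "from:chararray,to:chararray,pairs:{column:(subject:chararray)}";

        for (String s : new String[] { unnamed, named }) {
            try {
                Schema parsed = Utils.getSchemaFromString(s);
                System.out.println("parsed OK:    " + parsed);
            } catch (Exception e) {
                System.out.println("parse failed: " + s + " -> " + e.getMessage());
            }
        }
    }
}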
