Created https://jira.mongodb.org/browse/HADOOP-19 and https://issues.apache.org/jira/browse/PIG-2509
On Mon, Feb 6, 2012 at 10:13 AM, Jonathan Coveney <[email protected]> wrote:

> I think this specific issue is worth filing a new bug for, since there
> wasn't a specific ticket on my radar that covered the issue of
> getSchemaFromString not working properly in this use case.
>
> 2012/2/5 Russell Jurney <[email protected]>
>
>> Do I file this, or is it a dupe? I saw lots of existing tickets that
>> look similar.
>>
>> On Sun, Feb 5, 2012 at 1:53 PM, Dmitriy Ryaboy <[email protected]> wrote:
>>
>>> That tuple name has been made optional, but I guess some places still
>>> assume it exists.
>>> + jon.
>>>
>>> On Sun, Feb 5, 2012 at 1:16 AM, Russell Jurney <[email protected]> wrote:
>>>
>>>> This now seems like a bug in Utils.getSchemaFromString.
>>>>
>>>> On Sun, Feb 5, 2012 at 1:02 AM, Russell Jurney <[email protected]> wrote:
>>>>
>>>>> To answer my own question, this is because the schemas differ. The
>>>>> schema in the working case has a named tuple via AvroStorage. Storing
>>>>> to Mongo works when I name the tuple:
>>>>>
>>>>> ...
>>>>> sent_topics = FOREACH froms GENERATE FLATTEN(group) AS (from, to),
>>>>>   pairs.subject AS pairs:bag {column:tuple (subject:chararray)};
>>>>>
>>>>> STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING
>>>>>   MongoStorage();
>>>>>
>>>>> I will stop cross-posting to myself now.
>>>>>
>>>>> On Sun, Feb 5, 2012 at 12:47 AM, Russell Jurney <[email protected]> wrote:
>>>>>
>>>>>> sent_topics = LOAD '/tmp/pair_titles.avro' USING AvroStorage();
>>>>>> STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING
>>>>>>   MongoStorage();
>>>>>>
>>>>>> That works. Why is it the case that MongoStorage only works if the
>>>>>> intermediate processing doesn't happen? Strangeness.
>>>>>>
>>>>>> On Sun, Feb 5, 2012 at 12:31 AM, Russell Jurney <[email protected]> wrote:
>>>>>>
>>>>>>> MongoStorage is failing for me now, on a script that was working
>>>>>>> before. Is anyone else using it? The schema is [from:chararray,
>>>>>>> to:chararray, pairs:{null:(subject:chararray)}], which worked before.
>>>>>>>
>>>>>>> 2012-02-05 00:27:54,991 [Thread-15] INFO
>>>>>>> com.mongodb.hadoop.pig.MongoStorage - Store Location Config:
>>>>>>> Configuration: core-default.xml, core-site.xml, mapred-default.xml,
>>>>>>> mapred-site.xml,
>>>>>>> /tmp/hadoop-rjurney/mapred/local/localRunner/job_local_0001.xml For URI:
>>>>>>> mongodb://localhost/test.pigola
>>>>>>> 2012-02-05 00:27:54,993 [Thread-15] INFO
>>>>>>> com.mongodb.hadoop.pig.MongoStorage - OutputFormat...
>>>>>>> com.mongodb.hadoop.MongoOutputFormat@4eb7cd92
>>>>>>> 2012-02-05 00:27:55,291 [Thread-15] INFO
>>>>>>> com.mongodb.hadoop.pig.MongoStorage - Preparing to write to
>>>>>>> com.mongodb.hadoop.output.MongoRecordWriter@333ec758
>>>>>>> Failed to parse: <line 1, column 35> rule identifier failed predicate:
>>>>>>> {!input.LT(1).getText().equalsIgnoreCase("NULL")}?
>>>>>>>     at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:79)
>>>>>>>     at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:93)
>>>>>>>     at org.apache.pig.impl.util.Utils.parseSchema(Utils.java:175)
>>>>>>>     at org.apache.pig.impl.util.Utils.getSchemaFromString(Utils.java:166)
>>>>>>>     at com.mongodb.hadoop.pig.MongoStorage.prepareToWrite(MongoStorage.java:186)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:125)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:86)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>>>>>> 2012-02-05 00:27:55,320 [Thread-15] INFO
>>>>>>> com.mongodb.hadoop.pig.MongoStorage - Stored Schema: [from:chararray,
>>>>>>> to:chararray, pairs:{null:(subject:chararray)}]
>>>>>>> 2012-02-05 00:27:55,323 [Thread-15] WARN
>>>>>>> org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
>>>>>>> java.io.IOException: java.lang.NullPointerException
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
>>>>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>>>>>> Caused by: java.lang.NullPointerException
>>>>>>>     at com.mongodb.hadoop.pig.MongoStorage.putNext(MongoStorage.java:68)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
>>>>>>>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:462)
>>>>>>>     ... 7 more

--
Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com
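
For reference, a minimal sketch that reproduces the parse failure above without running a job. It assumes the schema string round-tripped through Utils.getSchemaFromString is roughly the one printed in the "Stored Schema" log line; the class name and the exact strings are illustrative, not taken from MongoStorage's source. The first string fails because the bag's inner tuple is unnamed and serializes with a "null" alias, which the parser's identifier rule rejects; naming the tuple, as in the FOREACH workaround above, yields a string that parses.

import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

public class SchemaParseRepro {
    public static void main(String[] args) {
        // Illustrative schema strings -- the exact string MongoStorage builds
        // internally may differ slightly from what the log shows.
        // Unnamed inner tuple: serialized with a "null" alias, which the
        // schema parser rejects as a reserved word.
        String unnamed = "from:chararray,to:chararray,pairs:{null:(subject:chararray)}";
        // Same schema with the inner tuple named (the workaround above).
        String named = "from:chararray,to:chararray,pairs:{column:(subject:chararray)}";

        for (String s : new String[] { unnamed, named }) {
            try {
                Schema parsed = Utils.getSchemaFromString(s);
                System.out.println("parsed OK:    " + parsed);
            } catch (Exception e) {
                System.out.println("parse failed: " + s + " -> " + e.getMessage());
            }
        }
    }
}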
