[
https://issues.apache.org/jira/browse/MAHOUT-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Farris resolved MAHOUT-486.
--------------------------------
Resolution: Fixed
Looks like this was due to improper use of the Gram default constructor, introduced as part of the 0.20.2 refactoring work. Does it make sense to mark things like this, which should only be used by the serialization mechanism inside Hadoop, as @Deprecated so that people don't use them in regular code?
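For illustration, here is a minimal sketch of the pattern under discussion (hypothetical class and field names, not the actual Gram source): a Writable-style class whose no-arg constructor exists only for the deserialization path and leaves its fields unset, which is exactly how a later write() can hit an NPE. Marking that constructor @Deprecated steers regular callers toward the initializing constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical Writable-style token, standing in for Gram.
class GramLike {
    private byte[] bytes; // left null by the default constructor

    /** For the serialization machinery only; fields stay uninitialized. */
    @Deprecated
    public GramLike() {
    }

    public GramLike(String text) {
        this.bytes = text.getBytes();
    }

    // Calling write() on an instance built via the deprecated constructor
    // would dereference the null byte array, as in the trace below.
    public void write(DataOutputStream out) throws IOException {
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public void readFields(DataInputStream in) throws IOException {
        bytes = new byte[in.readInt()];
        in.readFully(bytes);
    }

    public String text() {
        return new String(bytes);
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        // Normal code path: initializing constructor, then write.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new GramLike("ngram").write(new DataOutputStream(buf));

        // Deserialization path: the only legitimate use of the no-arg
        // constructor is immediately followed by readFields().
        GramLike restored = new GramLike();
        restored.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(restored.text()); // prints "ngram"
    }
}
```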
> Null Pointer Exception running DictionaryVectorizer with ngram=2 on Reuters dataset
> ------------------------------------------------------------------------------------
>
> Key: MAHOUT-486
> URL: https://issues.apache.org/jira/browse/MAHOUT-486
> Project: Mahout
> Issue Type: Bug
> Components: Utils
> Affects Versions: 0.4
> Reporter: Robin Anil
> Assignee: Drew Farris
> Fix For: 0.4
>
>
> java.io.IOException: Spill failed
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:860)
> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
> at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at org.apache.mahout.utils.nlp.collocations.llr.CollocMapper$1.apply(CollocMapper.java:127)
> at org.apache.mahout.utils.nlp.collocations.llr.CollocMapper$1.apply(CollocMapper.java:114)
> at org.apache.mahout.math.map.OpenObjectIntHashMap.forEachPair(OpenObjectIntHashMap.java:186)
> at org.apache.mahout.utils.nlp.collocations.llr.CollocMapper.map(CollocMapper.java:114)
> at org.apache.mahout.utils.nlp.collocations.llr.CollocMapper.map(CollocMapper.java:41)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.NullPointerException
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:86)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.mahout.utils.nlp.collocations.llr.Gram.write(Gram.java:181)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:179)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:880)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1201)
> at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at org.apache.mahout.utils.nlp.collocations.llr.CollocCombiner.reduce(CollocCombiner.java:40)
> at org.apache.mahout.utils.nlp.collocations.llr.CollocCombiner.reduce(CollocCombiner.java:25)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.