MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights
-------------------------------------------------------------------------
Key: MAHOUT-985
URL: https://issues.apache.org/jira/browse/MAHOUT-985
Project: Mahout
Issue Type: Bug
Components: Integration
Affects Versions: 0.5
Reporter: Dave Kor
Priority: Minor
When parsing an ARFF file that contains instance-specific weights,
MapBackedARFFModel throws the following NullPointerException. While I have only
tested this in 0.5, I suspect this bug also occurs in 0.6.
Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:87)
    at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:75)
    at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
    at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:159)
    at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:127)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
The code works properly when all instance weights are set to the default value
of 1. However, when any instance has a non-default weight, as in the sample
ARFF file below, the NPE occurs when MapBackedARFFModel attempts to parse the
last data line (1,True,{2}).
-----
@relation 'Test Mahout'
@attribute Attr0 numeric
@attribute Label {True,False}
@data
0,False
1,True,{2}
-----
The reason is that Weka assumes every data instance has a default weight of 1,
and this default weight is not saved in the ARFF file. However, when a data
instance does NOT have the default weight of 1, its weight is appended to the
end of the line, surrounded by curly braces. When the
MapBackedARFFModel.getValue method tries to parse this weight as an attribute,
typeMap.get(idx) returns a null ARFFType because there is no such attribute,
which results in an NPE.
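
One possible workaround, sketched below, is to split a trailing weight off each
data line before its fields are handed to getValue. This is only an
illustration of the idea, not the actual Mahout fix: the class
ArffWeightSketch and its methods weightOf and stripWeight are hypothetical
names, and the regular expression assumes the weight is a single number in
curly braces after the final comma, as in the sample file above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: separates a trailing
// instance weight such as ",{2}" from an ARFF data line.
final class ArffWeightSketch {

    // Group 1: everything before the final ",{...}".
    // Group 2: the single token inside the braces (assumed to be the weight).
    private static final Pattern TRAILING_WEIGHT =
        Pattern.compile("^(.*),\\s*\\{\\s*([^{}\\s,]+)\\s*\\}\\s*$");

    private ArffWeightSketch() {
    }

    // Returns the instance weight, or the default of 1.0 when the line has none.
    // Throws NumberFormatException if the braced token is not a number.
    static double weightOf(String line) {
        Matcher m = TRAILING_WEIGHT.matcher(line);
        return m.matches() ? Double.parseDouble(m.group(2)) : 1.0;
    }

    // Returns the line without its trailing weight, so that only real
    // attribute values remain to be looked up by index.
    static String stripWeight(String line) {
        Matcher m = TRAILING_WEIGHT.matcher(line);
        return m.matches() ? m.group(1) : line;
    }
}
```

With the sample file, stripWeight("1,True,{2}") leaves "1,True" and
weightOf reports 2.0, while "0,False" is returned unchanged with the default
weight. A sparse-format line such as "{1 X, 3 Y}" is not mistaken for a weight
because its braces hold more than one token.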