MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights
-------------------------------------------------------------------------

                 Key: MAHOUT-985
                 URL: https://issues.apache.org/jira/browse/MAHOUT-985
             Project: Mahout
          Issue Type: Bug
          Components: Integration
    Affects Versions: 0.5
            Reporter: Dave Kor
            Priority: Minor


When parsing an Arff file that contains instance-specific weights, 
MapBackedArffModel throws the following NullPointerException. While I have only 
tested this in 0.5, I suspect this bug also occurs in 0.6.

Exception in thread "main" java.lang.NullPointerException
        at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:87)
        at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:75)
        at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
        at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:159)
        at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:127)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)

The code works properly when all instance weights are set to the default value 
of 1. However, when any instance has a non-default weight, as in the sample 
Arff file below, the NPE occurs when MapBackedArffModel attempts to parse line 
8. 

-----
@relation 'Test Mahout'

@attribute Attr0 numeric
@attribute Label {True,False}

@data
0,False
1,True,{2}
-----

The reason is that in Weka, all data instances are assumed to have a default 
weight of 1, and this default weight is not saved in the Arff file. However, 
when a data instance DOES NOT have the default weight of 1, the non-default 
instance weight is appended to the end of the line, surrounded by curly braces. 
When the MapBackedArffModel.getValue method tries to parse this weight as an 
attribute, typeMap.get(idx) returns a null ARFFtype because there is no such 
attribute, which results in an NPE. 
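
One possible way to avoid the problem is to split off the trailing weight 
before the remaining fields are treated as attribute values. The following is 
only a minimal sketch of that idea, not Mahout code: the class 
ArffWeightSketch and its methods weightOf and attributeValues are hypothetical 
names, and the sketch assumes that values contain no quoted commas.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: separates a trailing
// Weka-style instance weight such as ",{2}" from an ARFF data line.
class ArffWeightSketch {

    // A comma, then a single brace-enclosed token at the very end of the line.
    // Assumption: the weight token contains no spaces, commas, or braces.
    private static final Pattern TRAILING_WEIGHT =
        Pattern.compile(",\\s*\\{\\s*([^\\{\\}\\s,]+)\\s*\\}\\s*$");

    // Returns the instance weight, or the default of 1.0 when none is present.
    public static double weightOf(String line) {
        Matcher m = TRAILING_WEIGHT.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : 1.0;
    }

    // Returns the attribute values with any trailing weight removed.
    // Assumption: a naive comma split is enough (no quoted commas).
    public static String[] attributeValues(String line) {
        String stripped = TRAILING_WEIGHT.matcher(line).replaceFirst("");
        return stripped.trim().split(",");
    }

    public static void main(String[] args) {
        // The two data lines from the sample file in this report.
        for (String line : new String[] {"0,False", "1,True,{2}"}) {
            System.out.println(Arrays.toString(attributeValues(line))
                + " weight=" + weightOf(line));
        }
    }
}
```

With the weight removed up front, the number of remaining fields matches the 
number of declared attributes, so a lookup such as typeMap.get(idx) would no 
longer be asked for an index that has no attribute. A null check on the 
looked-up type would be a reasonable additional safeguard.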



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
