[
https://issues.apache.org/jira/browse/MAHOUT-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suneel Marthi updated MAHOUT-1285:
----------------------------------
Status: Patch Available (was: Open)
A simple fix would be to check whether the input String is in a numeric format
before parsing it.
See the attached patch; it has not been tested.
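
For illustration only, the check described above might look roughly like the sketch below. This is not the attached patch and not Mahout code: the class NumericGuard and its methods isNumeric and tryParse are hypothetical names, and the regular expression is an assumed, deliberately strict definition of "numeric" (plain decimals with an optional exponent; it rejects forms such as "NaN" or hex floats that Double.parseDouble would otherwise accept).

```java
import java.util.OptionalDouble;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: guards Double.parseDouble
// with a format check so that string data is not parsed as a number.
final class NumericGuard {

    // Assumed definition of "numeric": optional sign, digits with an
    // optional fraction (or a bare fraction), and an optional exponent.
    private static final Pattern NUMERIC =
        Pattern.compile("[-+]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][-+]?\\d+)?");

    private NumericGuard() {
    }

    // Returns true only if the trimmed input matches the pattern above.
    public static boolean isNumeric(String s) {
        return s != null && NUMERIC.matcher(s.trim()).matches();
    }

    // Parses the input if it looks numeric; otherwise returns an empty
    // OptionalDouble instead of throwing NumberFormatException.
    public static OptionalDouble tryParse(String s) {
        if (!isNumeric(s)) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(s.trim()));
    }
}
```

With a guard of this kind, a value such as "b1shkt70694difsmmmdv0ikmoh" yields an empty result and the caller can decide how to handle it (for example, by treating it as string data) rather than failing inside Double.parseDouble.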
> Arff loader can misparse string data as double
> ----------------------------------------------
>
> Key: MAHOUT-1285
> URL: https://issues.apache.org/jira/browse/MAHOUT-1285
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.9
> Environment: Linux Ubuntu 12.4
> Reporter: Neil Walkinshaw
> Fix For: Backlog
>
> Attachments: tempArff
>
>
> I have successfully loaded numerous ARFF files with Mahout (originally
> generated via WEKA). The files contain randomly generated data. For one
> specific random seed, the following exception is thrown:
> java.lang.NumberFormatException: For input string: "b1shkt70694difsmmmdv0ikmoh"
> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
> at java.lang.Double.parseDouble(Double.java:540)
> at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.processNumeric(MapBackedARFFModel.java:146)
> at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:97)
> at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:77)
> at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:44)
> at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:251)
> at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:145)
> at libInterfaces.MahoutTraceBuilder.generateMahoutFile(MahoutTraceBuilder.java:38)
> at libInterfaces.MahoutTraceBuilder.generateMahoutReader(MahoutTraceBuilder.java:42)
> at tests.InputTester.testMahoutMeansShift(InputTester.java:111)