Hi All,

We have a fairly large amount of sparse data. I was following these
instructions in the manual:

Sparse data

It is very common in practice to have sparse training data. MLlib
supports reading training examples stored in LIBSVM format, which is the
default format used by LIBSVM and LIBLINEAR. It is a text format in which each
line represents a labeled sparse feature vector using the following
format:

label index1:value1 index2:value2 ...

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

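
For illustration, the text format quoted above can be read with a few lines of plain Scala. This is only a sketch of the format itself, not MLlib's actual parser: parseLibSVMLine is a hypothetical helper name, and splitting on any run of whitespace (spaces or tabs) is an assumption that may differ from what loadLibSVMFile does. Indices are kept exactly as written in the file.

```scala
// Hypothetical helper, not part of MLlib: parses one line of the form
//   label index1:value1 index2:value2 ...
// Assumption: fields are separated by any run of whitespace (spaces or tabs).
def parseLibSVMLine(line: String): (Double, Array[Int], Array[Double]) = {
  val tokens = line.trim.split("\\s+")
  val label = tokens.head.toDouble
  // Each remaining token is expected to look like "index:value".
  val pairs = tokens.tail.map { token =>
    val parts = token.split(':')
    (parts(0).toInt, parts(1).toDouble)
  }
  (label, pairs.map(_._1), pairs.map(_._2))
}

// Example with a shortened record; the separator here is a single space.
val parsed = parseLibSVMLine("1 122:1 1693:1 1771:1")
```

Running a few lines of a data file through a helper like this is one way to check that every token after the label really has the index:value shape.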
I believe that I have formatted my data in the required LIBSVM format. Here
is a snippet of it:

1        122:1        1693:1        1771:1        1974:1        2334:1        2378:1        2562:1
1        118:1        1389:1        1413:1        1454:1        1780:1        2562:1        5051:1        5417:1        5548:1        5798:1        5862:1
0        150:1        214:1        468:1        1013:1        1078:1        1092:1        1117:1        1489:1        1546:1        1630:1        1635:1        1827:1        2024:1        2215:1        2478:1        2761:1        5985:1        6115:1        6218:1
0        251:1        5578:1

However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file"), I get the
following error message in my spark-shell. Can someone please point me in the
right direction?

java.lang.NumberFormatException: For input string: "150:1        214:1        468:1        1013:1        1078:1        1092:1        1117:1        1489:1        1546:1        1630:1        1635:1        1827:1        2024:1        2215:1        2478:1        2761:1        5985:1        6115:1        6218:1"
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
        at java.lang.Double.parseDouble(Double.java:540)
        at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)
