Thanks, will try it out today.

Date: Wed, 17 Sep 2014 23:04:31 -0700
Subject: Re: MLLib: LIBSVM issue
From: debasish.da...@gmail.com
To: bya...@stanford.edu
CC: ssti...@live.com; user@spark.apache.org
We dump fairly large LIBSVM files to compare against liblinear/libsvm. The following code writes a SparseVector out in LIBSVM format:

    def toLibSvm(features: SparseVector): String = {
      // LIBSVM indices are one-based; SparseVector indices are zero-based.
      val indices = features.indices.map(_ + 1)
      val values = features.values
      // Tuples print as "(index,value)"; rewrite them as "index:value".
      indices.zip(values).mkString(" ").replace(',', ':').replace("(", "").replace(")", "")
    }

On Wed, Sep 17, 2014 at 9:11 PM, Burak Yavuz <bya...@stanford.edu> wrote:

Hi,

The spacing between the inputs should be a single space, not a tab. I suspect your inputs are separated by tabs instead of single spaces, so the parser cannot parse them.

Best,
Burak

----- Original Message -----
From: "Sameer Tilak" <ssti...@live.com>
To: user@spark.apache.org
Sent: Wednesday, September 17, 2014 7:25:10 PM
Subject: MLLib: LIBSVM issue

Hi All,

We have a fairly large amount of sparse data. I was following these instructions in the manual:

Sparse data

It is very common in practice to have sparse training data. MLlib supports reading training examples stored in LIBSVM format, which is the default format used by LIBSVM and LIBLINEAR. It is a text format in which each line represents a labeled sparse feature vector using the following format:

    label index1:value1 index2:value2 ...

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.rdd.RDD

    val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

I believe that I have formatted my data as per the required LIBSVM format. Here is a snippet of it:

    1 122:1 1693:1 1771:1 1974:1 2334:1 2378:1 2562:1
    1 118:1 1389:1 1413:1 1454:1 1780:1 2562:1 5051:1 5417:1 5548:1 5798:1 5862:1
    0 150:1 214:1 468:1 1013:1 1078:1 1092:1 1117:1 1489:1 1546:1 1630:1 1635:1 1827:1 2024:1 2215:1 2478:1 2761:1 5985:1 6115:1 6218:1
    0 251:1 5578:1

However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file"), I get the following error messages in my spark-shell.
Can someone please point me in the right direction?

java.lang.NumberFormatException: For input string: "150:1 214:1 468:1 1013:1 1078:1 1092:1 1117:1 1489:1 1546:1 1630:1 1635:1 1827:1 2024:1 2215:1 2478:1 2761:1 5985:1 6115:1 6218:1"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
    at java.lang.Double.parseDouble(Double.java:540)
    at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)
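
The tab-versus-space diagnosis in this thread can be checked without Spark. The following is a minimal sketch in plain Scala, not the actual loadLibSVMFile implementation: `LibSvmLineCheck`, `normalizeLine`, and `parseLine` are hypothetical names introduced here, and the parser only mimics the single-space splitting described above. It assumes the data lines differ from valid LIBSVM input only in their whitespace.

```scala
object LibSvmLineCheck {
  // Hypothetical helper: collapse every run of whitespace (tabs included)
  // into a single space and trim both ends.
  def normalizeLine(line: String): String =
    line.trim.replaceAll("\\s+", " ")

  // Hypothetical parser for "label index1:value1 index2:value2 ...".
  // It splits on single spaces only, so a tab-separated line fails here.
  // Indices are returned one-based, exactly as written in the file.
  def parseLine(line: String): (Double, Seq[(Int, Double)]) = {
    val items = line.split(' ')
    val label = items.head.toDouble
    val features = items.tail.toSeq.map { item =>
      val Array(index, value) = item.split(':')
      (index.toInt, value.toDouble)
    }
    (label, features)
  }

  def main(args: Array[String]): Unit = {
    val tabbed = "0\t251:1\t5578:1"
    // With tabs, the whole line is taken as the label and toDouble throws.
    println(scala.util.Try(parseLine(tabbed)).isFailure)
    // After normalizing the whitespace, the same line parses.
    println(parseLine(normalizeLine(tabbed)))
  }
}
```

If the normalized lines parse, rewriting the data file with single spaces before calling MLUtils.loadLibSVMFile is a reasonable next step to try.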