Re: MLLib: LIBSVM issue

Debasish Das Wed, 17 Sep 2014 23:05:59 -0700

We dump fairly large datasets in LIBSVM format to compare against
liblinear/libsvm. The following code writes out the index:value part of a
LIBSVM line from a SparseVector (the label has to be prepended separately):


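import org.apache.spark.mllib.linalg.SparseVector

// Renders the features of a SparseVector as LIBSVM "index:value" pairs.
def toLibSvm(features: SparseVector): String = {
  // LIBSVM indices are one-based; SparseVector indices are zero-based.
  val indices = features.indices.map(_ + 1)
  val values = features.values
  // Join the pairs with a single space, which is what the loader expects.
  indices.zip(values).map { case (i, v) => s"$i:$v" }.mkString(" ")
}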
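
If you want the whole line including the label, something along these lines
should work. This is only a sketch in plain Scala with no Spark dependency;
toLibSvmLine and its parameters are names I made up for illustration, and the
example data is invented.

```scala
// Hypothetical helper, not part of MLlib: formats one labeled example as a
// LIBSVM line from zero-based indices and their values.
def toLibSvmLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
  require(indices.length == values.length, "indices and values must have the same length")
  // Shift to one-based indices and render each feature as "index:value".
  val pairs = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
  // Label first, then the pairs, all separated by a single space.
  (label.toString +: pairs).mkString(" ")
}

// Made-up example: label 1.0 with two non-zero features.
val line = toLibSvmLine(1.0, Array(121, 1692), Array(1.0, 1.0))
```

Using mkString(" ") for the whole line keeps the separator a single space
everywhere, including between the label and the first feature.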



On Wed, Sep 17, 2014 at 9:11 PM, Burak Yavuz <bya...@stanford.edu> wrote:

> Hi,
>
> The spacing between the inputs should be a single space, not a tab. I feel
> like your inputs have tabs between them instead of a single space.
> Therefore the parser cannot parse the input.
>
> Best,
> Burak
>
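
If the file does turn out to contain tabs, as Burak suggests, one option is to
normalize the separators before loading. A minimal sketch in plain Scala;
normalizeLibSvmLine is a hypothetical helper name, not an MLlib function, and
the sample line is made up.

```scala
// Hypothetical helper: collapses tabs and repeated spaces so that tokens are
// separated by exactly one space.
def normalizeLibSvmLine(line: String): String =
  line.trim.split("\\s+").mkString(" ")

// Made-up tab-separated input line.
val fixed = normalizeLibSvmLine("0\t150:1\t214:1\t468:1")
```

On a real dataset this could be applied per line, for example with
sc.textFile(...).map(normalizeLibSvmLine).saveAsTextFile(...), and the cleaned
output then passed to MLUtils.loadLibSVMFile. The paths are left for you to
fill in.
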
> ----- Original Message -----
> From: "Sameer Tilak" <ssti...@live.com>
> To: user@spark.apache.org
> Sent: Wednesday, September 17, 2014 7:25:10 PM
> Subject: MLLib: LIBSVM issue
>
> Hi All,
>
> We have a fairly large amount of sparse data. I was following the
> following instructions in the manual:
>
> Sparse data
>
> It is very common in practice to have sparse training data.
> MLlib supports reading training examples stored in LIBSVM format, which is
> the default format used by LIBSVM and LIBLINEAR. It is a text format in
> which each line represents a labeled sparse feature vector using the
> following format:
>
> label index1:value1 index2:value2 ...
>
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.mllib.util.MLUtils
> import org.apache.spark.rdd.RDD
>
> val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc,
> "data/mllib/sample_libsvm_data.txt")
>
> I believe that I have formatted my data as per the required Libsvm format.
> Here is a snippet of that:
>
> 1        122:1        1693:1        1771:1        1974:1        2334:1        2378:1        2562:1
> 1        118:1        1389:1        1413:1        1454:1        1780:1        2562:1        5051:1        5417:1        5548:1        5798:1        5862:1
> 0        150:1        214:1        468:1        1013:1        1078:1        1092:1        1117:1        1489:1        1546:1        1630:1        1635:1        1827:1        2024:1        2215:1        2478:1        2761:1        5985:1        6115:1        6218:1
> 0        251:1        5578:1
>
> However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file"), I get
> the following error messages in my spark-shell. Can someone please point me
> in the right direction?
>
> java.lang.NumberFormatException: For input string: "150:1        214:1        468:1        1013:1        1078:1        1092:1        1117:1        1489:1        1546:1        1630:1        1635:1        1827:1        2024:1        2215:1        2478:1        2761:1        5985:1        6115:1        6218:1"
>         at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
>         at java.lang.Double.parseDouble(Double.java:540)
>         at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)
>
