RE: MLLib: LIBSVM issue

Sameer Tilak Thu, 18 Sep 2014 10:25:21 -0700

Thanks, will try it out today.

Date: Wed, 17 Sep 2014 23:04:31 -0700
Subject: Re: MLLib: LIBSVM issue
From: debasish.da...@gmail.com
To: bya...@stanford.edu
CC: ssti...@live.com; user@spark.apache.org


We dump fairly large LIBSVM files to compare against LIBLINEAR/LIBSVM. The 
following code writes a SparseVector in LIBSVM format:

import org.apache.spark.mllib.linalg.SparseVector

// LIBSVM indices are one-based; SparseVector indices are zero-based.
def toLibSvm(features: SparseVector): String = {
  val indices = features.indices.map(_ + 1)
  val values = features.values
  // Emit "index:value" pairs separated by single spaces.
  indices.zip(values).map { case (i, v) => s"$i:$v" }.mkString(" ")
}
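
To write a complete LIBSVM line, the label still has to be prepended to the feature string. A minimal sketch, assuming plain arrays stand in for the indices and values fields of a SparseVector; toLibSvmLine is a hypothetical helper, not part of MLlib:

```scala
// Hypothetical helper: formats one labeled example as a LIBSVM line.
// Plain arrays stand in for SparseVector.indices and SparseVector.values.
def toLibSvmLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
  require(indices.length == values.length, "indices and values must have the same length")
  // LIBSVM indices are one-based, so shift the zero-based indices by one.
  val features = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
  // Fields are separated by exactly one space.
  (label.toString +: features).mkString(" ")
}

// Example: label 1.0 with non-zero entries at zero-based positions 0 and 3.
val line = toLibSvmLine(1.0, Array(0, 3), Array(1.0, 2.5))
```

Keeping the separator as a single space matters here, since that is the point raised later in this thread.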



On Wed, Sep 17, 2014 at 9:11 PM, Burak Yavuz <bya...@stanford.edu> wrote:
Hi,



The fields should be separated by a single space, not a tab. It looks like 
your input has tabs between the fields instead of single spaces, so the 
parser cannot parse it.
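
One way to repair such a file is to collapse every run of whitespace into a single space before loading it. A minimal sketch in plain Scala; normalizeLibSvmLine is a hypothetical helper name, and the sample line is taken from the data snippet in the original message with tabs assumed as separators:

```scala
// Hypothetical helper: collapse tabs and runs of whitespace into single spaces.
def normalizeLibSvmLine(line: String): String =
  line.trim.split("\\s+").mkString(" ")

// A tab-separated record (tabs are an assumption about the original file).
val tabbed = "0\t251:1\t5578:1"
val fixed = normalizeLibSvmLine(tabbed)
```

In a Spark shell the same function could presumably be applied to the whole file with sc.textFile(...).map(normalizeLibSvmLine) and written back out with saveAsTextFile before calling MLUtils.loadLibSVMFile on the cleaned copy.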



Best,

Burak



----- Original Message -----

From: "Sameer Tilak" <ssti...@live.com>

To: user@spark.apache.org

Sent: Wednesday, September 17, 2014 7:25:10 PM

Subject: MLLib: LIBSVM issue



Hi All,

We have a fairly large amount of sparse data. I was following these 
instructions in the manual:

Sparse data

It is very common in practice to have sparse training data. MLlib 
supports reading training examples stored in LIBSVM format, which is the 
default format used by LIBSVM and LIBLINEAR. It is a text format in which each 
line represents a labeled sparse feature vector using the following format:

label index1:value1 index2:value2 ...

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
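
The line format described above can be illustrated with a small standalone parser. This is only a sketch of the format, not MLlib's own implementation; parseLibSvmLine is a hypothetical name, and splitting strictly on a single space is an assumption made to show why the separator matters:

```scala
// Hypothetical parser for one LIBSVM line: "label index1:value1 index2:value2 ..."
// Returns the label plus (zero-based index, value) pairs. Not MLlib's own code.
def parseLibSvmLine(line: String): (Double, Seq[(Int, Double)]) = {
  // Assumption: fields are separated by single spaces only.
  val items = line.trim.split(' ')
  val label = items.head.toDouble
  val features = items.tail.filter(_.nonEmpty).map { item =>
    val Array(index, value) = item.split(':')
    (index.toInt - 1, value.toDouble) // convert one-based to zero-based
  }
  (label, features.toSeq)
}

// A space-separated line parses into a label and its sparse entries.
val parsed = parseLibSvmLine("1 122:1 1693:1")
```

With this sketch, a line whose fields are separated by tabs is never split, so the whole line reaches toDouble and a NumberFormatException is thrown.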

I believe I have formatted my data in the required LIBSVM format. Here is a 
snippet:

1        122:1        1693:1        1771:1        1974:1        2334:1        2378:1        2562:1
1        118:1        1389:1        1413:1        1454:1        1780:1        2562:1        5051:1        5417:1        5548:1        5798:1        5862:1
0        150:1        214:1        468:1        1013:1        1078:1        1092:1        1117:1        1489:1        1546:1        1630:1        1635:1        1827:1        2024:1        2215:1        2478:1        2761:1        5985:1        6115:1        6218:1
0        251:1        5578:1

However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file") I get the 
following error messages in my spark-shell. Can someone please point me in 
the right direction?

java.lang.NumberFormatException: For input string: "150:1        214:1        468:1        1013:1        1078:1        1092:1        1117:1        1489:1        1546:1        1630:1        1635:1        1827:1        2024:1        2215:1        2478:1        2761:1        5985:1        6115:1        6218:1"
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
        at java.lang.Double.parseDouble(Double.java:540)
        at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)





---------------------------------------------------------------------

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org

For additional commands, e-mail: user-h...@spark.apache.org
