Re: MLLib: LIBSVM issue

2014-09-18 Thread Debasish Das
We dump fairly big LIBSVM files to compare against liblinear/libsvm. The
following code dumps out LIBSVM format from a SparseVector:

import org.apache.spark.mllib.linalg.SparseVector

def toLibSvm(features: SparseVector): String = {
  // LIBSVM/LIBLINEAR indices are 1-based, while SparseVector indices are 0-based
  val indices = features.indices.map(_ + 1)
  val values = features.values
  // zip yields tuples rendered as "(index,value)"; rewrite them as "index:value" pairs
  indices.zip(values).mkString(" ").replace(',', ':').replace("(", "").replace(")", "")
}
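For example (a minimal sketch; labeledData and the output path below are just illustrative names, not part of the code above), you can prepend the label and dump the result as text:

// labeledData: RDD[(Double, SparseVector)] of (label, features) -- hypothetical input
val lines = labeledData.map { case (label, features) => s"$label ${toLibSvm(features)}" }
lines.saveAsTextFile("libsvm-dump")  // writes one "label index:value ..." line per example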



On Wed, Sep 17, 2014 at 9:11 PM, Burak Yavuz bya...@stanford.edu wrote:

 Hi,

 The spacing between the inputs should be a single space, not a tab. I feel
 like your inputs have tabs between them instead of a single space.
 Therefore the parser
 cannot parse the input.

 Best,
 Burak

 - Original Message -
 From: Sameer Tilak ssti...@live.com
 To: user@spark.apache.org
 Sent: Wednesday, September 17, 2014 7:25:10 PM
 Subject: MLLib: LIBSVM issue

 Hi All, We have a fairly large amount of sparse data. I was following these
 instructions in the manual:
 Sparse data: It is very common in practice to have sparse training data.
 MLlib supports reading training examples stored in LIBSVM format, which is
 the default format used by LIBSVM and LIBLINEAR. It is a text format in
 which each line represents a labeled sparse feature vector using the
 following format: label index1:value1 index2:value2 ...
 import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.util.MLUtils
 import org.apache.spark.rdd.RDD
 val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
 I believe that I have formatted my data as per the required LIBSVM format.
 Here is a snippet of that:
 1122:11693:11771:11974:12334:1
 2378:12562:1 1118:11389:11413:1
 1454:11780:12562:15051:15417:1
 5548:15798:15862:1 0150:1214:1
 468:11013:11078:11092:11117:1
 1489:11546:11630:11635:11827:1
 2024:12215:12478:12761:15985:1
 6115:16218:1 0251:15578:1
 However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file") I get
 the following error messages in my spark-shell. Can someone please point me
 in the right direction?
 java.lang.NumberFormatException: For input string: 150:1214:1
 468:11013:11078:11092:11117:1
 1489:11546:11630:11635:11827:1
 2024:12215:12478:12761:15985:1
 6115:16218:1 at
 sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
  at java.lang.Double.parseDouble(Double.java:540) at
 scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)






RE: MLLib: LIBSVM issue

2014-09-18 Thread Sameer Tilak
Thanks, Burak. Yes, the tab was the issue, and I was able to get it working after
replacing the tabs with spaces.
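In case it helps anyone else, the replacement can also be done in the shell before loading. A minimal sketch (assuming spark-shell, where sc is available; "path-to-cleaned-file" is just a placeholder output directory):

import org.apache.spark.mllib.util.MLUtils

// collapse any run of whitespace (tabs included) to a single space, write a cleaned copy, then reload
val cleaned = sc.textFile("path-to-data-file").map(_.trim.replaceAll("\\s+", " "))
cleaned.saveAsTextFile("path-to-cleaned-file")
val examples = MLUtils.loadLibSVMFile(sc, "path-to-cleaned-file")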

 Date: Wed, 17 Sep 2014 21:11:00 -0700
 From: bya...@stanford.edu
 To: ssti...@live.com
 CC: user@spark.apache.org
 Subject: Re: MLLib: LIBSVM issue
 
 Hi,
 
 The spacing between the inputs should be a single space, not a tab. I feel 
 like your inputs have tabs between them instead of a single space. Therefore 
 the parser
 cannot parse the input.
 
 Best,
 Burak
 
 - Original Message -
 From: Sameer Tilak ssti...@live.com
 To: user@spark.apache.org
 Sent: Wednesday, September 17, 2014 7:25:10 PM
 Subject: MLLib: LIBSVM issue
 
 Hi All, We have a fairly large amount of sparse data. I was following these
 instructions in the manual:
 Sparse data: It is very common in practice to have sparse training data.
 MLlib supports reading training examples stored in LIBSVM format, which is
 the default format used by LIBSVM and LIBLINEAR. It is a text format in
 which each line represents a labeled sparse feature vector using the
 following format: label index1:value1 index2:value2 ...
 import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.util.MLUtils
 import org.apache.spark.rdd.RDD
 val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
 I believe that I have formatted my data as per the required LIBSVM format.
 Here is a snippet of that:
 1122:11693:11771:11974:12334:1
 2378:12562:1 1118:11389:11413:11454:1 
1780:12562:15051:15417:15548:1
 5798:15862:1 0150:1214:1468:11013:1   
  1078:11092:11117:11489:11546:1
 1630:11635:11827:12024:12215:12478:1  
   2761:15985:16115:16218:1 0251:1
 5578:1 
 However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file") I get
 the following error messages in my spark-shell. Can someone please point me
 in the right direction?
 java.lang.NumberFormatException: For input string: 150:1214:1
 468:11013:11078:11092:11117:11489:1   
  1546:11630:11635:11827:12024:1
 2215:12478:12761:15985:16115:16218:1 
 at 
 sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)  
at java.lang.Double.parseDouble(Double.java:540) at 
 scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)
  
 
 
 
  

RE: MLLib: LIBSVM issue

2014-09-18 Thread Sameer Tilak
Thanks, will try it out today.

Date: Wed, 17 Sep 2014 23:04:31 -0700
Subject: Re: MLLib: LIBSVM issue
From: debasish.da...@gmail.com
To: bya...@stanford.edu
CC: ssti...@live.com; user@spark.apache.org

We dump fairly big LIBSVM files to compare against liblinear/libsvm. The
following code dumps out LIBSVM format from a SparseVector:








import org.apache.spark.mllib.linalg.SparseVector

def toLibSvm(features: SparseVector): String = {
  // LIBSVM/LIBLINEAR indices are 1-based, while SparseVector indices are 0-based
  val indices = features.indices.map(_ + 1)
  val values = features.values
  // zip yields tuples rendered as "(index,value)"; rewrite them as "index:value" pairs
  indices.zip(values).mkString(" ").replace(',', ':').replace("(", "").replace(")", "")
}



On Wed, Sep 17, 2014 at 9:11 PM, Burak Yavuz bya...@stanford.edu wrote:
Hi,



The spacing between the inputs should be a single space, not a tab. I feel like 
your inputs have tabs between them instead of a single space. Therefore the 
parser

cannot parse the input.



Best,

Burak



- Original Message -

From: Sameer Tilak ssti...@live.com

To: user@spark.apache.org

Sent: Wednesday, September 17, 2014 7:25:10 PM

Subject: MLLib: LIBSVM issue



Hi All, We have a fairly large amount of sparse data. I was following these
instructions in the manual:

Sparse data: It is very common in practice to have sparse training data. MLlib
supports reading training examples stored in LIBSVM format, which is the
default format used by LIBSVM and LIBLINEAR. It is a text format in which each
line represents a labeled sparse feature vector using the following
format: label index1:value1 index2:value2 ...

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

I believe that I have formatted my data as per the required LIBSVM format. Here
is a snippet of that:

1122:11693:11771:11974:12334:1
2378:12562:1 1118:11389:11413:11454:1   
 1780:12562:15051:15417:15548:1
5798:15862:1 0150:1214:1468:11013:1 
   1078:11092:11117:11489:11546:11630:1 
   1635:11827:12024:12215:12478:1
2761:15985:16115:16218:1 0251:15578:1

However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file") I get the
following error messages in my spark-shell. Can someone please point me in the
right direction?

java.lang.NumberFormatException: For input string: 150:1214:1
468:11013:11078:11092:11117:11489:1 
   1546:11630:11635:11827:12024:12215:1 
   2478:12761:15985:16115:16218:1 
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241) 
at java.lang.Double.parseDouble(Double.java:540) at 
scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)









  

MLLib: LIBSVM issue

2014-09-17 Thread Sameer Tilak
Hi All, We have a fairly large amount of sparse data. I was following these
instructions in the manual:
Sparse data: It is very common in practice to have sparse training data. MLlib
supports reading training examples stored in LIBSVM format, which is the
default format used by LIBSVM and LIBLINEAR. It is a text format in which each
line represents a labeled sparse feature vector using the following
format: label index1:value1 index2:value2 ...
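For instance, a single example with label 1 and three non-zero features (the indices and values here are made up purely for illustration) would be written as:

1 122:1.0 1693:0.5 2334:2.0

with one space between the label and each index:value pair.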
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
I believe that I have formatted my data as per the required LIBSVM format. Here
is a snippet of that:
1122:11693:11771:11974:12334:1
2378:12562:1 1118:11389:11413:11454:1   
 1780:12562:15051:15417:15548:1
5798:15862:1 0150:1214:1468:11013:1 
   1078:11092:11117:11489:11546:11630:1 
   1635:11827:12024:12215:12478:1
2761:15985:16115:16218:1 0251:15578:1 
However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file") I get the
following error messages in my spark-shell. Can someone please point me in the
right direction?
java.lang.NumberFormatException: For input string: 150:1214:1
468:11013:11078:11092:11117:11489:1 
   1546:11630:11635:11827:12024:12215:1 
   2478:12761:15985:16115:16218:1 
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241) 
at java.lang.Double.parseDouble(Double.java:540) at 
scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)  
 

Re: MLLib: LIBSVM issue

2014-09-17 Thread Burak Yavuz
Hi,

The spacing between the inputs should be a single space, not a tab. I feel like 
your inputs have tabs between them instead of a single space. Therefore the 
parser
cannot parse the input.
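To illustrate (a rough sketch of the failure mode, not the exact MLlib parsing code): the loader splits each line on single spaces, so tab-separated tokens stay glued to the label, and the label token can no longer be parsed as a number:

// hypothetical line where the first two separators are tabs instead of spaces
val tokens = "0\t150:1\t214:1 468:1".split(' ')
// tokens(0) == "0\t150:1\t214:1", so tokens(0).toDouble throws NumberFormatException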

Best,
Burak

- Original Message -
From: Sameer Tilak ssti...@live.com
To: user@spark.apache.org
Sent: Wednesday, September 17, 2014 7:25:10 PM
Subject: MLLib: LIBSVM issue

Hi All, We have a fairly large amount of sparse data. I was following these
instructions in the manual:
Sparse data: It is very common in practice to have sparse training data. MLlib
supports reading training examples stored in LIBSVM format, which is the
default format used by LIBSVM and LIBLINEAR. It is a text format in which each
line represents a labeled sparse feature vector using the following
format: label index1:value1 index2:value2 ...
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
I believe that I have formatted my data as per the required LIBSVM format. Here
is a snippet of that:
1122:11693:11771:11974:12334:1
2378:12562:1 1118:11389:11413:11454:1   
 1780:12562:15051:15417:15548:1
5798:15862:1 0150:1214:1468:11013:1 
   1078:11092:11117:11489:11546:11630:1 
   1635:11827:12024:12215:12478:1
2761:15985:16115:16218:1 0251:15578:1 
However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file") I get the
following error messages in my spark-shell. Can someone please point me in the
right direction?
java.lang.NumberFormatException: For input string: 150:1214:1
468:11013:11078:11092:11117:11489:1 
   1546:11630:11635:11827:12024:12215:1 
   2478:12761:15985:16115:16218:1 
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241) 
at java.lang.Double.parseDouble(Double.java:540) at 
scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)  
 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org