MLLib sparse vector

2014-09-15 Thread Sameer Tilak
Hi All,
I have transformed the data into the following format: the first column is
the user id, and all the other columns are class ids. For a user, only the
class ids that appear in the row have value 1; all others are 0. I need to
create a sparse vector from this. Does the API for creating a sparse vector
directly support this format?

User id    Product class ids
2622572    145447 1620 13421 28565 285556 293 4553 67261 130
3646       1671 18806 183576 3286 51715 57671 57476
  

Re: MLLib sparse vector

2014-09-15 Thread Chris Gore
Hi Sameer,

MLLib uses Breeze’s vector format under the hood.  You can use that.  
http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector

For example:

import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}

val numClasses = classes.distinct.count.toInt

val userWithClassesAsSparseVector = rows.map(x => (x.userID,
  new BSV[Double](x.classIDs.sortWith(_ < _),
    Seq.fill(x.classIDs.length)(1.0).toArray,
    numClasses).asInstanceOf[BV[Double]]))

Chris
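
For reference, here is a minimal sketch of the Breeze approach above on a
local collection, using the two sample rows from the question. The `UserRow`
case class and the object name are hypothetical; the thread does not show how
`rows` or `classes` are defined. One assumption differs from the snippet
above: the vector length is taken as the largest class id plus one, so that
the raw class ids can be used directly as indices. It needs Breeze on the
classpath.

```scala
import breeze.linalg.{SparseVector => BSV, Vector => BV}

// Hypothetical row type; not from the thread.
final case class UserRow(userID: Long, classIDs: Array[Int])

object BreezeSparseExample {
  // The two sample rows from the original question.
  val rows: Seq[UserRow] = Seq(
    UserRow(2622572L, Array(145447, 1620, 13421, 28565, 285556, 293, 4553, 67261, 130)),
    UserRow(3646L, Array(1671, 18806, 183576, 3286, 51715, 57671, 57476))
  )

  // Assumption: length = largest class id + 1, so every raw id is a valid index.
  val vectorLength: Int = rows.flatMap(_.classIDs).max + 1

  val userVectors: Seq[(Long, BV[Double])] = rows.map { row =>
    // Breeze expects the index array in increasing order without duplicates.
    val indices = row.classIDs.distinct.sorted
    val values = Array.fill(indices.length)(1.0)
    val vector: BV[Double] = new BSV[Double](indices, values, vectorLength)
    (row.userID, vector)
  }

  def main(args: Array[String]): Unit =
    userVectors.foreach { case (id, v) =>
      println(s"$id -> ${v.activeSize} non-zeros out of ${v.length}")
    }
}
```

With an RDD instead of a local Seq, the same `map` body should apply
unchanged.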



Re: MLLib sparse vector

2014-09-15 Thread Xiangrui Meng
Or you can use the factory method `Vectors.sparse`:

val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))

where numProducts should be the largest product id plus one.

Best,
Xiangrui
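
To make the `Vectors.sparse` suggestion concrete, here is a small sketch that
parses the two sample rows from the question and builds one sparse vector per
user. The object name, the whitespace-separated input format, and the local
Seq (in place of an RDD) are assumptions for illustration. It needs
spark-mllib on the classpath but no SparkContext.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object MllibSparseExample {
  // Sample rows from the question: user id first, then product class ids.
  val lines: Seq[String] = Seq(
    "2622572 145447 1620 13421 28565 285556 293 4553 67261 130",
    "3646 1671 18806 183576 3286 51715 57671 57476"
  )

  // Assumption: fields are separated by whitespace.
  val parsed: Seq[(Long, Seq[Int])] = lines.map { line =>
    val fields = line.trim.split("\\s+")
    // Drop repeated ids so each index appears at most once per vector.
    (fields.head.toLong, fields.tail.map(_.toInt).distinct.toSeq)
  }

  // As noted above: the vector size is the largest product id plus one.
  val numProducts: Int = parsed.flatMap(_._2).max + 1

  val userVectors: Seq[(Long, Vector)] = parsed.map { case (userId, productIds) =>
    (userId, Vectors.sparse(numProducts, productIds.map(id => (id, 1.0))))
  }

  def main(args: Array[String]): Unit =
    userVectors.foreach { case (userId, v) => println(s"$userId -> $v") }
}
```

On a real data set the maximum id would have to be computed over the whole
input (for example with a separate pass) before building the vectors.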

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: MLLib sparse vector

2014-09-15 Thread Chris Gore
Probably worth noting that the factory methods in mllib create an object of
type org.apache.spark.mllib.linalg.Vector, which stores its data in a format
similar to that of Breeze vectors.

Chris
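
A short sketch of what that similarity looks like in practice: an MLlib
sparse vector exposes its `indices`, `values`, and `size`, and these can be
handed to the Breeze `SparseVector` constructor used earlier in the thread.
The object name and the toy vector are made up for illustration, and this is
only one possible way to bridge the two types. It needs both spark-mllib and
Breeze on the classpath.

```scala
import breeze.linalg.{SparseVector => BSV}
import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

object InspectSparseVector {
  // Toy vector of size 10 with ones at positions 1, 4 and 7.
  val mllibVector: Vector = Vectors.sparse(10, Seq((1, 1.0), (4, 1.0), (7, 1.0)))

  // Vectors.sparse returns the general Vector type, so match on the
  // concrete class to reach the underlying index and value arrays.
  val breezeVector: Option[BSV[Double]] = mllibVector match {
    case sv: SparseVector => Some(new BSV[Double](sv.indices, sv.values, sv.size))
    case _                => None
  }

  def main(args: Array[String]): Unit =
    breezeVector.foreach(v => println(s"${v.activeSize} non-zeros out of ${v.length}"))
}
```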
