Re: mllib sparse vector/matrix vs. graphx graph

2014-10-05 Thread Xiangrui Meng
It really depends on the type of the computation. For example, if
vertices and edges are associated with properties and you want to
operate on (vertex-edge-vertex) triplets or use the Pregel API, GraphX
is the way to go. -Xiangrui
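
As a rough illustration of the (vertex-edge-vertex) triplet view mentioned above, here is a minimal sketch. It assumes a local SparkContext and a tiny made-up graph; the vertex names, edge weights, and the object name TripletSketch are hypothetical, not from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object TripletSketch {
  // Builds a small property graph and renders each triplet as a string.
  def describe(sc: SparkContext): Array[String] = {
    // Hypothetical vertex properties: (vertex id, name)
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    // Hypothetical edge property: a Double weight
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 0.5), Edge(2L, 3L, 1.5)))
    val graph = Graph(vertices, edges)

    // A triplet exposes the source attribute, the edge attribute, and the
    // destination attribute together. Map to plain strings before collecting.
    graph.triplets
      .map(t => s"${t.srcAttr} -[${t.attr}]-> ${t.dstAttr}")
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("triplet-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try describe(sc).foreach(println)
    finally sc.stop()
  }
}
```

If the computation never needs vertex or edge properties side by side like this, a sparse matrix or vector representation may be enough.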

On Sat, Oct 4, 2014 at 9:39 PM, ll  wrote:
> hi.  i am working on an algorithm that has a graph data structure.
>
> it looks like there are 2 ways to implement this with spark
>
> option 1:  use graphx which already provides Vertices and Edges to build out
> the graph pretty nicely.
>
> option 2:  use mllib sparse vector / matrix to build out the graph.  the
> reason i consider mllib is that it looks like it's more stable than graphx.
>
> what are the pros and cons of these 2 options?
>
> when would you use one vs the other?
>
> any advice is much appreciated!
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/mllib-sparse-vector-matrix-vs-graphx-graph-tp15759.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>




mllib sparse vector/matrix vs. graphx graph

2014-10-04 Thread ll
hi.  i am working on an algorithm that has a graph data structure.  

it looks like there are 2 ways to implement this with spark

option 1:  use graphx which already provides Vertices and Edges to build out
the graph pretty nicely.

option 2:  use mllib sparse vector / matrix to build out the graph.  the
reason i consider mllib is that it looks like it's more stable than graphx.

what are the pros and cons of these 2 options? 

when would you use one vs the other?

any advice is much appreciated!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/mllib-sparse-vector-matrix-vs-graphx-graph-tp15759.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: MLLib sparse vector

2014-09-15 Thread Chris Gore
Probably worth noting that the factory methods in MLlib create an object of
type org.apache.spark.mllib.linalg.Vector, which stores data in a format
similar to Breeze vectors.

Chris
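
To make the point about the return type concrete, here is a small sketch. It assumes only the spark-mllib jar on the classpath (no SparkContext is needed for local vectors); the object name and the sample indices are made up for illustration.

```scala
import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

object VectorTypeSketch {
  // The factory method is typed as the general MLlib Vector trait.
  def build(): Vector = Vectors.sparse(5, Array(1, 3), Array(1.0, 1.0))

  def main(args: Array[String]): Unit = {
    build() match {
      // A sparse vector keeps a size plus parallel index and value arrays.
      case sv: SparseVector =>
        println(s"size=${sv.size}")
        println(s"indices=${sv.indices.mkString(",")}")
        println(s"values=${sv.values.mkString(",")}")
      case other =>
        println(s"not sparse, size=${other.size}")
    }
  }
}
```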

On Sep 15, 2014, at 3:24 PM, Xiangrui Meng  wrote:

> Or you can use the factory method `Vectors.sparse`:
> 
> val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))
> 
> where numProducts should be the largest product id plus one.
> 
> Best,
> Xiangrui
> 
> On Mon, Sep 15, 2014 at 12:46 PM, Chris Gore  wrote:
>> Hi Sameer,
>> 
>> MLLib uses Breeze’s vector format under the hood.  You can use that.
>> http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector
>> 
>> For example:
>> 
>> import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
>> 
>> val numClasses = classes.distinct.count.toInt
>> 
>> val userWithClassesAsSparseVector = rows.map(x => (x.userID, new
>> BSV[Double](x.classIDs.sortWith(_ < _),
>> Seq.fill(x.classIDs.length)(1.0).toArray,
>> numClasses).asInstanceOf[BV[Double]]))
>> 
>> Chris
>> 
>> On Sep 15, 2014, at 11:28 AM, Sameer Tilak  wrote:
>> 
>> Hi All,
>> I have transformed the data into the following format: the first column is
>> the user id, and all the other columns are class ids. For a given user, only
>> the class ids that appear in the row have value 1; all others are 0. I need to
>> create a sparse vector from this. Is there an API for creating a sparse vector
>> that can directly support this format?
>>
>> User id    Product class ids
>> 
>> 2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806
>> 183576 3286 51715 57671 57476
>> 
>> 





Re: MLLib sparse vector

2014-09-15 Thread Xiangrui Meng
Or you can use the factory method `Vectors.sparse`:

val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))

where numProducts should be the largest product id plus one.
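
A sketch of how this could be applied to the sample row from the question, assuming each input line is whitespace-separated with the user id first. The helper names parse and toSparse are hypothetical. For a single row the size is taken as that row's largest id plus one; over a full dataset it would be the largest id across all rows plus one.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object RowToSparseVector {
  // Splits "userId id1 id2 ..." into the user id and its product ids.
  def parse(line: String): (Long, Seq[Int]) = {
    val fields = line.trim.split("\\s+")
    (fields.head.toLong, fields.tail.map(_.toInt).toSeq)
  }

  // Duplicate ids are dropped first, since Vectors.sparse rejects
  // repeated indices.
  def toSparse(productIds: Seq[Int], numProducts: Int): Vector =
    Vectors.sparse(numProducts, productIds.distinct.map(id => (id, 1.0)))

  def main(args: Array[String]): Unit = {
    val line = "2622572 145447 1620 13421 28565 285556 293 4553 67261 130 " +
      "3646 1671 18806 183576 3286 51715 57671 57476"
    val (userId, ids) = parse(line)
    val numProducts = ids.max + 1 // largest product id plus one
    val sv = toSparse(ids, numProducts)
    println(s"user $userId -> vector of size ${sv.size}")
  }
}
```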

Best,
Xiangrui

On Mon, Sep 15, 2014 at 12:46 PM, Chris Gore  wrote:
> Hi Sameer,
>
> MLLib uses Breeze’s vector format under the hood.  You can use that.
> http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector
>
> For example:
>
> import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
>
> val numClasses = classes.distinct.count.toInt
>
> val userWithClassesAsSparseVector = rows.map(x => (x.userID, new
> BSV[Double](x.classIDs.sortWith(_ < _),
> Seq.fill(x.classIDs.length)(1.0).toArray,
> numClasses).asInstanceOf[BV[Double]]))
>
> Chris
>
> On Sep 15, 2014, at 11:28 AM, Sameer Tilak  wrote:
>
> Hi All,
> I have transformed the data into the following format: the first column is
> the user id, and all the other columns are class ids. For a given user, only
> the class ids that appear in the row have value 1; all others are 0. I need to
> create a sparse vector from this. Is there an API for creating a sparse vector
> that can directly support this format?
>
> User id    Product class ids
>
> 2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806
> 183576 3286 51715 57671 57476
>
>




Re: MLLib sparse vector

2014-09-15 Thread Chris Gore
Hi Sameer,

MLLib uses Breeze’s vector format under the hood.  You can use that.  
http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector

For example:

import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}

val numClasses = classes.distinct.count.toInt

val userWithClassesAsSparseVector = rows.map { x =>
  val indices = x.classIDs.sortWith(_ < _)
  val values = Seq.fill(x.classIDs.length)(1.0).toArray
  (x.userID, new BSV[Double](indices, values, numClasses).asInstanceOf[BV[Double]])
}

Chris
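
For anyone who wants to try the Breeze route without a cluster, here is a self-contained sketch. The UserRow case class is a hypothetical stand-in for the elements of `rows`, and the vector length is assumed to be the largest class id plus one, following the sizing note elsewhere in this thread, rather than the number of distinct classes.

```scala
import breeze.linalg.{SparseVector => BSV, Vector => BV}

object BreezeSparseSketch {
  // Hypothetical row type: one user and the class ids seen for that user.
  case class UserRow(userID: Long, classIDs: Array[Int])

  def toBreeze(row: UserRow, length: Int): BV[Double] = {
    // Breeze's sparse storage expects indices in increasing order.
    val indices = row.classIDs.distinct.sorted
    val values = Array.fill(indices.length)(1.0)
    new BSV[Double](indices, values, length)
  }

  def main(args: Array[String]): Unit = {
    val row = UserRow(2622572L, Array(1620, 293, 130))
    val v = toBreeze(row, length = row.classIDs.max + 1)
    println(v(293)) // a class id present in the row
    println(v(0))   // a class id absent from the row
  }
}
```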

On Sep 15, 2014, at 11:28 AM, Sameer Tilak  wrote:

> Hi All,
> I have transformed the data into the following format: the first column is
> the user id, and all the other columns are class ids. For a given user, only
> the class ids that appear in the row have value 1; all others are 0. I need to
> create a sparse vector from this. Is there an API for creating a sparse vector
> that can directly support this format?
> 
> User id    Product class ids
> 
> 2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806
> 183576 3286 51715 57671 57476



MLLib sparse vector

2014-09-15 Thread Sameer Tilak
Hi All,
I have transformed the data into the following format: the first column is
the user id, and all the other columns are class ids. For a given user, only
the class ids that appear in the row have value 1; all others are 0. I need to
create a sparse vector from this. Is there an API for creating a sparse vector
that can directly support this format?

User id    Product class ids

2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806
183576 3286 51715 57671 57476