Re: mllib sparse vector/matrix vs. graphx graph
It really depends on the type of computation. For example, if vertices and edges are associated with properties and you want to operate on (vertex-edge-vertex) triplets or use the Pregel API, GraphX is the way to go.

-Xiangrui

On Sat, Oct 4, 2014 at 9:39 PM, ll wrote:
> what are the pros and cons of these 2 options?
>
> when would you use one vs the other?

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
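To make the "(vertex-edge-vertex) triplet" idea concrete, here is a minimal plain-Scala sketch. It is only a stand-in: the `Edge` and `Triplet` case classes and the `triplets` helper are hypothetical names invented for illustration, not GraphX's own types. The point it tries to show is that a triplet carries the properties of both endpoints alongside the edge property, whereas a sparse matrix entry would hold only a numeric value per (row, column) pair.

```scala
// Plain-Scala stand-in for the triplet view. These case classes are
// hypothetical illustrations, not GraphX's own Edge/EdgeTriplet types.
object TripletSketch {
  final case class Edge(src: Long, dst: Long, attr: Double)
  final case class Triplet(srcAttr: String, attr: Double, dstAttr: String)

  /** Joins each edge with the properties of its two endpoint vertices. */
  def triplets(vertices: Map[Long, String], edges: Seq[Edge]): Seq[Triplet] =
    edges.map(e => Triplet(vertices(e.src), e.attr, vertices(e.dst)))

  def main(args: Array[String]): Unit = {
    // Made-up vertex properties and edge weights, for illustration only.
    val vertices = Map(1L -> "alice", 2L -> "bob", 3L -> "carol")
    val edges = Seq(Edge(1L, 2L, 0.5), Edge(2L, 3L, 2.0))
    triplets(vertices, edges).foreach(println)
  }
}
```

If the algorithm only ever needs the numeric edge weights, the sparse-matrix view may be enough; once vertex properties have to travel with each edge, the triplet view is the more natural fit.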
mllib sparse vector/matrix vs. graphx graph
Hi. I am working on an algorithm that has a graph data structure.

It looks like there are 2 ways to implement this with Spark:

Option 1: use GraphX, which already provides Vertices and Edges to build out the graph pretty nicely.

Option 2: use MLlib sparse vector / matrix to build out the graph. The reason I consider MLlib is that it looks like it's more stable than GraphX.

What are the pros and cons of these 2 options?

When would you use one vs. the other?

Any advice is much appreciated!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/mllib-sparse-vector-matrix-vs-graphx-graph-tp15759.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: MLLib sparse vector
It's probably worth noting that the factory methods in MLlib create an object of type org.apache.spark.mllib.linalg.Vector, which stores data in a format similar to Breeze vectors.

Chris

On Sep 15, 2014, at 3:24 PM, Xiangrui Meng wrote:
> Or you can use the factory method `Vectors.sparse`:
>
> val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))
Re: MLLib sparse vector
Or you can use the factory method `Vectors.sparse`:

val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))

where numProducts should be the largest product id plus one.

Best,
Xiangrui

On Mon, Sep 15, 2014 at 12:46 PM, Chris Gore wrote:
> MLLib uses Breeze’s vector format under the hood. You can use that.
> http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector
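As a rough sketch of the preparation step for a row like the one in the original question, the plain-Scala snippet below parses a line into a user id and class ids, then builds sorted, de-duplicated (index, 1.0) pairs plus a vector size of "largest id plus one". The object and helper names (`SparseRowSketch`, `parseRow`, `toPairs`, `vectorSize`) are hypothetical, and the whitespace-separated input layout is an assumption based on the sample row. It deliberately has no Spark dependency; the final call to `Vectors.sparse` appears only as a comment.

```scala
// Hypothetical helper names; plain Scala, so it runs without Spark.
object SparseRowSketch {
  /** Splits "userId classId classId ..." into the user id and its class ids. */
  def parseRow(line: String): (Long, Array[Int]) = {
    val tokens = line.trim.split("\\s+")
    (tokens.head.toLong, tokens.tail.map(_.toInt))
  }

  /** Sorted, de-duplicated (index, 1.0) pairs, one per class id in the row. */
  def toPairs(classIds: Array[Int]): Seq[(Int, Double)] =
    classIds.distinct.sorted.map(id => (id, 1.0)).toSeq

  /** Vector length: largest id plus one. In practice, take the max over all rows. */
  def vectorSize(allClassIds: Iterable[Int]): Int = allClassIds.max + 1

  def main(args: Array[String]): Unit = {
    val (userId, classIds) = parseRow("2622572 145447 1620 13421 28565")
    val pairs = toPairs(classIds)
    val size = vectorSize(classIds)
    // With spark-mllib on the classpath, the vector itself would be built as:
    //   org.apache.spark.mllib.linalg.Vectors.sparse(size, pairs)
    println(s"user=$userId size=$size pairs=$pairs")
  }
}
```

De-duplicating the ids is a precaution in case a class id repeats within a row; the size is computed from the ids themselves because the indices are the raw class ids, not a count of distinct classes.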
Re: MLLib sparse vector
Hi Sameer,

MLlib uses Breeze’s vector format under the hood. You can use that.
http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector

For example:

import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}

// Vector length; it must be larger than the largest class id used as an index.
val numClasses = classes.distinct.count.toInt

// One (userID, sparse vector) pair per row, with 1.0 at each class id.
val userWithClassesAsSparseVector = rows.map(x => (x.userID,
  new BSV[Double](x.classIDs.sortWith(_ < _),
    Seq.fill(x.classIDs.length)(1.0).toArray,
    numClasses).asInstanceOf[BV[Double]]))

Chris

On Sep 15, 2014, at 11:28 AM, Sameer Tilak wrote:
> Is there an API for creating a sparse vector that directly supports this format?
MLLib sparse vector
Hi All,

I have transformed the data into the following format: the first column is the user id, and all the other columns are class ids. For a user, only the class ids that appear in this row have value 1; all others are 0. I need to create a sparse vector from this. Is there an API for creating a sparse vector that directly supports this format?

User id    Product class ids

2622572    145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806 183576 3286 51715 57671 57476