Re: mLIb solving linear regression with sparse inputs

Robineast Sun, 06 Nov 2016 03:36:08 -0800

Here’s a way of creating sparse vectors in MLlib:

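import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.rdd.RDD

// Each line of A.txt is "row,column,value"
val rdd = sc.textFile("A.txt").map(line => line.split(",")).
     map(ary => (ary(0).toInt, ary(1).toInt, ary(2).toDouble))

// Key each entry by its row index
val pairRdd: RDD[(Int, (Int, Int, Double))] = rdd.map(el => (el._1, el))

// combineByKey functions: collect the column indices and values of each row
val create = (first: (Int, Int, Double)) => (Array(first._2), Array(first._3))
val combine = (head: (Array[Int], Array[Double]), tail: (Int, Int, Double)) =>
  (head._1 :+ tail._2, head._2 :+ tail._3)
val merge = (a: (Array[Int], Array[Double]), b: (Array[Int], Array[Double])) =>
  (a._1 ++ b._1, a._2 ++ b._2)

// One sparse vector of size 3 (the number of columns) per row. combineByKey
// does not guarantee the order of the entries, so use the Seq overload of
// Vectors.sparse, which sorts the (index, value) pairs by index.
val A = pairRdd.combineByKey(create, combine, merge).map(el =>
  Vectors.sparse(3, el._2._1.zip(el._2._2).toSeq))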

If you have a separate file of b’s then you would need to manipulate this 
slightly: keep the row index as the key (the final map above drops it), join 
the b’s to the A RDD on that index, and then create LabeledPoints. I guess 
there is a way of doing this using the newer ML interfaces, but it’s not 
particularly obvious to me how.
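
A minimal sketch of that join, under stated assumptions: `sc` is the SparkContext that spark-shell provides, the sample lines are copied from the question (in practice they would come from sc.textFile on files in the same format), and the names aLines, bLines, parse, rowsOfA, labels and points are made up for illustration. It uses groupByKey for brevity in place of the combineByKey approach above, and it treats a row with no entry in b as having label 0.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Assumption: `sc` is the SparkContext provided by spark-shell.
// Sample data copied from the question, one "row,column,value" string per entry.
val aLines = sc.parallelize(Seq(
  "0,0,.42", "0,1,.28", "0,2,.89", "1,0,.83", "1,1,.34", "1,2,.42", "2,0,.23",
  "3,0,.42", "3,1,.98", "3,2,.88", "4,0,.23", "4,1,.36", "4,2,.97"))
val bLines = sc.parallelize(Seq("0,2,.89", "1,2,.42", "3,2,.88", "4,2,.97"))

val numCols = 3 // number of columns of A

def parse(lines: RDD[String]): RDD[(Int, Int, Double)] =
  lines.map(_.split(",")).map(ary => (ary(0).toInt, ary(1).toInt, ary(2).toDouble))

// Rows of A keyed by row index; the Seq overload of Vectors.sparse sorts by index.
val rowsOfA: RDD[(Int, Vector)] =
  parse(aLines).map { case (row, col, value) => (row, (col, value)) }.
    groupByKey().
    mapValues(entries => Vectors.sparse(numCols, entries.toSeq))

// b keyed by row index; its column field is ignored here.
val labels: RDD[(Int, Double)] =
  parse(bLines).map { case (row, _, value) => (row, value) }

// Left outer join: a row of A with no entry in b gets the implicit label 0.0.
val points: RDD[LabeledPoint] =
  rowsOfA.leftOuterJoin(labels).map { case (_, (features, label)) =>
    LabeledPoint(label.getOrElse(0.0), features)
  }
```

The resulting RDD[LabeledPoint] is the input type that the RDD-based MLlib regression trainers take, for example LinearRegressionWithSGD.train(points, numIterations) in the Spark releases current at the time of this thread.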

One point: in the example you give, the b’s are exactly the same as col 2 in the 
A matrix. I presume this is just a quickly hacked-together example, because that 
would give a trivial result: x = (0, 0, 1) solves Ax = b exactly.
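
As a quick check of that observation, here is a small plain-Scala sketch (no Spark needed) using the numbers from the question. The names rowsA, b, x, ax and residuals are made up for illustration, and missing entries are assumed to be 0.

```scala
// Rows of A from the question as (column -> value) maps; missing entries are 0.
val rowsA: Map[Int, Map[Int, Double]] = Map(
  0 -> Map(0 -> 0.42, 1 -> 0.28, 2 -> 0.89),
  1 -> Map(0 -> 0.83, 1 -> 0.34, 2 -> 0.42),
  2 -> Map(0 -> 0.23),
  3 -> Map(0 -> 0.42, 1 -> 0.98, 2 -> 0.88),
  4 -> Map(0 -> 0.23, 1 -> 0.36, 2 -> 0.97))

// b from the question, keyed by row; row 2 has no entry, so it counts as 0.
val b: Map[Int, Double] = Map(0 -> 0.89, 1 -> 0.42, 3 -> 0.88, 4 -> 0.97)

// Candidate solution: the unit vector that picks out column 2.
val x = Array(0.0, 0.0, 1.0)

// (Ax)_i = sum over the stored entries of A(i, j) * x(j)
val ax: Map[Int, Double] = rowsA.map { case (i, row) =>
  i -> row.map { case (j, v) => v * x(j) }.sum
}

// Residual (Ax)_i - b_i for every row
val residuals: Map[Int, Double] =
  ax.map { case (i, v) => i -> (v - b.getOrElse(i, 0.0)) }
```

Every residual comes out as zero, including row 2, where both (Ax)_2 and b_2 are 0, so a regression on this data should simply recover weights close to (0, 0, 1).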

-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 





> On 3 Nov 2016, at 18:12, im281 [via Apache Spark User List] 
> <ml-node+s1001560n28008...@n3.nabble.com> wrote:
> 
> I would like to use it. But how do I do the following 
> 1) Read sparse data (from text or database) 
> 2) pass the sparse data to the linearRegression class? 
> 
> For example: 
> 
> Sparse matrix A 
> row, column, value 
> 0,0,.42 
> 0,1,.28 
> 0,2,.89 
> 1,0,.83 
> 1,1,.34 
> 1,2,.42 
> 2,0,.23 
> 3,0,.42 
> 3,1,.98 
> 3,2,.88 
> 4,0,.23 
> 4,1,.36 
> 4,2,.97 
> 
> Sparse vector b 
> row, column, value 
> 0,2,.89 
> 1,2,.42 
> 3,2,.88 
> 4,2,.97 
> 
> Solve Ax = b??? 





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/mLIb-solving-linear-regression-with-sparse-inputs-tp28006p28027.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
