How to do sparse vector product in Spark?

2015-03-13 Thread Xi Shen
Hi,

I have two RDD[Vector]; both Vectors are sparse and of the form:

(id, value)

id indicates the position of the value in the vector space. I want to
apply a dot product to two such RDD[Vector] and get a scalar value. The
non-existent values are treated as zero.

Any convenient tool to do this in Spark?
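
The operation being asked for can be sketched in plain Python (no Spark), assuming each vector is a list of (id, value) pairs as described above; the names `sparse_dot`, `v1`, and `v2` are illustrative, not from any Spark API:

```python
# Dot product of two sparse vectors given as (id, value) pairs.
# Any id missing from either vector contributes zero to the sum.

def sparse_dot(a, b):
    lookup = dict(b)  # index the second vector by position id
    return sum(value * lookup.get(idx, 0.0) for idx, value in a)

v1 = [(0, 1.0), (3, 2.0), (7, 4.0)]
v2 = [(3, 5.0), (7, 0.5), (9, 9.0)]
print(sparse_dot(v1, v2))  # 2.0*5.0 + 4.0*0.5 = 12.0
```

In Spark terms, the same idea would be a join of the two pair collections on the id followed by a sum of the products.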


Thanks,
David


RE: How to do sparse vector product in Spark?

2015-03-13 Thread Daniel, Ronald (ELS-SDG)
 Any convenient tool to do this [sparse vector product] in Spark?

Unfortunately, it seems that there are very few operations defined for sparse 
vectors. I needed to add some, and ended up converting them to (dense) numpy 
vectors and doing the addition on those.
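
A rough sketch of the workaround described here, with plain Python lists standing in for the numpy arrays (the helper names `to_dense` and `add_dense` are illustrative, not from the actual code): densify each sparse (id, value) list up to a known dimension, then operate elementwise.

```python
# Expand (id, value) pairs into a dense vector of length dim,
# then do elementwise arithmetic on the dense vectors.

def to_dense(pairs, dim):
    dense = [0.0] * dim
    for idx, value in pairs:
        dense[idx] = value
    return dense

def add_dense(a, b):
    return [x + y for x, y in zip(a, b)]

v1 = to_dense([(0, 1.0), (3, 2.0)], dim=5)
v2 = to_dense([(3, 5.0), (4, 9.0)], dim=5)
print(add_dense(v1, v2))  # [1.0, 0.0, 0.0, 7.0, 9.0]
```

The trade-off is memory: densifying is fine for modest dimensions, but defeats the point of a sparse representation for very high-dimensional vectors.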

Best regards,
Ron


From: Xi Shen [mailto:davidshe...@gmail.com]
Sent: Friday, March 13, 2015 1:50 AM
To: user@spark.apache.org
Subject: How to do sparse vector product in Spark?

Hi,

I have two RDD[Vector]; both Vectors are sparse and of the form:

(id, value)

id indicates the position of the value in the vector space. I want to
apply a dot product to two such RDD[Vector] and get a scalar value. The
non-existent values are treated as zero.

Any convenient tool to do this in Spark?


Thanks,
David


Re: How to do sparse vector product in Spark?

2015-03-13 Thread Sean Owen
In Java/Scala-land, the intent is to use Breeze for this. Vector in
Spark is an opaque wrapper around the Breeze representation, which
contains a bunch of methods like this.

On Fri, Mar 13, 2015 at 3:28 PM, Daniel, Ronald (ELS-SDG)
r.dan...@elsevier.com wrote:
 Any convenient tool to do this [sparse vector product] in Spark?



 Unfortunately, it seems that there are very few operations defined for
 sparse vectors. I needed to add some, and ended up converting them to
 (dense) numpy vectors and doing the addition on those.

 Best regards,
 Ron

 From: Xi Shen [mailto:davidshe...@gmail.com]
 Sent: Friday, March 13, 2015 1:50 AM
 To: user@spark.apache.org
 Subject: How to do sparse vector product in Spark?

 Hi,

 I have two RDD[Vector]; both Vectors are sparse and of the form:

 (id, value)

 id indicates the position of the value in the vector space. I want to
 apply a dot product to two such RDD[Vector] and get a scalar value. The
 non-existent values are treated as zero.

 Any convenient tool to do this in Spark?

 Thanks,
 David

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org