> What about declaring a few simple implicit conversions between the
> MLlib and Breeze Vector classes? If you import them then you should be
> able to write a lot of the source code just as you imagine it, as if
> the Breeze methods were available on the Vector object in MLlib.
The problem is that *I don't know how* to write those implicit defs in Scala in a good way, and that's why I'm asking the user list for a better solution (see below for another hack). My understanding is that I can define a new class that extends Vector and holds the implicit def conversions (as in the Scala manual, see below). But I got burned by memory issues when using my own classes in this very way (what's the overhead of creating a new object every time I want to add two Vectors? I don't know - I'm a lowly data scientist), so I'm scared to do it myself.

Since you probably have many Spark users with my background (some programming experience, but not expert), making everyone implement their own "addVector" function could cause many hours of frustration that would be much better spent on coding. Adding +, - and scalar * could be done by a Spark contributor in under an hour (less than what I've spent just writing these emails), while it would take me a day (multiplied by the many users like me), compounded by uncertainty about how to proceed: do I use ml instead of mllib, because columns of a DataFrame can be added while mllib Vectors can't? Do I use Breeze? Do I use apache.commons? Do I write my own (and how long will that take)? Do I abandon Scala and go with pyspark, because I don't have these problems in numpy?

The slippery slope exists, but if you implement the p-norm of a vector and sqdist between two vectors, you should implement the simpler operations too. There is a clear difference between functionality for adding two vectors and, say, taking a determinant. If I remember correctly, +, -, *, / were implemented in a previous version of Spark in a now-deprecated class that has since been expunged from the documentation.

Many thanks,
Kristina

PS: Is this what you meant by adding a simple implicit def? Should it be a class or an object?
These are the kinds of questions I grapple with, and why I'm asking for an example of a solution:

// this is really pseudocode - I know BreezeVector and SparkVector are not real class names
class MyVector extends SparkVector {
  implicit def toBreeze(v: SparkVector): BreezeVector = BreezeVector(v.toArray)
  implicit def fromBreeze(bv: BreezeVector): SparkVector = Vectors.dense(bv.toArray)
}

On Tue, Aug 25, 2015 at 11:11 AM, Sean Owen <so...@cloudera.com> wrote:
> Yes, you're right that it's quite on purpose to leave this API to
> Breeze, in the main. As you can see, the Spark objects have already
> sprouted a few basic operations anyway; there's a slippery slope
> problem here. Why not addition, why not dot products, why not
> determinants, etc.
>
> What about declaring a few simple implicit conversions between the
> MLlib and Breeze Vector classes? If you import them then you should be
> able to write a lot of the source code just as you imagine it, as if
> the Breeze methods were available on the Vector object in MLlib.
>
> On Tue, Aug 25, 2015 at 3:35 PM, Kristina Rogale Plazonic
> <kpl...@gmail.com> wrote:
> > Well, yes, the hack below works (that's all I have time for), but it is
> > not satisfactory - it is not safe, it is verbose and very cumbersome to
> > use, it does not separately deal with the SparseVector case, and it is
> > not complete either.
> >
> > My question is: out of hundreds of users on this list, someone must have
> > come up with a better solution - please?
> >
> > import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
> > import org.apache.spark.mllib.linalg.Vectors
> > import org.apache.spark.mllib.linalg.{Vector => SparkVector}
> >
> > def toBreeze(v: SparkVector) = BV(v.toArray)
> >
> > def fromBreeze(bv: BV[Double]) = Vectors.dense(bv.toArray)
> >
> > def add(v1: SparkVector, v2: SparkVector) = fromBreeze(toBreeze(v1) + toBreeze(v2))
> >
> > def subtract(v1: SparkVector, v2: SparkVector) = fromBreeze(toBreeze(v1) - toBreeze(v2))
> >
> > def scalarMultiply(a: Double, v: SparkVector) = fromBreeze(a * toBreeze(v))
> >
> > On Tue, Aug 25, 2015 at 9:41 AM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
> >>
> >> From what I have understood, you probably need to convert your vector to
> >> breeze and do your operations there. Check
> >> stackoverflow.com/questions/28232829/addition-of-two-rddmllib-linalg-vectors
> >>
> >> On Aug 25, 2015 7:06 PM, "Kristina Rogale Plazonic" <kpl...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I'm still not clear on what is the best (or, ANY) way to add/subtract
> >>> two org.apache.spark.mllib.Vector objects in Scala.
> >>>
> >>> OK, I understand there was a conscious Spark decision not to support
> >>> linear algebra operations in Scala and to leave it to the user to
> >>> choose a linear algebra library.
> >>>
> >>> But for any newcomer from R or Python, where you don't think twice
> >>> about adding two vectors, it is such a productivity shot in the foot to
> >>> have to write your own + operation. I mean, there is support in Spark
> >>> for the p-norm of Vectors and for sqdist between two Vectors, but not
> >>> for +/-? As I said, I'm a newcomer to linear algebra in Scala and am
> >>> not familiar with Breeze or apache.commons - I am willing to learn, but
> >>> would really benefit from guidance from more experienced users. I am
> >>> also not used to optimizing low-level code and am sure that any hack I
> >>> do will be horrible.
> >>>
> >>> So please, could somebody point me to a blog post, documentation, or
> >>> just patches for this really basic functionality? What do you do to get
> >>> around it? Am I the only one to have this problem? (And would it really
> >>> be so onerous to add +/- to Spark? After all, even the
> >>> org.apache.spark.sql.Column class does have +, -, *, /.)
> >>>
> >>> My stupid little use case is to generate some toy data for KMeans, and
> >>> I need to translate a Gaussian blob to another center (for both
> >>> streaming and non-streaming KMeans).
> >>>
> >>> Many thanks! (I am REALLY embarrassed to ask such a simple question...)
> >>>
> >>> Kristina
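[Editor's note: as a hedged answer to the PS question in the thread ("should it be a class or an object?"), one common pattern is to put the implicit defs in a standalone object rather than a subclass of Vector. This is an untested sketch assuming Spark 1.x MLlib and Breeze on the classpath; `VectorImplicits` is a hypothetical name, not a Spark API.]

```scala
import breeze.linalg.{DenseVector => BDV, Vector => BV}
import org.apache.spark.mllib.linalg.{Vector => SparkVector, Vectors}

// An object, not a class extending Vector: nothing is instantiated per
// vector just to carry the conversions; the only per-operation cost is
// the toArray copy at each boundary crossing.
object VectorImplicits {
  implicit def toBreeze(v: SparkVector): BV[Double] = BDV(v.toArray)
  implicit def fromBreeze(bv: BV[Double]): SparkVector = Vectors.dense(bv.toArray)
}
```

With `import VectorImplicits._` in scope, the fromBreeze direction fires implicitly whenever a SparkVector is expected, e.g. `val sum: SparkVector = toBreeze(v1) + toBreeze(v2)`. The toBreeze calls may still need to be explicit, because Breeze's `+` takes a generic argument type and so may not trigger an implicit conversion on its operands.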