Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?
I've added support for sparse vectors and created HadamardTF for the pipeline. Please take a look at my branch <https://github.com/ogeagla/spark/compare/spark-mllib-weighting>. Thanks!

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Any-interest-in-weighting-VectorTransformer-which-does-component-wise-scaling-tp10265p10378.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
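Efficient sparse support is possible because multiplication preserves zeros: only the stored entries of a sparse vector need to be scaled, and its index set can be reused unchanged. Below is a minimal sketch in plain Python. It is not the code on the branch. The `(size, indices, values)` tuple only loosely mirrors the layout of MLlib's `SparseVector`, and the name `hadamard_sparse` is hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sparse representation: (size, indices, values).
SparseVec = Tuple[int, List[int], List[float]]


def hadamard_sparse(scaling: Sequence[float], v: SparseVec) -> SparseVec:
    """Elementwise product of a dense scaling vector with a sparse vector.

    Only the stored entries are visited, so the cost is proportional to
    the number of stored entries rather than to the vector size.
    """
    size, indices, values = v
    if len(scaling) != size:
        raise ValueError(f"size mismatch: {len(scaling)} != {size}")
    # Unstored entries are zero and stay zero, so the indices are reused.
    # Note: a zero weight leaves an explicit 0.0 stored; it is not pruned.
    new_values = [x * scaling[i] for i, x in zip(indices, values)]
    return size, list(indices), new_values
```

For example, `hadamard_sparse([2.0, 1.0, 0.5, 1.0], (4, [0, 2], [3.0, 4.0]))` returns `(4, [0, 2], [6.0, 2.0])`.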
Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?
Thanks for the responses. Would a name like HadamardProduct, or something similar, work to keep it explicit? It would still be a VectorTransformer, so the name and the trait would hopefully make the class somewhat self-documenting.

Xiangrui, do you mean the Hadamard product or the Hadamard transform? My initial proposal was only a vector-vector product, but I can extend it to matrices. The transform would require a bit more work, which I'm willing to do, but I'm not sure where FFT comes in. Can you elaborate?

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Any-interest-in-weighting-VectorTransformer-which-does-component-wise-scaling-tp10265p10355.html
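The following sketch makes the product/transform distinction concrete. The Hadamard product multiplies two vectors entry by entry. The (Walsh-)Hadamard transform multiplies a vector by a Hadamard matrix, and its fast version uses the same butterfly structure as a radix-2 FFT. That structural similarity is one point of contact between the two topics, although the thread itself does not spell out the connection. This is a minimal Python sketch, and both function names are hypothetical rather than part of Spark.

```python
from typing import List, Sequence


def hadamard_product(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hadamard (elementwise) product: (a o b)_i = a_i * b_i."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def walsh_hadamard_transform(x: Sequence[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform, i.e. H_n x, where H_n is
    the Sylvester-construction Hadamard matrix of order n = 2^k.

    Uses the radix-2 butterfly pattern familiar from the FFT, so the work
    is O(n log n) instead of the O(n^2) of a dense matrix-vector product.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # Combine pairs of entries that are h apart: (u, v) -> (u + v, u - v).
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                u, v = y[j], y[j + h]
                y[j], y[j + h] = u + v, u - v
        h *= 2
    return y
```

Applying the unnormalized transform twice returns `n` times the input. For example, `[1.0, 2.0, 3.0, 4.0]` maps to `[10.0, -2.0, -4.0, 0.0]` and back to `[4.0, 8.0, 12.0, 16.0]`.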
Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?
Hmm... Scaler and Scalar are very close together in both pronunciation and spelling, and I wouldn't want to create confusion between the two. Further, this operation (elementwise multiplication by a static vector) is general enough that maybe it should have a more general name?

On Tue, Jan 27, 2015 at 7:54 AM, Xiangrui Meng wrote:
> I would call it Scaler.
Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?
I would call it Scaler. You might want to add it to the spark.ml pipeline API. Please check the spark.ml.HashingTF implementation. Note that this should handle sparse vectors efficiently.

Hadamard and FFTs are quite useful. If you are interested, make sure that we call an FFT library that is license-compatible with Apache.

-Xiangrui

On Jan 24, 2015 8:27 AM, "Octavian Geagla" wrote:
> I found it useful to implement the Hadamard Product as a VectorTransformer.
Any interest in 'weighting' VectorTransformer which does component-wise scaling?
Hello,

I found it useful to implement the Hadamard Product <https://en.wikipedia.org/wiki/Hadamard_product_%28matrices%29> as a VectorTransformer. It can be used to scale a given dimension (column) of the data set by a constant.

Since I've already implemented it and am using it, I thought I'd see if there's interest in this feature going in as Experimental. I'm not sold on the name 'Weighter', either.

Here's the current branch with the work (docs, impl, tests): <https://github.com/ogeagla/spark/compare/spark-mllib-weighting>

The implementation was heavily inspired by those of StandardScalerModel and Normalizer.

Thanks
Octavian

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Any-interest-in-weighting-VectorTransformer-which-does-component-wise-scaling-tp10265.html
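To scale one dimension (column) by a constant, you can take the Hadamard product of each row with a weight vector that is 1 everywhere except in that column. This is a minimal sketch in plain Python, with rows represented as lists of floats. It is not the code on the branch, and the names `scale_column_weights` and `weight_rows` are hypothetical.

```python
from typing import List, Sequence


def scale_column_weights(dim: int, column: int, factor: float) -> List[float]:
    """Build a weight vector that is 1.0 everywhere except at `column`."""
    if not 0 <= column < dim:
        raise IndexError(f"column {column} out of range for dimension {dim}")
    weights = [1.0] * dim
    weights[column] = factor
    return weights


def weight_rows(weights: Sequence[float], rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply the Hadamard product with `weights` to every row of a data set."""
    out = []
    for row in rows:
        if len(row) != len(weights):
            raise ValueError("row length does not match weight vector length")
        out.append([w * x for w, x in zip(weights, row)])
    return out
```

For example, `scale_column_weights(3, 1, 10.0)` gives `[1.0, 10.0, 1.0]`, and applying it to the rows `[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]` yields `[[1.0, 20.0, 3.0], [4.0, 50.0, 6.0]]`. A general weight vector also allows several columns to be rescaled in a single pass.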