2-second canopy clustering over Reuters :D
On Fri, Feb 19, 2010 at 3:33 AM, Robin Anil <robin.a...@gmail.com> wrote:
> This doesn't really work: I can't modify any vectors inside the distance measure. So I wrote a subtraction inside the Manhattan distance itself. Works great for now.
>
> On Fri, Feb 19, 2010 at 3:10 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>> currentVector.assign(otherVector, minus) takes the other vector and subtracts it from currentVector, which mutates currentVector. If currentVector is a DenseVector, this is already optimized. It could be optimized if currentVector is a RandomAccessSparseVector.
>>
>> -jake
>>
>> On Thu, Feb 18, 2010 at 1:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> Just to be clear, this computes currentVector - otherVector?
>>>
>>> currentVector.assign(otherVector, Functions.minus);
>>>
>>> On Fri, Feb 19, 2010 at 2:57 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>>> To do subtractFrom, you can instead just do
>>>>
>>>> Vector.assign(otherVector, Functions.minus);
>>>>
>>>> The problem is that only DenseVector has an optimization here: if the BinaryFunction passed in is additive (it's an instance of PlusMult), sparse iteration over "otherVector" is executed, applying the binary function and mutating self. AbstractVector should have this optimization in general, as it would be useful in RandomAccessSparseVector (although not terribly useful in SequentialAccessSparseVector, but still better than current).
>>>>
>>>> -jake
>>>>
>>>> On Thu, Feb 18, 2010 at 1:19 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>>>> I just had to change it in one place (and the tests pass, which is scary). Canopy is really fast now :). It could still be pushed further.
>>>>> Now the bottleneck is minus.
>>>>>
>>>>> Maybe a subtractFrom along the lines of addTo? Or a mutable negate function for vectors, applied before addTo.
>>>>>
>>>>> Robin
>>>>>
>>>>> On Fri, Feb 19, 2010 at 2:43 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>>>>> I use it (addTo) in decomposer, for exactly this performance issue. Changing plus into addTo requires care: since plus() leaves its arguments unmodified, there may be code which *assumes* that this is the case, and addTo() has side effects which might not be expected. This bit me hard on the SVD migration, because I had other assumptions about mutability in there.
>>>>>>
>>>>>> -jake
>>>>>>
>>>>>> On Thu, Feb 18, 2010 at 1:09 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>>>>>> Ah! It's not being used anywhere :). Should we make that a big task before 0.3? Sweep through the code (mainly clustering) and change all these things.
>>>>>>>
>>>>>>> Robin
>>>>>>>
>>>>>>> On Fri, Feb 19, 2010 at 2:36 AM, Sean Owen <sro...@gmail.com> wrote:
>>>>>>>> Isn't this basically what assign() is for?
>>>>>>>>
>>>>>>>> On Thu, Feb 18, 2010 at 9:04 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>>>>>>>> Now the big perf bottleneck is immutability.
>>>>>>>>>
>>>>>>>>> Say, for plus it's doing vector.clone() before doing anything else.
>>>>>>>>> There should be both immutable and mutable plus functions.
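A minimal Java sketch of the trade-offs discussed in this thread, assuming a hypothetical dense vector class. SimpleVector, plus, assign, plusMultSparse and manhattan are illustrative names, not Mahout's actual API, and java.util.function.DoubleBinaryOperator stands in for Mahout's BinaryFunction. It shows an immutable plus() that copies before writing, a mutating assign(other, fn), an additive fast path that visits only the non-zeros of a sparse operand (loosely modeled on the PlusMult optimization described above), and a Manhattan distance that subtracts element by element without modifying either vector.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

/**
 * Hypothetical minimal dense vector illustrating the mutability trade-offs.
 * Not Mahout's Vector API: all names here are illustrative assumptions.
 */
public class SimpleVector {
    private final double[] values;

    public SimpleVector(double... values) {
        this.values = values.clone(); // defensive copy of the caller's array
    }

    public int size() {
        return values.length;
    }

    /** Immutable style: copies this vector first, so neither operand changes, but every call allocates. */
    public SimpleVector plus(SimpleVector other) {
        SimpleVector copy = new SimpleVector(values); // the up-front clone
        return copy.assign(other, Double::sum);
    }

    /** Mutable style: this[i] = fn(this[i], other[i]) for every i; mutates and returns this. */
    public SimpleVector assign(SimpleVector other, DoubleBinaryOperator fn) {
        checkSize(this, other);
        for (int i = 0; i < values.length; i++) {
            values[i] = fn.applyAsDouble(values[i], other.values[i]);
        }
        return this;
    }

    /**
     * Additive fast path: this[i] += multiplier * other[i], visiting only the non-zeros of a
     * sparse operand (given as index -> value). Skipping zeros is safe because x + m * 0 == x.
     * multiplier = -1 gives an in-place "subtractFrom".
     */
    public SimpleVector plusMultSparse(Map<Integer, Double> otherNonZeros, double multiplier) {
        for (Map.Entry<Integer, Double> e : otherNonZeros.entrySet()) {
            values[e.getKey()] += multiplier * e.getValue();
        }
        return this;
    }

    /** Manhattan distance computed inline: no temporary difference vector and no mutation. */
    public static double manhattan(SimpleVector a, SimpleVector b) {
        checkSize(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.values.length; i++) {
            sum += Math.abs(a.values[i] - b.values[i]);
        }
        return sum;
    }

    private static void checkSize(SimpleVector a, SimpleVector b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("size mismatch: " + a.size() + " vs " + b.size());
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(values);
    }

    public static void main(String[] args) {
        SimpleVector x = new SimpleVector(1, 2, 3);
        SimpleVector y = new SimpleVector(4, 0, 3);

        System.out.println(x.plus(y) + " " + x);   // [5.0, 2.0, 3.0] [1.0, 2.0, 3.0]
        System.out.println(manhattan(x, y));       // 5.0
        x.assign(y, (a, b) -> a - b);              // x := x - y, in place
        System.out.println(x);                     // [-3.0, 2.0, 0.0]
        x.plusMultSparse(Map.of(2, 1.5), -1.0);    // only index 2 is touched
        System.out.println(x);                     // [-3.0, 2.0, -1.5]
    }
}
```

In this sketch plus() stays safe for callers that assume their arguments are left unmodified, while assign() and plusMultSparse() avoid the allocation at the cost of side effects; manhattan() follows the approach of subtracting inside the distance computation instead of mutating either vector.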