2 second canopy clustering over reuters :D

On Fri, Feb 19, 2010 at 3:33 AM, Robin Anil <robin.a...@gmail.com> wrote:

> This really doesn't work for me; I can't modify any vectors inside a distance
> measure. So I have written a subtract inside the Manhattan distance itself. Works
> great for now.
>
>
> On Fri, Feb 19, 2010 at 3:10 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>
>> currentVector.assign(otherVector, minus) takes the other vector and
>> subtracts it from currentVector, which mutates currentVector.  If
>> currentVector is a DenseVector, this is already optimized.  It could be
>> optimized if currentVector is a RandomAccessSparseVector.
>>
>>  -jake
>>
>> On Thu, Feb 18, 2010 at 1:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>
>> > Just to be clear, this does:
>> > currentVector-otherVector ?
>> >
>> > currentVector.assign(otherVector, Functions.minus);
>> >
>> >
>> >
>> > On Fri, Feb 19, 2010 at 2:57 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>> >
>> > > To do subtractFrom, you can instead just do
>> > >
>> > >  Vector.assign(otherVector, Functions.minus);
>> > >
>> > > The problem is that only DenseVector has an optimization here: if the
>> > > BinaryFunction passed in is additive (it's an instance of PlusMult),
>> > > sparse iteration over "otherVector" is executed, applying the binary
>> > > function and mutating self.  AbstractVector should have this
>> > > optimization in general, as it would be useful in
>> > > RandomAccessSparseVector (although not terribly useful in
>> > > SequentialAccessSparseVector, but still better than current).
>> > >
>> > >  -jake
>> > >
>> > > On Thu, Feb 18, 2010 at 1:19 PM, Robin Anil <robin.a...@gmail.com> wrote:
>> > >
>> > > > I just had to change it in one place (and the tests pass, which is
>> > > > scary). Canopy is really fast now :). It could still be pushed further.
>> > > > Now the bottleneck is minus.
>> > > >
>> > > > Maybe a subtractFrom along the lines of addTo? Or a mutable negate
>> > > > function for the vector, applied before adding?
>> > > >
>> > > > Robin
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Feb 19, 2010 at 2:43 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>> > > >
>> > > > > I use it (addTo) in decomposer, for exactly this performance issue.
>> > > > > Changing plus into addTo requires care: because plus() leaves its
>> > > > > arguments unmodified, there may be code which *assumes* that this is
>> > > > > the case, and addTo() has side effects which might not be expected.
>> > > > > This bit me hard on the svd migration, because I had other
>> > > > > assumptions about mutability in there.
>> > > > >
>> > > > >  -jake
>> > > > >
>> > > > > On Thu, Feb 18, 2010 at 1:09 PM, Robin Anil <robin.a...@gmail.com> wrote:
>> > > > >
>> > > > > > Ah! It's not being used anywhere :). Should we make that a big task
>> > > > > > before 0.3? Sweep through the code (mainly clustering) and change
>> > > > > > all these things.
>> > > > > >
>> > > > > > Robin
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Feb 19, 2010 at 2:36 AM, Sean Owen <sro...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Isn't this basically what assign() is for?
>> > > > > > >
>> > > > > > > On Thu, Feb 18, 2010 at 9:04 PM, Robin Anil <robin.a...@gmail.com> wrote:
>> > > > > > > > Now the big perf bottleneck is immutability.
>> > > > > > > >
>> > > > > > > > Say, for plus, it's doing vector.clone() before doing anything
>> > > > > > > > else. There should be both immutable and mutable plus functions.
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
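
For readers following the thread, here is a minimal sketch of the trade-off being discussed: a copying plus, an in-place assign with a binary function, and a Manhattan distance that inlines the subtraction so it neither allocates nor mutates. It is not Mahout code. The class VectorOpsSketch and its methods are hypothetical names, and plain double[] arrays of equal length stand in for dense vectors.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical helper, not Mahout's Vector API; double[] stands in for a dense vector. */
final class VectorOpsSketch {

    /** Immutable style: copies the left operand first, so neither argument changes. */
    static double[] plus(double[] a, double[] b) {
        double[] result = a.clone(); // the per-call copy the thread identifies as the bottleneck
        for (int i = 0; i < b.length; i++) {
            result[i] += b[i];
        }
        return result;
    }

    /** Mutable style: overwrites target with f(target[i], other[i]); other is left alone. */
    static void assign(double[] target, double[] other, DoubleBinaryOperator f) {
        for (int i = 0; i < target.length; i++) {
            target[i] = f.applyAsDouble(target[i], other[i]);
        }
    }

    /** Manhattan distance with the subtraction inlined: no temporary vector, no mutation. */
    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] current = {5.0, 3.0, 1.0};
        double[] other = {1.0, 1.0, 1.0};

        // Copying plus: current keeps its values afterwards.
        System.out.println(Arrays.toString(plus(current, other)));

        // Distance without touching either input.
        System.out.println(manhattan(current, other));

        // In-place subtraction: current now holds current - other.
        assign(current, other, (x, y) -> x - y);
        System.out.println(Arrays.toString(current));
    }
}
```

The in-place variant avoids the copy, but any caller still holding a reference to the target array sees the change, which is the side-effect hazard raised earlier in the thread.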
