addTo() is the mutable plus.

On Thu, Feb 18, 2010 at 1:04 PM, Robin Anil <robin.a...@gmail.com> wrote:

> Now the big perf bottleneck is immutability.
>
> For example, plus is calling vector.clone() before doing anything else.
> There should be both immutable and mutable plus functions.
>
> Robin
>
>
>
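Robin's point about plus can be shown with a minimal sketch. The SparseVec class below is hypothetical (a HashMap-backed stand-in, not Mahout's Vector API). Its plus() clones the receiver before adding, so every call pays an O(nnz) copy. Its addTo() accumulates into an existing vector and allocates no new one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse vector backed by a HashMap (index -> value).
// Not Mahout's API; it only illustrates immutable plus vs. mutable addTo.
public class PlusDemo {

    public static final class SparseVec {
        final Map<Integer, Double> entries = new HashMap<>();

        public void set(int index, double value) {
            if (value == 0.0) {
                entries.remove(index); // keep only non-zeros
            } else {
                entries.put(index, value);
            }
        }

        public double get(int index) {
            return entries.getOrDefault(index, 0.0);
        }

        // O(nnz) copy: this is the clone cost paid by every immutable plus.
        public SparseVec copy() {
            SparseVec c = new SparseVec();
            c.entries.putAll(entries);
            return c;
        }

        // Immutable plus: clone this vector, then accumulate other into the clone.
        public SparseVec plus(SparseVec other) {
            SparseVec result = copy();
            other.addTo(result);
            return result;
        }

        // Mutable plus: add this vector's non-zeros into target in place.
        public void addTo(SparseVec target) {
            for (Map.Entry<Integer, Double> e : entries.entrySet()) {
                target.set(e.getKey(), target.get(e.getKey()) + e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        SparseVec a = new SparseVec();
        a.set(0, 1.0);
        a.set(3, 2.0);
        SparseVec b = new SparseVec();
        b.set(3, 5.0);
        b.set(7, 4.0);

        SparseVec sum = a.plus(b); // a is left unchanged; a copy is allocated
        b.addTo(a);                // a is modified in place; nothing is copied
        System.out.println(sum.get(3) + " " + a.get(3));
    }
}
```

Offering both forms lets callers that own the accumulator (e.g. a running centroid) skip the copy. Callers that need the original untouched still get a safe plus().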
> On Fri, Feb 19, 2010 at 2:07 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>
> > I dunno, we can file it for whenever (0.4), and if it turns out it's a
> > really easy change we can always commit it for 0.3.
> >
> >  -jake
> >
> > On Thu, Feb 18, 2010 at 12:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
> >
> > > File it for 0.3?
> > >
> > >
> > > Robin
> > >
> > > On Fri, Feb 19, 2010 at 1:56 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> > >
> > > > On Thu, Feb 18, 2010 at 11:55 AM, Robin Anil <robin.a...@gmail.com> wrote:
> > > >
> > > > > I was trying out SeqAccessSparseVector on Canopy Clustering using
> > > > > Manhattan distance and found the performance to be really bad, so I
> > > > > profiled it with YourKit (thanks a lot for providing us a free license).
> > > > >
> > > > > Since I was using Manhattan distance, there were a lot of A - B
> > > > > operations, which created a lot of clone operations: 5% of the total
> > > > > time. There were also many A + B operations for adding a point to the
> > > > > canopy average, and these also created a lot of clone operations: 90%
> > > > > of the total time.
> > > > >
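One way to avoid the clone behind each A - B is to compute the Manhattan distance directly by merging the two vectors' non-zero entries. The sketch below assumes each vector is stored as parallel arrays of strictly increasing indices and values. This is a hypothetical layout standing in for a sequential-access sparse vector, not Mahout's API. The code walks both arrays once and never materializes the difference vector.

```java
// Hypothetical layout: a sequential-access sparse vector as parallel arrays
// (strictly increasing indices, matching values). Not Mahout's API.
public class ManhattanDemo {

    /**
     * Computes sum_i |a_i - b_i| by merging the two sorted index arrays,
     * without allocating an A - B vector.
     */
    public static double manhattan(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal) {
        double dist = 0.0;
        int i = 0;
        int j = 0;
        while (i < aIdx.length && j < bIdx.length) {
            if (aIdx[i] == bIdx[j]) {
                dist += Math.abs(aVal[i] - bVal[j]); // both non-zero here
                i++;
                j++;
            } else if (aIdx[i] < bIdx[j]) {
                dist += Math.abs(aVal[i]); // b is zero at this index
                i++;
            } else {
                dist += Math.abs(bVal[j]); // a is zero at this index
                j++;
            }
        }
        // Whatever remains in either vector is compared against zero.
        for (; i < aIdx.length; i++) {
            dist += Math.abs(aVal[i]);
        }
        for (; j < bIdx.length; j++) {
            dist += Math.abs(bVal[j]);
        }
        return dist;
    }

    public static void main(String[] args) {
        int[] aIdx = {0, 3};
        double[] aVal = {1.0, 2.0};
        int[] bIdx = {3, 7};
        double[] bVal = {5.0, -4.0};
        System.out.println(manhattan(aIdx, aVal, bIdx, bVal));
    }
}
```

The merge runs in O(nnz(A) + nnz(B)) time and allocates nothing, which is the access pattern a sequential-access layout is good at.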
> > > >
> > > > SequentialAccessSparseVector should only be used in a read-only fashion.
> > > > If you are creating an average centroid which is sparse but mutating, then
> > > > it should be a RandomAccessSparseVector. The points which are used to
> > > > create it can be SequentialAccessSparseVector (if they themselves never
> > > > change), but then the method called should be
> > > > SequentialAccessSparseVector.addTo(RandomAccessSparseVector). This
> > > > exploits the fast sequential iteration of SeqAcc and the fast
> > > > random-access mutability of RandAcc.
> > > >
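Jake's suggestion can be sketched under stated assumptions. SeqPoint is a hypothetical read-only point stored as sorted parallel arrays, and a plain HashMap stands in for the random-access centroid; neither is Mahout's actual class. Each point's addTo() walks its non-zeros sequentially and updates the accumulator in place via Map.merge. Averaging n points therefore allocates one running sum instead of one clone per addition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the two layouts; not Mahout's classes.
public class CentroidDemo {

    /** Read-only sparse point: sorted parallel arrays, cheap to iterate in order. */
    public static final class SeqPoint {
        final int[] indices;
        final double[] values;

        public SeqPoint(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }

        /** Adds this point into a hash-backed (random-access) accumulator in place. */
        public void addTo(Map<Integer, Double> target) {
            for (int k = 0; k < indices.length; k++) {
                target.merge(indices[k], values[k], Double::sum);
            }
        }
    }

    /** Averages points by accumulating into a single mutable sum, then dividing by the count. */
    public static Map<Integer, Double> centroid(Iterable<SeqPoint> points) {
        Map<Integer, Double> sum = new HashMap<>();
        int count = 0;
        for (SeqPoint p : points) {
            p.addTo(sum); // no clone per addition
            count++;
        }
        final int n = count; // effectively final copy for the lambda
        if (n > 0) {
            sum.replaceAll((index, total) -> total / n);
        }
        return sum;
    }

    public static void main(String[] args) {
        SeqPoint p1 = new SeqPoint(new int[] {0, 3}, new double[] {1.0, 2.0});
        SeqPoint p2 = new SeqPoint(new int[] {3, 7}, new double[] {4.0, 6.0});
        System.out.println(centroid(List.of(p1, p2)));
    }
}
```

The points keep their compact sequential layout, and only the centroid pays for hash-based random access. This is the split Jake describes.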
> > > >
> > > > >
> > > > > So we definitely need to improve that.
> > > > >
> > > > > As a small hack, I made the cluster centers RandomAccessSparseVectors.
> > > > > Things are fast again. I don't know whether to commit it or not, but
> > > > > it's something to look into for 0.4?
> > > > >
> > > >
> > > > Yeah, cluster *centers* should indeed be RandomAccess. JIRA / patch so
> > > > we can see exactly what the change is?
> > > >
> > > >  -jake
> > > >
> > >
> >
>
