addTo() is mutable plus.

On Thu, Feb 18, 2010 at 1:04 PM, Robin Anil <robin.a...@gmail.com> wrote:
> Now the big perf bottleneck is immutability.
>
> Say for plus, it's doing vector.clone() before doing anything else.
> There should be both immutable and mutable plus functions.
>
> Robin
>
>
> On Fri, Feb 19, 2010 at 2:07 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>
> > I dunno, we can file it for whenever, 0.4, and if it turns out it's a
> > really easy change we can always commit it for 0.3.
> >
> >   -jake
> >
> > On Thu, Feb 18, 2010 at 12:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
> >
> > > File it for 0.3?
> > >
> > > Robin
> > >
> > > On Fri, Feb 19, 2010 at 1:56 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> > >
> > > > On Thu, Feb 18, 2010 at 11:55 AM, Robin Anil <robin.a...@gmail.com> wrote:
> > > >
> > > > > I was trying out SeqAccessSparseVector on Canopy Clustering using
> > > > > Manhattan distance. I found performance to be really bad. So I
> > > > > profiled it with YourKit (thanks a lot for providing us a free
> > > > > license).
> > > > >
> > > > > Since I was trying out Manhattan distance, there were a lot of A-B
> > > > > operations, which created a lot of clone operations: 5% of the
> > > > > total time.
> > > > > There were also many A+B operations for adding a point to the
> > > > > canopy to average. This was also creating a lot of clone
> > > > > operations: 90% of the total time.
> > > >
> > > > SequentialAccessSparseVector should only be used in a read-only
> > > > fashion. If you are creating an average centroid which is sparse, but
> > > > it is mutating, then it should be RandomAccessSparseVector. The
> > > > points which are being used to create it can be
> > > > SequentialAccessSparseVector (if they themselves never change), but
> > > > then the method called should be
> > > > SequentialAccessSparseVector.addTo(RandomAccessSparseVector) - this
> > > > exploits the fast sequential iteration of SeqAcc, and the fast
> > > > random-access mutability of RandAcc.
> > > >
> > > > > So we definitely need to improve that.
> > > > >
> > > > > As a small hack, I made the cluster centers RandomAccess vectors.
> > > > > Things are fast again. I don't know whether to commit or not. But
> > > > > something to look into in 0.4?
> > > >
> > > > Yeah, cluster *centers* should indeed be RandomAccess. JIRA / patch
> > > > so we can see exactly what the change is?
> > > >
> > > >   -jake
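
For anyone reading this thread later, here is a minimal Java sketch of the difference being discussed: an immutable plus that copies before adding, versus an in-place addTo that streams a read-only point into a mutable accumulator. The classes SeqSparse and RandSparse are hypothetical stand-ins written for this illustration, not Mahout's SequentialAccessSparseVector or RandomAccessSparseVector. Only the method name addTo and the general idea come from the messages above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins for illustration only; these are NOT Mahout's classes. */
final class AddToSketch {

    /** Read-only sparse vector: sorted indices with parallel values, cheap to scan in order. */
    static final class SeqSparse {
        final int[] indices;
        final double[] values;

        SeqSparse(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }

        /** Mutable plus: adds this vector's non-zero entries into target without copying. */
        void addTo(RandSparse target) {
            for (int i = 0; i < indices.length; i++) {
                target.add(indices[i], values[i]);
            }
        }
    }

    /** Mutable sparse vector: hash-backed, so single-cell updates are cheap. */
    static final class RandSparse {
        final Map<Integer, Double> cells = new HashMap<>();

        void add(int index, double delta) {
            cells.merge(index, delta, Double::sum);
        }

        double get(int index) {
            return cells.getOrDefault(index, 0.0);
        }

        /** Immutable plus: copies the whole receiver on every call, then adds. */
        RandSparse plus(SeqSparse other) {
            RandSparse copy = new RandSparse();
            copy.cells.putAll(cells);
            other.addTo(copy);
            return copy;
        }
    }

    public static void main(String[] args) {
        SeqSparse a = new SeqSparse(new int[] {0, 3}, new double[] {1.0, 2.0});
        SeqSparse b = new SeqSparse(new int[] {3, 7}, new double[] {4.0, 5.0});

        // Accumulate points into one mutable sum; no per-point copy is made.
        RandSparse centroidSum = new RandSparse();
        a.addTo(centroidSum);
        b.addTo(centroidSum);

        System.out.println(centroidSum.get(3)); // index 3 holds 2.0 + 4.0
    }
}
```

In this sketch, calling plus once per point copies the accumulator each time, which is the clone cost described in the profile above, while addTo touches only the non-zero entries of the point being added.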