If it's as obvious a win as it sounds, I'd say 0.3. We aren't in lockdown yet, are we?

-Grant

On Feb 18, 2010, at 3:37 PM, Jake Mannix wrote:

> I dunno, we can file it for whenever, 0.4, and if it turns out it's a really
> easy change we can always commit it for 0.3.
>
>  -jake
>
> On Thu, Feb 18, 2010 at 12:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
>
>> File it for 0.3?
>>
>> Robin
>>
>> On Fri, Feb 19, 2010 at 1:56 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>
>>> On Thu, Feb 18, 2010 at 11:55 AM, Robin Anil <robin.a...@gmail.com> wrote:
>>>
>>>> I was trying out SeqAccessSparseVector on Canopy Clustering using
>>>> Manhattan distance. I found performance to be really bad. So I profiled
>>>> it with YourKit (thanks a lot for providing us a free license).
>>>>
>>>> Since I was trying out Manhattan distance, there were a lot of A-B
>>>> operations, which created a lot of clone operations (5% of the total time).
>>>> There were also many A+B operations for adding a point to the canopy to
>>>> average. This was also creating a lot of clone operations (90% of the
>>>> total time).
>>>
>>> SequentialAccessSparseVector should only be used in a read-only fashion. If
>>> you are creating an average centroid which is sparse, but it is mutating,
>>> then it should be RandomAccessSparseVector. The points which are being used
>>> to create it can be SequentialAccessSparseVector (if they themselves never
>>> change), but then the method called should be
>>> SequentialAccessSparseVector.addTo(RandomAccessSparseVector) - this exploits
>>> the fast sequential iteration of SeqAcc, and the fast random-access
>>> mutability of RandAcc.
>>>
>>>> So we definitely need to improve that.
>>>>
>>>> As a small hack, I made the cluster centers RandomAccess vectors. Things
>>>> are fast again. I don't know whether to commit or not. But something to
>>>> look into in 0.4?
>>>
>>> Yeah, cluster *centers* should indeed be RandomAccess. JIRA / patch so we
>>> can see exactly what the change is?
>>>
>>>  -jake

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
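
For anyone reading this thread later, here is a rough sketch of the pattern Jake describes: keep the input points in a read-only, sequentially iterated form, and accumulate them into a mutable, random-access centroid without cloning. The classes SeqSparse, RandSparse, and CentroidSketch are hypothetical toy stand-ins written for this illustration. They are not Mahout's SequentialAccessSparseVector or RandomAccessSparseVector, and the real Mahout API is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in (assumption, not Mahout code) for a read-only sparse vector
// stored as parallel arrays: cheap to iterate in order, costly to mutate.
final class SeqSparse {
    private final int[] indices;
    private final double[] values;

    SeqSparse(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        this.indices = indices.clone();
        this.values = values.clone();
    }

    // Adds this vector into the target in place: one linear pass, no copy of either vector.
    void addTo(RandSparse target) {
        for (int i = 0; i < indices.length; i++) {
            target.add(indices[i], values[i]);
        }
    }
}

// Toy stand-in (assumption, not Mahout code) for a mutable sparse vector
// backed by a hash map, so single-entry updates are cheap.
final class RandSparse {
    private final Map<Integer, Double> entries = new HashMap<>();

    void add(int index, double delta) {
        entries.merge(index, delta, Double::sum);
    }

    double get(int index) {
        return entries.getOrDefault(index, 0.0);
    }

    void scale(double factor) {
        entries.replaceAll((index, value) -> value * factor);
    }
}

final class CentroidSketch {
    // Averages read-only points into a single mutable centroid.
    static RandSparse centroid(SeqSparse... points) {
        RandSparse sum = new RandSparse();
        for (SeqSparse point : points) {
            point.addTo(sum);
        }
        if (points.length > 0) {
            sum.scale(1.0 / points.length);
        }
        return sum;
    }
}
```

The point of the split is that the accumulator is the only object that changes, so nothing has to be cloned per A+B; an implementation that returns a fresh vector for every addition pays for a copy each time, which matches the clone-heavy profile Robin reports.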