Yeah, the Canopy issue is sorted out. I was thinking of adding a flag to add a point to a single canopy instead of adding it to all canopies. This would help a lot on large datasets. There is no point in adding it to all canopies; you will get approximate clustering anyway.
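A rough sketch of the single-canopy idea, not the actual Mahout code. The names CanopySketch, cluster and the singleCanopy flag are hypothetical. It also assumes one particular reading of the flag: a point joins the first canopy found within t1 and the scan stops there, and a new canopy is seeded only when no canopy is within t1 (so t2 is unused in that mode).

```java
import java.util.ArrayList;
import java.util.List;

class CanopySketch {
    /** A canopy: a fixed center plus the points assigned to it. */
    static final class Canopy {
        final double[] center;
        final List<double[]> points = new ArrayList<>();
        Canopy(double[] center) { this.center = center; }
    }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Single-pass canopy assignment with thresholds t1 > t2.
     * singleCanopy is a hypothetical flag: when true, a point joins only
     * the first canopy within t1 and the scan over canopies stops there.
     */
    static List<Canopy> cluster(List<double[]> data, double t1, double t2,
                                boolean singleCanopy) {
        List<Canopy> canopies = new ArrayList<>();
        for (double[] p : data) {
            boolean joined = false;        // within t1 of some canopy
            boolean stronglyBound = false; // within t2 of some canopy
            for (Canopy c : canopies) {
                double d = distance(p, c.center);
                if (d < t1) {
                    c.points.add(p);
                    joined = true;
                    if (singleCanopy) {
                        break; // skip the remaining canopies
                    }
                }
                if (d < t2) {
                    stronglyBound = true;
                }
            }
            // Assumption: in single-canopy mode any t1 hit suppresses a new canopy.
            boolean seedNew = singleCanopy ? !joined : !stronglyBound;
            if (seedNew) {
                Canopy c = new Canopy(p);
                c.points.add(p);
                canopies.add(c);
            }
        }
        return canopies;
    }

    /** Total number of (point, canopy) memberships across all canopies. */
    static int memberships(List<Canopy> canopies) {
        int total = 0;
        for (Canopy c : canopies) {
            total += c.points.size();
        }
        return total;
    }
}
```

With the flag on, each point is stored once and the inner loop exits early, which is where the saving on large datasets would come from. The price is fewer, coarser canopies.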
I have cleaned up most of SoftCluster, but the error still exists. It seems to be looping forever now. I will post a patch on the issue; please take a look.

Robin

On Wed, Feb 17, 2010 at 3:35 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:

> Robin Anil wrote:
>
>>> Hadoop reuses the *same* instance whenever it uses readFields and I've
>>> been bitten more than once by assuming otherwise.
>>
>> Yep! That's our bug. Always assume mutability in Hadoop :) . I will see
>> where the writable is causing the error.
>> Best is if we could have some test data and make a check to see if the
>> algorithm is working.
>
> Good hunting. I notice that some of the code in the fuzzy MR unit test has
> been commented out but have not looked into it further.
>
> I assume also you have sorted out the canopy issue you were having?
>
> Jeff
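On the readFields point quoted above, here is a small self-contained sketch of how instance reuse bites. It does not use Hadoop itself: Point and its readFields(double) are hypothetical stand-ins for a Writable whose fields are overwritten in place, and collect imitates a loop that is handed the same object on every iteration.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseSketch {
    /** Hypothetical stand-in for a Writable: readFields mutates this object. */
    static final class Point {
        double value;
        Point() { }
        Point(Point other) { this.value = other.value; } // defensive copy
        void readFields(double serialized) { this.value = serialized; }
    }

    /**
     * Imitates a framework loop that deserializes every record into one
     * reused instance. With copy == false the list holds the same object
     * many times; with copy == true each record is cloned before storing.
     */
    static List<Point> collect(double[] raw, boolean copy) {
        Point reused = new Point(); // one instance for the whole loop
        List<Point> kept = new ArrayList<>();
        for (double v : raw) {
            reused.readFields(v);   // overwrites the previous record in place
            kept.add(copy ? new Point(reused) : reused);
        }
        return kept;
    }
}
```

Without the copy, every stored element ends up showing the last record read, which is the kind of silent corruption that can keep a clustering loop from ever converging.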