On Mon, Jul 27, 2009 at 10:11 PM, Grant Ingersoll <[email protected]> wrote:
>
> Not following. The distance calc stuff is irrespective of the type of
> Vector. I was referring to the centroid length square (I think you called
> it the triangle inequality) stuff that Shashikant added on MAHOUT-121. We
> use it for testing convergence, but not for other distance calculations. I
> haven't looked to see if it is applicable yet, but it seems like it should
> be.
>
Grant,

Yes, that part of the patch is missing. In my original patch, I modified
emitPointToNearestCluster() in kmeans/Cluster.java to calculate the distance
between a document and the centroids of the various clusters. (There is no
triangle inequality code, though.) I don't see that code in the later patches.
I reviewed the final patch, but I missed this one. I think I only ran Canopy
and not K-means.

Incidentally, I am hopelessly out of date with trunk, as I have not worked on
this recently.

BTW, I haven't really followed this thread in depth, so I might be speaking
out of context here. Apologies.

--shashi
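
For reference, here is a minimal sketch of the nearest-centroid step discussed above, using a cached centroid length square. It is not the MAHOUT-121 patch code: the class NearestCentroidSketch and its methods are hypothetical names, points are plain double[] arrays rather than Mahout Vectors, and squared Euclidean distance is assumed. It relies on the expansion |x - c|^2 = |x|^2 - 2(x . c) + |c|^2, so |c|^2 can be computed once per centroid and reused for every point.

```java
// Hypothetical illustration only; not taken from the Mahout code base.
final class NearestCentroidSketch {

    // Plain dot product of two equal-length vectors.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Precompute |c|^2 for every centroid, once per iteration.
    static double[] lengthSquares(double[][] centroids) {
        double[] result = new double[centroids.length];
        for (int i = 0; i < centroids.length; i++) {
            result[i] = dot(centroids[i], centroids[i]);
        }
        return result;
    }

    // Return the index of the centroid with the smallest squared
    // Euclidean distance to the point, or -1 if there are no centroids.
    static int nearest(double[] point, double[][] centroids,
                       double[] centroidLengthSquares) {
        double pointLengthSquare = dot(point, point);
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            // |x - c|^2 = |x|^2 - 2(x . c) + |c|^2
            double distance = pointLengthSquare
                    - 2.0 * dot(point, centroids[i])
                    + centroidLengthSquares[i];
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }
}
```

With sparse document vectors, the saving would come from the dot product touching only the non-zero terms of the point, while the centroid length squares are reused across all points in the iteration.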
