Re: Understanding Canopy/Map Reduce

Grant Ingersoll Tue, 22 Sep 2009 10:11:38 -0700


On Sep 22, 2009, at 9:59 AM, Shashikant Kore wrote:

Hi,

I am unable to understand how the Canopy clustering works.

In the Map stage, Canopy.addPointToCanopies() is called for every point
with the list of canopies. This method adds the point to an existing
canopy, creates a new one, or both, depending on the distance of the
vector from the existing canopy centroids. The Map stage outputs all the
canopy centroids (with key "centroid").

In the reduce phase, these centroids again undergo the same process
(so merges are possible) and the final centroids are output. But I
see that in CanopyReducer the input values are the input vectors and
not the centroids received from the Map stage.
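
For readers following the thread, here is a minimal sketch of the two-pass flow described above. It is not Mahout's code: the class CanopySketch, its methods, and the thresholds t1 and t2 (with t1 > t2) are hypothetical names chosen for illustration, distance is plain Euclidean, and distance is measured to the point that founded each canopy, which is a simplification assumed here. It only shows how a point can be added to an existing canopy, start a new one, or both, and how the same routine can be rerun over the centroids emitted by each map split.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; these are NOT Mahout's actual classes or signatures.
final class CanopySketch {

    /** A canopy: the point that founded it, plus a running sum of members. */
    static final class Canopy {
        final double[] center; // founding point, used for distance checks
        final double[] sum;    // component-wise sum of all member points
        int count;

        Canopy(double[] point) {
            center = point.clone();
            sum = point.clone();
            count = 1;
        }

        void add(double[] point) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += point[i];
            }
            count++;
        }

        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < c.length; i++) {
                c[i] = sum[i] / count;
            }
            return c;
        }
    }

    /** Euclidean distance (assumed here; any distance measure could be used). */
    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /**
     * Adds the point to every canopy within t1. If no canopy is within t2,
     * the point also founds a new canopy, so "add", "create", or both can happen.
     */
    static void addPointToCanopies(double[] point, List<Canopy> canopies,
                                   double t1, double t2) {
        boolean stronglyBound = false;
        for (Canopy canopy : canopies) {
            double d = distance(canopy.center, point);
            if (d < t1) {
                canopy.add(point);
            }
            stronglyBound |= d < t2;
        }
        if (!stronglyBound) {
            canopies.add(new Canopy(point));
        }
    }

    /** One pass over a list of points, returning the resulting centroids. */
    static List<double[]> centroidsOf(List<double[]> points, double t1, double t2) {
        List<Canopy> canopies = new ArrayList<>();
        for (double[] p : points) {
            addPointToCanopies(p, canopies, t1, t2);
        }
        List<double[]> out = new ArrayList<>();
        for (Canopy c : canopies) {
            out.add(c.centroid());
        }
        return out;
    }

    /** "Map" each split independently, then "reduce" over all emitted centroids. */
    static List<double[]> mapThenReduce(List<List<double[]>> splits,
                                        double t1, double t2) {
        List<double[]> emitted = new ArrayList<>();
        for (List<double[]> split : splits) {
            emitted.addAll(centroidsOf(split, t1, t2));
        }
        return centroidsOf(emitted, t1, t2);
    }
}
```

In this sketch the reduce pass receives only the centroids produced by the map passes, which is the behaviour the question expects. Whether CanopyReducer actually works that way is exactly what is being asked, so treat the sketch as a description of the idea rather than of the real job.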

If I recall correctly, the centroids get loaded up in the init stage of the Mapper and the Reducer, but I don't have the code open at the moment. Thus, the input vectors can then get associated with the centroids.

I think I am missing something here. Can you please let me know what it is?
Note: I am using CanopyDriver utility (and not CanopyClusteringJob).

Thanks,

--shashi


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
