Thanks, Grant and Jeff, for the clarification. I understand it now.

--shashi

On Tue, Sep 22, 2009 at 10:40 PM, Grant Ingersoll <[email protected]> wrote:
>
> On Sep 22, 2009, at 9:59 AM, Shashikant Kore wrote:
>
>> Hi,
>>
>> I am unable to understand how the Canopy clustering works.
>>
>> In Map stage, Canopy.addPointToCanopies() is called for every point
>> with list of canopies. This method adds to the existing canopy or
>> creates new one or both depending on the distance of the vector from
>> existing canopy centroids. Map stage outputs all the canopy centroids
>> (with key "centroid").
>>
>> In reduce phase, these centroids will again undergo the same process
>> (so, possible merges) and finally centroids will be output'ed. But, I
>> see that in CanopyReducer the input values are the input vectors and
>> not the centroids received from the Map stage.
>
> If I recall correctly, the centroids get loaded up in the init stage of the
> Mapper and the Reducer, but I don't have the code open at the moment. Thus,
> the input vectors can then get associated with the centroids.
>
>>
>> I think, I missing something here. Can you please let me know what it is?
>>
>> Note: I am using CanopyDriver utility (and not CanopyClusteringJob).
>>
>> Thanks,
>>
>> --shashi
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
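
For anyone reading this thread later, the "add to an existing canopy, create a new one, or both" behaviour described in the quoted question can be sketched roughly as below. This is only an illustration and not Mahout's code: the class CanopySketch, its method names, the thresholds t1 > t2, plain double[] points, Euclidean distance, and measuring distance against each canopy's running centroid are all assumptions made for the example. The main method runs the same routine once per input split (standing in for the map stage) and once more over the emitted centroids (standing in for the reduce stage), which is the flow the question describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy canopy clustering on double[] points. Hypothetical names; not Mahout's classes. */
final class CanopySketch {

    /** A canopy keeps a running sum of its members so the centroid can be derived. */
    static final class Canopy {
        final double[] sum;
        int count;

        Canopy(double[] first) {
            sum = first.clone();
            count = 1;
        }

        void add(double[] p) {
            for (int i = 0; i < sum.length; i++) sum[i] += p[i];
            count++;
        }

        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < c.length; i++) c[i] = sum[i] / count;
            return c;
        }
    }

    /** Euclidean distance (assumed here; any distance measure could be substituted). */
    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    /**
     * Within t1 of a canopy: the point joins that canopy.
     * Within t2 of no canopy: the point also founds a new canopy.
     * With t1 > t2, a point between the two thresholds does both.
     */
    static void addPointToCanopies(double[] point, List<Canopy> canopies, double t1, double t2) {
        boolean stronglyBound = false;
        for (Canopy c : canopies) {
            double d = distance(c.centroid(), point);
            if (d < t1) c.add(point);
            if (d < t2) stronglyBound = true;
        }
        if (!stronglyBound) canopies.add(new Canopy(point));
    }

    /** One pass over the points; returns the centroid of every canopy that was formed. */
    static List<double[]> canopyCentroids(List<double[]> points, double t1, double t2) {
        List<Canopy> canopies = new ArrayList<>();
        for (double[] p : points) addPointToCanopies(p, canopies, t1, t2);
        List<double[]> out = new ArrayList<>();
        for (Canopy c : canopies) out.add(c.centroid());
        return out;
    }

    public static void main(String[] args) {
        double t1 = 3.0, t2 = 1.0;
        // Two input splits, each processed independently (the "map" stage).
        List<double[]> splitA = Arrays.asList(
                new double[] {0, 0}, new double[] {0.5, 0}, new double[] {10, 10});
        List<double[]> splitB = Arrays.asList(
                new double[] {0.2, 0}, new double[] {10.5, 10});

        List<double[]> emitted = new ArrayList<>(canopyCentroids(splitA, t1, t2));
        emitted.addAll(canopyCentroids(splitB, t1, t2));

        // The emitted centroids go through the same routine again (the "reduce" stage),
        // which merges centroids from different splits that lie close together.
        for (double[] c : canopyCentroids(emitted, t1, t2)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

In this sketch the second pass sees only centroids, never the original input vectors. Assigning each input vector to its nearest final centroid would be a separate step that loads the centroids up front, which appears to be what the reply above is describing.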
