Re: Understanding Canopy/Map Reduce

Shashikant Kore Thu, 24 Sep 2009 00:26:51 -0700

Thanks, Grant and Jeff, for the clarification. I understand it now.

--shashi


On Tue, Sep 22, 2009 at 10:40 PM, Grant Ingersoll wrote:
>
> On Sep 22, 2009, at 9:59 AM, Shashikant Kore wrote:
>
>> Hi,
>>
>> I am unable to understand how Canopy clustering works.
>>
>> In the Map stage, Canopy.addPointToCanopies() is called for every point
>> with the list of canopies. Depending on the vector's distance from the
>> existing canopy centroids, this method adds it to an existing canopy,
>> creates a new one, or both. The Map stage outputs all the canopy
>> centroids (with key "centroid").
>>
>> In the reduce phase, these centroids undergo the same process again
>> (so canopies may merge), and finally the centroids are output. But I
>> see that in CanopyReducer the input values are the input vectors and
>> not the centroids received from the Map stage.
>
> If I recall correctly, the centroids get loaded in the init stage of the
> Mapper and the Reducer, but I don't have the code open at the moment. That
> way, the input vectors can be associated with the centroids.
>
>>
>> I think I am missing something here. Can you please let me know what it is?
>>
>> Note: I am using CanopyDriver utility (and not CanopyClusteringJob).
>>
>> Thanks,
>>
>> --shashi
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
>
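A minimal Python sketch of the two-stage canopy flow described in the thread, assuming Euclidean distance and the usual canopy thresholds T1 > T2 (neither is spelled out in the thread). Python stands in for Mahout's Java, and `Canopy`, `add_point_to_canopies`, `map_split` and `reduce_centroids` are hypothetical names, not Mahout's API. Each mapper runs the canopy pass over its own split and emits centroids under the single key "centroid", so one reduce call sees all of them and can merge near-duplicates by running the same pass again.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distance measure: plain Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Canopy:
    """Hypothetical canopy: a fixed center plus a running sum of covered points."""

    def __init__(self, point: Sequence[float]) -> None:
        self.center: Vector = list(point)
        self.total: Vector = [0.0] * len(point)
        self.count = 0
        self.add_point(point)  # the founding point is also a member

    def add_point(self, point: Sequence[float]) -> None:
        self.total = [t + x for t, x in zip(self.total, point)]
        self.count += 1

    def centroid(self) -> Vector:
        return [t / self.count for t in self.total]


def add_point_to_canopies(point: Sequence[float], canopies: List[Canopy],
                          t1: float, t2: float) -> None:
    """Add `point` to every canopy whose center is within T1, and start a new
    canopy from it unless some center is within T2 (assumes T1 > T2)."""
    strongly_bound = False
    for canopy in canopies:
        dist = euclidean(canopy.center, point)
        if dist < t1:
            canopy.add_point(point)
        strongly_bound = strongly_bound or dist < t2
    if not strongly_bound:
        canopies.append(Canopy(point))


def canopy_centroids(points: Sequence[Sequence[float]], t1: float, t2: float) -> List[Vector]:
    """Run one canopy pass over `points` and return the canopy centroids."""
    canopies: List[Canopy] = []
    for p in points:
        add_point_to_canopies(p, canopies, t1, t2)
    return [c.centroid() for c in canopies]


def map_split(points: Sequence[Sequence[float]], t1: float, t2: float) -> List[Tuple[str, Vector]]:
    """Map side: one input split -> its centroids, all under the key "centroid"."""
    return [("centroid", c) for c in canopy_centroids(points, t1, t2)]


def reduce_centroids(values: Sequence[Sequence[float]], t1: float, t2: float) -> List[Vector]:
    """Reduce side: the single key gathers every mapper's centroids, and the
    same canopy pass runs over them, merging centroids that lie close together."""
    return canopy_centroids(values, t1, t2)


if __name__ == "__main__":
    splits = [[[0.0, 0.0], [0.5, 0.0]], [[0.2, 0.0], [10.0, 10.0]]]
    emitted = [pair for s in splits for pair in map_split(s, 3.0, 1.0)]
    print(reduce_centroids([v for _, v in emitted], 3.0, 1.0))
```

With T1 = 3 and T2 = 1, the two mappers emit three centroids between them, and the reducer should merge the two near the origin, leaving one centroid near (0.225, 0) and one at (10, 10).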

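Grant's description of the assignment step (centroids loaded in the init stage, input vectors then associated with them) can be sketched the same way. In this hypothetical `AssignmentMapper`, the canopy centers are loaded once in the constructor, standing in for the mapper's init stage, and each input vector is emitted for every canopy within T1, falling back to the single nearest canopy when none is that close. That fallback rule, the Euclidean distance, and all names here are assumptions for illustration, and only the mapper side is shown.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distance measure: plain Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class AssignmentMapper:
    """Hypothetical mapper for the assignment step: canopy centers are loaded
    once up front, then every input vector is compared against them."""

    def __init__(self, centers: Sequence[Sequence[float]], t1: float) -> None:
        # Stand-in for reading the canopy centers in the mapper's init stage.
        self.centers = [list(c) for c in centers]
        self.t1 = t1

    def map(self, point: Sequence[float]) -> List[Tuple[int, List[float]]]:
        """Emit (canopy index, point) for every canopy within T1; if none is
        that close, emit only for the nearest canopy (assumed fallback rule)."""
        dists = [euclidean(c, point) for c in self.centers]
        covered = [(i, list(point)) for i, d in enumerate(dists) if d < self.t1]
        if covered:
            return covered
        nearest = min(range(len(dists)), key=lambda i: dists[i])
        return [(nearest, list(point))]


if __name__ == "__main__":
    mapper = AssignmentMapper([[0.0, 0.0], [10.0, 10.0]], t1=3.0)
    for p in ([1.0, 0.0], [6.0, 6.0], [9.0, 10.0]):
        print(p, "->", [i for i, _ in mapper.map(p)])
```

This is one way to reconcile the observation in the thread: the job's input values are the original vectors, and the centroids arrive through the centers loaded at startup rather than through the map output.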