Re: distributing a time consuming single reduce task

Ahmed Abdeen Hamed Mon, 23 Jan 2012 18:50:15 -0800

Thanks very much Steve!

The clustering part of the code is really a black box, and there isn't much
I can do as far as restructuring goes. I ended up breaking the big input file
into smaller ones, and I am letting the job run on the cluster. I will know in
the morning whether it succeeded. I will also consider using Mahout for
clustering, since it is built on top of MapReduce. I will let you know how
that goes if you are interested.
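
For anyone following the thread, the file-splitting step can be sketched
roughly as below. This is only an illustration, assuming a line-oriented
text input; the function name split_file, the part-NNNNN.txt naming, and the
chunk size are all hypothetical and are not the code actually used here.

```python
import os


def split_file(path, out_dir, lines_per_chunk):
    """Split a line-oriented text file into numbered chunk files.

    Hypothetical helper: returns the chunk paths in the order written.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    with open(path, "r", encoding="utf-8") as src:
        for i, line in enumerate(src):
            # Start a new chunk file every lines_per_chunk lines.
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                name = "part-%05d.txt" % len(chunk_paths)
                chunk_path = os.path.join(out_dir, name)
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

Each resulting file can then be submitted as its own input. Note that
splitting this way only helps if the clustering gives acceptable results on
each piece independently.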


Thanks very much once again for your kind responses!
-Ahmed


On Mon, Jan 23, 2012 at 9:09 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:

> It sounds like the HierarchicalClusterer, whatever that is, is doing what
> a collection of reducers should be doing. Try to restructure the job so
> that more of the clustering is done in the sort step, allowing the reducer
> to simply collect clusters. The cluster method needs to be
> rearchitected to lean more heavily on map-reduce.
>
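
A toy, in-process sketch of the shape Steve describes might look like the
following. It assumes one-dimensional numeric records and uses a fixed-width
bucket as a stand-in for a real cluster key; that bucketing is not
hierarchical clustering, and the names map_record, shuffle, reduce_cluster,
and run_job are hypothetical. The point is only that the mapper picks the
key, the sort/shuffle does the grouping, and each reducer just collects.

```python
from collections import defaultdict


def map_record(value, bucket_width=10.0):
    """Mapper: emit (cluster_key, value); the key is a coarse bucket index."""
    return (int(value // bucket_width), value)


def shuffle(pairs):
    """Stand-in for the framework's shuffle/sort: group values by sorted key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_cluster(key, values):
    """Reducer: only collects the members already grouped under one key."""
    return {"key": key, "members": sorted(values)}


def run_job(values, bucket_width=10.0):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = [map_record(v, bucket_width) for v in values]
    return [reduce_cluster(k, vs) for k, vs in shuffle(pairs)]
```

Because each key goes to its own reduce call, the collection work can be
spread over many reducers instead of one long-running reduce task.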
