Thanks very much, Steve! The clustering part of the code is really a black box, and there isn't much room for restructuring it. I ended up breaking the big input file into smaller ones, and I am letting the job run on the cluster. I will know in the morning whether it succeeded. I will also consider using Mahout for clustering, since it is built on MapReduce. I will let you know how that goes if you are interested.
Thanks very much once again for your kind responses!

-Ahmed

On Mon, Jan 23, 2012 at 9:09 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:
> It sounds like the HierarchicalClusterer, whatever that is, is doing what
> a collection of reducers should be doing. Try to restructure the job so
> that the clustering is done more in the sort step, allowing the reducer to
> simply collect clusters. The cluster method needs to be
> rearchitected to lean more heavily on map-reduce.
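For illustration, here is a toy sketch of the restructuring Steve describes. The mapper assigns each record a cluster key, the framework's sort/shuffle step groups records by that key, and the reducer only collects each group. This is a small in-memory Python simulation, not Hadoop or Mahout code. The grid-cell key, `cell_size`, and all function names are hypothetical stand-ins for whatever assignment rule the real job would need. Grid bucketing is also much cruder than hierarchical clustering.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]
Key = Tuple[int, int]


def map_phase(points: Iterable[Point], cell_size: float) -> Iterator[Tuple[Key, Point]]:
    # Hypothetical keying rule: bucket each point into a grid cell so that
    # nearby points share a key. The "clustering decision" happens here.
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        yield key, (x, y)


def shuffle_phase(pairs: Iterable[Tuple[Key, Point]]) -> Dict[Key, List[Point]]:
    # Emulates the framework's sort step: sort by key, then group values.
    groups: Dict[Key, List[Point]] = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[Key, List[Point]]) -> Dict[Key, List[Point]]:
    # The reducer does no clustering work; it simply collects each cluster.
    return {key: list(values) for key, values in groups.items()}


def run_job(points: Iterable[Point], cell_size: float) -> Dict[Key, List[Point]]:
    return reduce_phase(shuffle_phase(map_phase(points, cell_size)))


if __name__ == "__main__":
    data = [(0.1, 0.2), (0.3, 0.4), (5.1, 5.2), (5.3, 5.9), (0.2, 0.1)]
    for key, members in run_job(data, cell_size=1.0).items():
        print(key, members)
```

The point of the design is that the expensive grouping rides on the framework's distributed sort, so no single reducer has to hold and cluster the whole input.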