Maybe not a major thing, but in the DirichletMapper I see that Writables are not reused; a new one is created for every write.

Line 44:
    context.write(new Text(String.valueOf(k)), v);

and in the for loop in the setup method, Line 58:
    context.write(new Text(Integer.toString(i)), new VectorWritable(new DenseVector(0)));

See http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/

Frank

On Wed, Nov 2, 2011 at 10:13 PM, Grant Ingersoll <[email protected]> wrote:
> Tim Potter and I have tried running Dirichlet in the past on the ASF email set
> on EC2 and it didn't seem to scale all that well, so I was wondering if
> people had ideas on improving its speed. One question I had is whether we
> could inject a Combiner into the process? Ted also mentioned that there
> might be faster ways to check the models, but I will ask him to elaborate.
>
> Thanks,
> Grant
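
P.S. To illustrate the Writable reuse point, here is a minimal sketch of the pattern. It is not the actual DirichletMapper code. The Text and Context classes below are hypothetical stand-ins so the example runs without Hadoop on the classpath, and the class name WritableReuseSketch is made up. It assumes, as I understand the map side to behave, that the framework serializes the key and value during write(), so overwriting the same object afterwards is safe.

```java
import java.util.ArrayList;
import java.util.List;

class WritableReuseSketch {

    /** Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable key holder. */
    static final class Text {
        private String value = "";
        void set(String s) { value = s; }
        @Override public String toString() { return value; }
    }

    /** Hypothetical stand-in for Mapper.Context; it copies the data on write. */
    static final class Context {
        final List<String> output = new ArrayList<>();
        void write(Text key, String value) {
            // The key's contents are copied here, so the caller may reuse the object.
            output.add(key + "\t" + value);
        }
    }

    // One key instance per mapper, allocated once instead of once per record.
    private final Text key = new Text();

    void map(int k, String v, Context context) {
        key.set(String.valueOf(k)); // overwrite the contents instead of new Text(...)
        context.write(key, v);
    }

    public static void main(String[] args) {
        WritableReuseSketch mapper = new WritableReuseSketch();
        Context context = new Context();
        for (int k = 0; k < 3; k++) {
            mapper.map(k, "v" + k, context);
        }
        // Three records were written, but only one Text object was ever allocated.
        System.out.println(context.output);
    }
}
```

In the real mapper the same idea would mean holding a Text (and possibly a VectorWritable) as a field and calling its set method before each context.write. Whether that makes a measurable difference for Dirichlet would need profiling.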
