Maybe not a major thing, but in the DirichletMapper I see that
Writables are not reused; a new one is allocated on every write.

Line 44: context.write(new Text(String.valueOf(k)), v);

and in the for loop in the setup method

Line 58: context.write(new Text(Integer.toString(i)), new VectorWritable(new DenseVector(0)));

See 
http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/
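
To illustrate what I mean by reuse, here is a rough sketch. It does not
touch Hadoop or Mahout at all: MutableKey is a made-up stand-in for Text
(which has a set(String) method), and the DataOutputStream stands in for
context.write(), which serializes the object when it is called. The class
and method names are mine, not anything from the DirichletMapper. The
point is only that both variants produce the same bytes while the reuse
variant allocates a single key object.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class WritableReuseSketch {

    /** Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable key. */
    static final class MutableKey {
        static int created = 0;            // allocation counter, for the demo only
        private String value = "";

        MutableKey() { created++; }
        MutableKey(String v) { this(); value = v; }

        void set(String v) { value = v; }

        // Serializes the current value, as a Writable does on context.write().
        void write(DataOutputStream out) {
            try {
                out.writeUTF(value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Allocates a new key per record, as in the two quoted lines. */
    static byte[] emitWithNew(int records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int k = 0; k < records; k++) {
            new MutableKey(String.valueOf(k)).write(out);
        }
        return bytes.toByteArray();
    }

    /** Allocates one key up front and overwrites it for each record. */
    static byte[] emitWithReuse(int records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        MutableKey key = new MutableKey();
        for (int k = 0; k < records; k++) {
            key.set(String.valueOf(k));    // safe: the previous value is already serialized
            key.write(out);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        MutableKey.created = 0;
        byte[] fresh = emitWithNew(1000);
        int freshAllocations = MutableKey.created;

        MutableKey.created = 0;
        byte[] reused = emitWithReuse(1000);
        int reusedAllocations = MutableKey.created;

        System.out.println("same output: " + Arrays.equals(fresh, reused));
        System.out.println("allocations: " + freshAllocations + " vs " + reusedAllocations);
    }
}
```

In the real mapper the equivalent would presumably be a Text field that is
created once and updated with set() before each context.write(). Whether
that makes a measurable difference for Dirichlet I have not checked.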

Frank

On Wed, Nov 2, 2011 at 10:13 PM, Grant Ingersoll <[email protected]> wrote:
> Tim Potter and I have tried running Dirichlet in the past on the ASF email set
> on EC2 and it didn't seem to scale all that well, so I was wondering if
> people had ideas on improving its speed.  One question I had is whether we
> could inject a Combiner into the process?  Ted also mentioned that there 
> might be faster ways to check the models, but I will ask him to elaborate.
>
> Thanks,
> Grant
