On Wed, Nov 2, 2011 at 11:05 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> I have done some testing and have been unable to demonstrate a big
> difference in allocating versus re-using. Re-using is, however, *really*
> error prone.
>
> I think that most of the supposed cost of new allocations is actually the
> cost of copying of large data rather than the cost of allocating the
> container. Here, the largest copy is the new DenseVector.
>
> All of these pale behind bad arithmetic and no combiner.
Yeah, makes sense.

> On Wed, Nov 2, 2011 at 2:37 PM, Frank Scholten <fr...@frankscholten.nl> wrote:
>
>> Maybe not a major thing, but in the DirichletMapper I see that
>> Writables are not reused but new-ed:
>>
>> Line 44: context.write(new Text(String.valueOf(k)), v);
>>
>> and in the for loop in the setup method:
>>
>> Line 58: context.write(new Text(Integer.toString(i)), new
>> VectorWritable(new DenseVector(0)));
>>
>> See
>> http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/
>>
>> Frank
>>
>> On Wed, Nov 2, 2011 at 10:13 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> > Tim Potter and I have tried running Dirichlet in the past on the ASF
>> > email set on EC2 and it didn't seem to scale all that well, so I was
>> > wondering if people had ideas on improving its speed. One question I had
>> > is whether we could inject a Combiner into the process? Ted also mentioned
>> > that there might be faster ways to check the models, but I will ask him to
>> > elaborate.
>> >
>> > Thanks,
>> > Grant
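For readers following the thread: the Writable-reuse pattern Frank is pointing at is to allocate the key/value objects once (as mapper fields or in setup()) and call their set() methods inside the loop, instead of calling `new Text(...)` per record. The sketch below illustrates the idea without requiring Hadoop on the classpath; `MutableText` is a hypothetical stand-in for Hadoop's mutable `org.apache.hadoop.io.Text`, and the list stands in for `context.write`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable
// holder with a set() method, so the instance can be reused.
class MutableText {
    private String value = "";
    void set(String v) { value = v; }
    String get() { return value; }
}

public class ReuseDemo {
    public static void main(String[] args) {
        // Allocate once, outside the loop -- this mirrors declaring the
        // Writable as a mapper field and reusing it across map() calls,
        // rather than `new Text(Integer.toString(i))` on every iteration.
        MutableText key = new MutableText();
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            key.set(Integer.toString(i)); // reuse instead of new Text(...)
            emitted.add(key.get());       // stands in for context.write(key, v)
        }
        System.out.println(emitted);      // [0, 1, 2]
    }
}
```

As Ted notes above, the measurable win from this is often small (the copy of large payloads such as the DenseVector dominates), and reuse is error-prone: the framework may hand the same reused object to downstream code, so anything that stores a reference instead of copying will silently see it mutated.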