On Wed, Nov 2, 2011 at 11:05 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> I have done some testing and have been unable to demonstrate a big
> difference in allocating versus re-using. Re-using is, however, *really*
> error prone.
>
> I think that most of the supposed cost of new allocations is actually the
> cost of copying of large data rather than the cost of allocating the
> container. Here, the largest copy is the new DenseVector.
>
> All of these pale behind bad arithmetic and no combiner.
Yeah, makes sense.

> On Wed, Nov 2, 2011 at 2:37 PM, Frank Scholten <fr...@frankscholten.nl> wrote:
>
>> Maybe not a major thing, but in the DirichletMapper I see that
>> Writables are not reused but new-ed:
>>
>> Line 44: context.write(new Text(String.valueOf(k)), v);
>>
>> and in the for loop in the setup method:
>>
>> Line 58: context.write(new Text(Integer.toString(i)), new
>> VectorWritable(new DenseVector(0)));
>>
>> See
>> http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/
>>
>> Frank
>>
>> On Wed, Nov 2, 2011 at 10:13 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> > Tim Potter and I have tried running Dirichlet in the past on the ASF
>> > email set on EC2 and it didn't seem to scale all that well, so I was
>> > wondering if people had ideas on improving its speed. One question I had
>> > is whether we could inject a Combiner into the process? Ted also mentioned
>> > that there might be faster ways to check the models, but I will ask him to
>> > elaborate.
>> >
>> > Thanks,
>> > Grant
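For readers following the thread: the Writable-reuse pattern Frank is pointing at is to allocate the key/value objects once (as mapper fields or in setup()) and call their set() methods inside the loop, instead of calling `new Text(...)` per record. The sketch below illustrates the idea without requiring Hadoop on the classpath; `MutableText` is a hypothetical stand-in for Hadoop's mutable `org.apache.hadoop.io.Text`, and the list stands in for `context.write`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable
// holder with a set() method, so the instance can be reused.
class MutableText {
    private String value = "";
    void set(String v) { value = v; }
    String get() { return value; }
}

public class ReuseDemo {
    public static void main(String[] args) {
        // Allocate once, outside the loop -- this mirrors declaring the
        // Writable as a mapper field and reusing it across map() calls,
        // rather than `new Text(Integer.toString(i))` on every iteration.
        MutableText key = new MutableText();
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            key.set(Integer.toString(i)); // reuse instead of new Text(...)
            emitted.add(key.get());       // stands in for context.write(key, v)
        }
        System.out.println(emitted);      // [0, 1, 2]
    }
}
```

As Ted notes above, the measurable win from this is often small (the copy of large payloads such as the DenseVector dominates), and reuse is error-prone: the framework may hand the same reused object to downstream code, so anything that stores a reference instead of copying will silently see it mutated.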