Another problem that has been noted before but not fixed is that sampling from
the posterior of the model distributions is done by copying the posterior model
rather than by actually sampling its parameters (Gibbs sampling, I believe). As
I understand it, this is a maximum-likelihood shortcut that seems to work pretty
well in practice, but it is not true Dirichlet process clustering (DPC). I wish
I had a better understanding of this aspect.
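
For what it's worth, here is a minimal sketch of the distinction as I
understand it, using a toy 1-D Gaussian cluster model. Everything here (the
class and the method names) is illustrative, not Mahout's actual API:

import java.util.Random;

// Toy sketch contrasting "copy the posterior model" with actually drawing
// its parameters from the posterior. Not Mahout code; purely illustrative.
public class PosteriorSamplingSketch {

  private static final Random RNG = new Random(42);

  // A 1-D Gaussian cluster model whose mean has posterior uncertainty.
  static class GaussianModel {
    final double mean;          // posterior mean of the cluster center
    final double meanVariance;  // posterior variance of that mean

    GaussianModel(double mean, double meanVariance) {
      this.mean = mean;
      this.meanVariance = meanVariance;
    }

    // What copying the posterior model amounts to: a point estimate
    // (the posterior mode), i.e. a maximum-likelihood-style shortcut.
    GaussianModel copyPosterior() {
      return new GaussianModel(mean, meanVariance);
    }

    // What true posterior (Gibbs-style) sampling would do: draw the
    // parameter from its posterior instead of reusing the point estimate.
    GaussianModel samplePosterior() {
      double drawnMean = mean + RNG.nextGaussian() * Math.sqrt(meanVariance);
      return new GaussianModel(drawnMean, meanVariance);
    }
  }

  public static void main(String[] args) {
    GaussianModel posterior = new GaussianModel(3.0, 0.25);
    System.out.println("copied mean:  " + posterior.copyPosterior().mean);
    System.out.println("sampled mean: " + posterior.samplePosterior().mean);
  }
}

The copy collapses the posterior to a single point, which is why it behaves
like maximum likelihood; the sampled version keeps the parameter uncertainty
that the Dirichlet process machinery assumes.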

-----Original Message-----
From: Frank Scholten [mailto:[email protected]] 
Sent: Wednesday, November 02, 2011 3:11 PM
To: [email protected]
Subject: Re: Dirichlet

On Wed, Nov 2, 2011 at 11:05 PM, Ted Dunning <[email protected]> wrote:
> I have done some testing and have been unable to demonstrate a big
> difference in allocating versus re-using.  Re-using is, however, *really*
> error prone.
>
> I think that most of the supposed cost of new allocations is actually the
> cost of copying large data rather than the cost of allocating the
> container.  Here, the largest copy is the new DenseVector.
>
> All of these costs pale beside bad arithmetic and the lack of a combiner.

Yeah, makes sense.
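
For concreteness, the reuse pattern under discussion (applied to the first
write I quote below) would look something like this. This is a sketch against
the Hadoop mapper API, not the actual DirichletMapper code, and the key/value
types are guesses:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.math.VectorWritable;

// Sketch: the output key is allocated once and refilled per record
// instead of new-ed on every context.write().
public class ReuseSketchMapper
    extends Mapper<Text, VectorWritable, Text, VectorWritable> {

  private final Text outKey = new Text();

  @Override
  protected void map(Text key, VectorWritable value, Context context)
      throws IOException, InterruptedException {
    int k = 0; // stand-in for the model index assigned to this vector
    // instead of: context.write(new Text(String.valueOf(k)), value);
    outKey.set(String.valueOf(k));
    context.write(outKey, value);
  }
}

Note this only saves the small Text allocation; per Ted's point above, the
dominant cost is copying the underlying DenseVector, which reuse doesn't touch.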

>
> On Wed, Nov 2, 2011 at 2:37 PM, Frank Scholten <[email protected]>wrote:
>
>> Maybe not a major thing, but in the DirichletMapper I see that
>> Writables are not reused but new-ed:
>>
>> Line 44: context.write(new Text(String.valueOf(k)), v);
>>
>> and in the for loop in the setup method
>>
>> Line 58: context.write(new Text(Integer.toString(i)), new
>> VectorWritable(new DenseVector(0)));
>>
>> See
>> http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/
>>
>> Frank
>>
>> On Wed, Nov 2, 2011 at 10:13 PM, Grant Ingersoll <[email protected]>
>> wrote:
>> > Tim Potter and I have tried running Dirichlet in the past on the ASF
>> > email set on EC2 and it didn't seem to scale all that well, so I was
>> > wondering if people had ideas on improving its speed.  One question I had
>> > is whether we could inject a Combiner into the process?  Ted also mentioned
>> > that there might be faster ways to check the models, but I will ask him to
>> > elaborate.
>> >
>> > Thanks,
>> > Grant
>>
>
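
On Grant's combiner question: the mechanical part is just
job.setCombinerClass(...); the real question is whether the per-key
aggregation is associative and commutative enough to run partially on the map
side. Assuming the values are per-model vectors that the reducer sums (an
assumption, since I haven't checked what the reducer actually does), a
combiner could look like:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Hypothetical combiner: locally sums the vectors for each model key so
// the reducer sees one partial sum per map task instead of one record
// per observation. Only valid if the reduce step is itself a plain sum.
public class VectorSumCombiner
    extends Reducer<Text, VectorWritable, Text, VectorWritable> {

  private final VectorWritable outValue = new VectorWritable();

  @Override
  protected void reduce(Text key, Iterable<VectorWritable> values,
                        Context context)
      throws IOException, InterruptedException {
    Vector sum = null;
    for (VectorWritable v : values) {
      // Hadoop reuses the value object across the iterator, so clone the
      // first vector rather than holding a reference to it.
      sum = (sum == null) ? v.get().clone() : sum.plus(v.get());
    }
    if (sum != null) {
      outValue.set(sum);
      context.write(key, outValue);
    }
  }
}

Any non-associative work (normalization, model updates) would have to stay in
the reducer, which may be exactly why no combiner is there today.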
