AbstractCluster already has the running sum of squares implemented, and the kmeans and fuzzyk combiners count on being able to combine its partial parameters (see ClusterObservations, which are passed to the combiner and reducer). I have an implementation of Welford's method in OnlineGaussianAccumulator which I would love to substitute, but I don't know the math for combining two such accumulators. If, as you say, it is "like addition", could you please be more specific (i.e., suggest a combine(other) method for that OGA)?
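To make the question concrete, here is my rough guess at what such a combine might look like in one dimension, based on the pairwise-merge update I have seen attributed to Chan, Golub and LeVeque. The names below are illustrative, not the actual OnlineGaussianAccumulator fields (which work on Vectors with weights), and the merge math is exactly the part I would like confirmed or corrected:

    // Hypothetical 1-d Welford-style accumulator; field names are mine and
    // do not match the actual OnlineGaussianAccumulator internals.
    class WelfordSketch {
      private double n;    // observation count
      private double mean; // running mean
      private double m2;   // running sum of squared deviations from the mean

      void observe(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // second factor uses the *updated* mean
      }

      // My guess at the merge: the Chan/Golub/LeVeque pairwise combination.
      void combine(WelfordSketch other) {
        if (other.n == 0) {
          return;
        }
        double delta = other.mean - mean;
        double total = n + other.n;
        mean += delta * other.n / total; // count-weighted shift of the mean
        m2 += other.m2 + delta * delta * n * other.n / total;
        n = total;
      }

      double variance() {
        return n > 1 ? m2 / (n - 1) : 0; // sample variance
      }
    }

If that merge is right, extending it to the OGA should just be a matter of applying the same update element-wise to the mean and m2 vectors.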
With respect to a Dirichlet combiner, the same mechanism of combining observations used in kmeans and fuzzyk should work, but perhaps those combiners should be passing clusters and combining cluster observations too, rather than just passing the running sums in ClusterObservations? This is something I would really like to clean up for 1.0.

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Thursday, November 03, 2011 1:40 PM
To: [email protected]
Subject: Re: Dirchlet

Sure. If we take a normal distribution in one dimension (for simplicity), the sufficient statistics are the sample mean, the sample variance and the number of observations. These can be collected by keeping the sum, the sum of squares and the count, and it is easy to see how to combine model estimates by just adding these three numbers together. Moreover, a prior can be expressed as the sufficient statistics of some number of virtual observations, so producing a single sample model from a single observation x consists of just adding x to the prior sum, x^2 to the prior sum of squares and 1 to the prior count.

Any real implementation should be a little bit fancier, since computing the variance from the sum and sum of squares is numerically really bad. Something like Welford's method would actually be what is used.

For a multivariate normal distribution, we have some important special cases. These include the symmetric normal (covariance is the identity matrix times a constant), the axis-aligned normal (covariance is diagonal) and the general case. The symmetric normal can just be a (slightly more complicated) extension of the one-dimensional case. The general case requires that we keep the vector sum and the matrix sum of the outer products of the observations, analogously to the way that the sum and sum of squares were kept in the one-dimensional case, and, again, some numerical sophistication is required to make this a stable on-line algorithm. Typically a symmetric normal is used as the prior for the multivariate case. (See Wikipedia for one simple estimation method <http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Estimation_of_parameters> that can be suitably extended to the on-line case.) I think that there is a direct analogue of Welford's method in multiple dimensions.

Does this help?

On Thu, Nov 3, 2011 at 1:24 PM, Grant Ingersoll <[email protected]> wrote:
>
> On Nov 2, 2011, at 5:31 PM, Ted Dunning wrote:
>
> > For some kinds of models, notably all of the ones from the
> > exponential class, there exist sufficient statistics and the combination
> > of models really is a lot like addition. Most of the uses of DP clustering
> > involve exponential models like the normal distribution so the world is a
> > happy place.
>
> Can you elaborate on this a bit more in terms of concrete steps we could
> take to implement this?
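For reference, the naive one-dimensional recipe above (keep the count, sum and sum of squares; express the prior as virtual observations; combine by adding the three numbers) translates almost directly into code. This is a sketch for exposition only, not Mahout code, and it is numerically poor for exactly the reason Ted gives:

    // Naive sufficient statistics for a 1-d normal; for exposition only.
    // A real implementation should use Welford's method instead, since
    // computing the variance this way suffers catastrophic cancellation.
    class NormalSuffStats {
      private double count, sum, sumSq;

      // A prior is just the sufficient statistics of virtual observations.
      NormalSuffStats(double priorCount, double priorSum, double priorSumSq) {
        count = priorCount;
        sum = priorSum;
        sumSq = priorSumSq;
      }

      // A single-sample model is the prior plus one observation x.
      void observe(double x) {
        count += 1;
        sum += x;
        sumSq += x * x;
      }

      // Combining two model estimates really is just addition.
      void combine(NormalSuffStats other) {
        count += other.count;
        sum += other.sum;
        sumSq += other.sumSq;
      }

      double mean() {
        return sum / count;
      }

      double variance() {
        return sumSq / count - mean() * mean(); // the numerically bad formula
      }
    }

The general multivariate case follows the same pattern, with a vector sum and a matrix of summed outer products taking the place of sum and sumSq.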
