On Thu, Nov 3, 2011 at 2:18 PM, Jeff Eastman <[email protected]> wrote:

> AbstractCluster already has the running sum of squares implemented and the
> kmeans and fuzzyk combiners count on being able to combine its partial
> parameters (see ClusterObservations which are passed to combiner and
> reducer). I have an implementation of Welford in OnlineGaussianAccumulator
> which I would love to substitute, but I don't know the math to combine
> them. If, as you say, it is "like addition", could you please be more
> specific (i.e. suggest a combine(other) method for that OGA?)
>

Putting that method on the OGA itself is an interesting idea. I have been
thinking only in terms of models, but having it there as well wouldn't be
bad at all.

OGA computes the mean and variance on a per-coordinate basis. This is the
axis-aligned case that I mentioned.
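
To make "like addition" concrete, here is a minimal sketch of what a
combine(other) could look like for a single coordinate. The class name
RunningStats and its fields are hypothetical and are not the actual OGA
code. The merge step is the standard pairwise update of count, mean and sum
of squared deviations, and an axis-aligned accumulator would apply it to
each coordinate independently.

```java
// Hypothetical single-coordinate accumulator; not the Mahout OGA class.
final class RunningStats {
    private long n;       // number of observations seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    // Welford's online update for one new observation.
    void observe(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    // Merge another accumulator into this one (pairwise update).
    // The result matches having observed both streams in one accumulator,
    // up to floating-point rounding.
    void combine(RunningStats other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        mean += delta * other.n / total;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        n = total;
    }

    long getN() {
        return n;
    }

    double getMean() {
        return mean;
    }

    // Sample variance; defined as 0 when fewer than two observations exist.
    double getVariance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
}
```

Because combine only needs (n, mean, m2) from each side, those three
numbers per coordinate are all a combiner would have to pass along.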


>
> With respect to a Dirichlet combiner, the same mechanism of combining
> observations used in kmeans and fuzzyk should work, but perhaps those
> combiners should be passing clusters and combining cluster observations
> too, rather than just passing the running sums in ClusterObservations?
>

I think that a combiner-based clustering should only be passing clusters.
A non-combiner clustering should pass points. A resolution for that
tension is not obvious to me.


> This is something I would really like to clean up for 1.0
>

Indeed.
