Re: [Fwd: Re: Dirichlet Processing Clustering - Synthetic Control Data]

Jeff Eastman Fri, 05 Feb 2010 22:26:52 -0800

Jeff Eastman wrote:

Jeff Eastman wrote:
Jeff Eastman wrote:
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makestheprobability that a point will go to any new cluster quite low. Youcan
compensate somewhat for this with different values of alpha.
Could you elaborate more on the function of alpha in the algorithm?

Now I can answer my own question. Alpha_0 determines the probability apoint will go into an empty cluster (ok, almost Ted's exact words).During the first iteration, the total counts of all prior clusters arezero. Thus the Beta calculation that drives the Dirichlet distributionthat determines the mixture probabilities degenerates to beta = rBeta(1,alpha_0). Clusters that end up with points for the next iteration willoverwhelm the small constants (alpha_0, 1) and subsequent new mixtureprobabilities will derive from beta ~= rBeta(count, total) which is thecurrent implementation. All empty clusters will subsequently be drivenby beta ~= rBeta(1, total) as alpha_0 is insignificant and count is 0.

The current implementation ends up using beta = rBeta(alpha_0/k,alpha_0) as initial values during all iterations because the counts areall initialized to alpha_0/k. Close but no cigar.


Jeff

(nothing new below)

Looking at the current implementation, it is only used to initializethe totalCount values (to alpha/k) when sampling from the prior.AFAICT it is not used anywhere else. Its current role is prettyminimal and I wonder if something fell through the cracks during allof the refactoring from the R prototype.
Well, I looked over the R code and alpha_0 does appear to be used intwo places, not one:
- in state initialization "beta = rbeta(K, 1, alpha_0)" [K is thenumber of models]- during state update "beta[k] = rbeta(1, 1 + counts[k], alpha_0 +N-counts[k])" [N is the cardinality of the sample vector and countscorresponds to totalCounts in the implementation]
The value of beta[k] is then used in the Dirichlet distributioncalculation which results in the mixture probabilities pi[i], for theiteration:
   other = 1                                     # product accumulator
   for (k in 1:K) {
pi[k] = beta[k] * other; # beta_k *prod_{n<k} beta_n
     other = other * (1-beta[k])
     }
Alpha_0 does not appear to ever be added to the total counts nor isit divided by K as in the implementation so it looks like somethingdid get lost in the refactoring. In the implementation,UncommonDistributions.rDirichlet(Vector alpha) is passed thetotalCounts to compute the mixture probabilities and the rBetaarguments do not use alpha_0 as in R. There are other differences;however, and rDirichlet looks like:
 public static Vector rDirichlet(Vector alpha) {
   Vector r = alpha.like();
   double total = alpha.zSum();
   double remainder = 1;
   for (int i = 0; i < r.size(); i++) {
     double a = alpha.get(i);
     total -= a;
     double beta = rBeta(a, Math.max(0, total));
     double p = beta * remainder;
     r.set(i, p);
     remainder -= p;
   }
   return r;
 }
Hi Ted,
I made the following changes, which still seem to work. I addedalpha_0 as an argument to rDirichlet and included it in the betacalculation. I also removed the alpha_0/k totalCount initialization.This now corresponds, I think, to the R code above and degenerates tothe same initial beta arguments during initialization when totalCountsare 0. Could you please look this over and see if you agree?
Thanks,
Jeff

 /**
* Sample from a Dirichlet distribution, returning a vector ofprobabilities using a
  * stick-breaking algorithm
  *
  * @param totalCounts an unnormalized count Vector
  * @param alpha_0 a double
  * @return a Vector of probabilities
  */
 public static Vector rDirichlet(Vector totalCounts, double alpha_0) {
   Vector result = totalCounts.like();
   double total = totalCounts.zSum();
   double other = 1.0;
   for (int i = 0; i < result.size(); i++) {
     double count = totalCounts.get(i);
     total -= count;
     double beta = rBeta(1 + count, Math.max(0, alpha_0 + total));
     double pi = beta * other;
     result.set(i, pi);
     other *= 1 - beta;
   }
   return result;
 }

Re: [Fwd: Re: Dirichlet Processing Clustering - Synthetic Control Data]

Reply via email to