Jeff Eastman wrote:
Jeff Eastman wrote:
Jeff Eastman wrote:
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makes the probability that a point will go to any new cluster quite low. You can
compensate somewhat for this with different values of alpha.
Could you elaborate more on the function of alpha in the algorithm?
Now I can answer my own question. Alpha_0 determines the probability a point will go into an empty cluster (ok, almost Ted's exact words). During the first iteration, the total counts of all prior clusters are zero. Thus the Beta calculation that drives the Dirichlet distribution that determines the mixture probabilities degenerates to beta = rBeta(1, alpha_0). Clusters that end up with points for the next iteration will overwhelm the small constants (alpha_0, 1) and subsequent new mixture probabilities will derive from beta ~= rBeta(count, total) which is the current implementation. All empty clusters will subsequently be driven by beta ~= rBeta(1, total) as alpha_0 is insignificant and count is 0.

The current implementation ends up using beta = rBeta(alpha_0/k, alpha_0) as initial values during all iterations because the counts are all initialized to alpha_0/k. Close but no cigar.

Jeff

(nothing new below)
Looking at the current implementation, it is only used to initialize the totalCount values (to alpha/k) when sampling from the prior. AFAICT it is not used anywhere else. Its current role is pretty minimal and I wonder if something fell through the cracks during all of the refactoring from the R prototype.
Well, I looked over the R code and alpha_0 does appear to be used in two places, not one:

- in state initialization "beta = rbeta(K, 1, alpha_0)" [K is the number of models] - during state update "beta[k] = rbeta(1, 1 + counts[k], alpha_0 + N-counts[k])" [N is the cardinality of the sample vector and counts corresponds to totalCounts in the implementation]

The value of beta[k] is then used in the Dirichlet distribution calculation which results in the mixture probabilities pi[i], for the iteration:

   other = 1                                     # product accumulator
   for (k in 1:K) {
pi[k] = beta[k] * other; # beta_k * prod_{n<k} beta_n
     other = other * (1-beta[k])
     }

Alpha_0 does not appear to ever be added to the total counts nor is it divided by K as in the implementation so it looks like something did get lost in the refactoring. In the implementation, UncommonDistributions.rDirichlet(Vector alpha) is passed the totalCounts to compute the mixture probabilities and the rBeta arguments do not use alpha_0 as in R. There are other differences; however, and rDirichlet looks like:

 public static Vector rDirichlet(Vector alpha) {
   Vector r = alpha.like();
   double total = alpha.zSum();
   double remainder = 1;
   for (int i = 0; i < r.size(); i++) {
     double a = alpha.get(i);
     total -= a;
     double beta = rBeta(a, Math.max(0, total));
     double p = beta * remainder;
     r.set(i, p);
     remainder -= p;
   }
   return r;
 }



Hi Ted,

I made the following changes, which still seem to work. I added alpha_0 as an argument to rDirichlet and included it in the beta calculation. I also removed the alpha_0/k totalCount initialization. This now corresponds, I think, to the R code above and degenerates to the same initial beta arguments during initialization when totalCounts are 0. Could you please look this over and see if you agree?

Thanks,
Jeff

 /**
* Sample from a Dirichlet distribution, returning a vector of probabilities using a
  * stick-breaking algorithm
  *
  * @param totalCounts an unnormalized count Vector
  * @param alpha_0 a double
  * @return a Vector of probabilities
  */
 public static Vector rDirichlet(Vector totalCounts, double alpha_0) {
   Vector result = totalCounts.like();
   double total = totalCounts.zSum();
   double other = 1.0;
   for (int i = 0; i < result.size(); i++) {
     double count = totalCounts.get(i);
     total -= count;
     double beta = rBeta(1 + count, Math.max(0, alpha_0 + total));
     double pi = beta * other;
     result.set(i, pi);
     other *= 1 - beta;
   }
   return result;
 }



Reply via email to