Jeff Eastman wrote:
Jeff Eastman wrote:
Jeff Eastman wrote:
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makes
the
probability that a point will go to any new cluster quite low. You
can
compensate somewhat for this with different values of alpha.
Could you elaborate more on the function of alpha in the algorithm?
Now I can answer my own question. Alpha_0 determines the probability a
point will go into an empty cluster (ok, almost Ted's exact words).
During the first iteration, the total counts of all prior clusters are
zero. Thus the Beta calculation that drives the Dirichlet distribution
that determines the mixture probabilities degenerates to beta = rBeta(1,
alpha_0). Clusters that end up with points for the next iteration will
overwhelm the small constants (alpha_0, 1) and subsequent new mixture
probabilities will derive from beta ~= rBeta(count, total) which is the
current implementation. All empty clusters will subsequently be driven
by beta ~= rBeta(1, total) as alpha_0 is insignificant and count is 0.
The current implementation ends up using beta = rBeta(alpha_0/k,
alpha_0) as initial values during all iterations because the counts are
all initialized to alpha_0/k. Close but no cigar.
Jeff
(nothing new below)
Looking at the current implementation, it is only used to initialize
the totalCount values (to alpha/k) when sampling from the prior.
AFAICT it is not used anywhere else. Its current role is pretty
minimal and I wonder if something fell through the cracks during all
of the refactoring from the R prototype.
Well, I looked over the R code and alpha_0 does appear to be used in
two places, not one:
- in state initialization "beta = rbeta(K, 1, alpha_0)" [K is the
number of models]
- during state update "beta[k] = rbeta(1, 1 + counts[k], alpha_0 +
N-counts[k])" [N is the cardinality of the sample vector and counts
corresponds to totalCounts in the implementation]
The value of beta[k] is then used in the Dirichlet distribution
calculation which results in the mixture probabilities pi[i], for the
iteration:
other = 1 # product accumulator
for (k in 1:K) {
pi[k] = beta[k] * other; # beta_k *
prod_{n<k} beta_n
other = other * (1-beta[k])
}
Alpha_0 does not appear to ever be added to the total counts nor is
it divided by K as in the implementation so it looks like something
did get lost in the refactoring. In the implementation,
UncommonDistributions.rDirichlet(Vector alpha) is passed the
totalCounts to compute the mixture probabilities and the rBeta
arguments do not use alpha_0 as in R. There are other differences;
however, and rDirichlet looks like:
public static Vector rDirichlet(Vector alpha) {
Vector r = alpha.like();
double total = alpha.zSum();
double remainder = 1;
for (int i = 0; i < r.size(); i++) {
double a = alpha.get(i);
total -= a;
double beta = rBeta(a, Math.max(0, total));
double p = beta * remainder;
r.set(i, p);
remainder -= p;
}
return r;
}
Hi Ted,
I made the following changes, which still seem to work. I added
alpha_0 as an argument to rDirichlet and included it in the beta
calculation. I also removed the alpha_0/k totalCount initialization.
This now corresponds, I think, to the R code above and degenerates to
the same initial beta arguments during initialization when totalCounts
are 0. Could you please look this over and see if you agree?
Thanks,
Jeff
/**
* Sample from a Dirichlet distribution, returning a vector of
probabilities using a
* stick-breaking algorithm
*
* @param totalCounts an unnormalized count Vector
* @param alpha_0 a double
* @return a Vector of probabilities
*/
public static Vector rDirichlet(Vector totalCounts, double alpha_0) {
Vector result = totalCounts.like();
double total = totalCounts.zSum();
double other = 1.0;
for (int i = 0; i < result.size(); i++) {
double count = totalCounts.get(i);
total -= count;
double beta = rBeta(1 + count, Math.max(0, alpha_0 + total));
double pi = beta * other;
result.set(i, pi);
other *= 1 - beta;
}
return result;
}