It's a good question, and I had a similar requirement in my own work. I'm
currently porting the implementation from mllib to ml and exposing the
maximum log likelihood. I will send the PR soon.
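
For context, here is a sketch of how the exposed value might be consumed
once that port lands. Note that the `summary.logLikelihood` accessor below
is an assumption for illustration, not the final API:

```scala
import org.apache.spark.ml.clustering.GaussianMixture

// Hypothetical usage once the ml implementation exposes the converged
// log likelihood; the summary/logLikelihood names are assumptions.
val gmm = new GaussianMixture().setK(3).setMaxIter(100)
val model = gmm.fit(dataset) // dataset: DataFrame with a "features" column
val llh = model.summary.logLikelihood // log likelihood at convergence
```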

Thanks.
Yanbo

On Fri, Oct 7, 2016 at 1:37 AM, 王磊(安全部) <wangleikidd...@didichuxing.com>
wrote:

>
> Hi,
>
> Do you ever need to get the log likelihood of the EM algorithm in
> MLlib?
>
> I mean the value in this line:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala#L228
>
> Copying the code here:
>
> val sums = breezeData.treeAggregate(ExpectationSum.zero(k, d))(compute.value, _ += _)
>
> // Create new distributions based on the partial assignments
> // (often referred to as the "M" step in literature)
> val sumWeights = sums.weights.sum
> if (shouldDistributeGaussians) {
>   val numPartitions = math.min(k, 1024)
>   val tuples =
>     Seq.tabulate(k)(i => (sums.means(i), sums.sigmas(i), sums.weights(i)))
>   val (ws, gs) = sc.parallelize(tuples, numPartitions).map { case (mean, sigma, weight) =>
>     updateWeightsAndGaussians(mean, sigma, weight, sumWeights)
>   }.collect().unzip
>   Array.copy(ws.toArray, 0, weights, 0, ws.length)
>   Array.copy(gs.toArray, 0, gaussians, 0, gs.length)
> } else {
>   var i = 0
>   while (i < k) {
>     val (weight, gaussian) = updateWeightsAndGaussians(
>       sums.means(i), sums.sigmas(i), sums.weights(i), sumWeights)
>     weights(i) = weight
>     gaussians(i) = gaussian
>     i = i + 1
>   }
> }
> llhp = llh // current becomes previous
> llh = sums.logLikelihood // this is the freshly computed log-likelihood
> iter += 1
> compute.destroy(blocking = false)
>
> In my application, I need the log likelihood to compare the fit for
> different numbers of clusters, and I then use the cluster number with
> the maximum log likelihood.
>
> Is it a good idea to expose this value?
>
>
>
>
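
The model-selection loop described in the quoted message could be sketched
like this against the mllib API, assuming a hypothetical `logLikelihood`
accessor on the trained model (mllib does not expose this value today,
which is the point of the question):

```scala
import org.apache.spark.mllib.clustering.GaussianMixture

// Sketch: choose k by maximum log likelihood. The model.logLikelihood
// accessor below is hypothetical; it is not available in mllib yet.
val candidates = for (k <- 2 to 10) yield {
  val model = new GaussianMixture().setK(k).run(data) // data: RDD[Vector]
  (k, model.logLikelihood)
}
val (bestK, bestLlh) = candidates.maxBy(_._2)
```

In practice a penalized criterion such as BIC or AIC is often preferred
for this comparison, since the raw log likelihood tends to increase as k
grows.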
