I helped some with the LDA in MLlib and worked quite a bit on a Gibbs version. I
don't know whether the Gibbs version would help here, but since it is not (yet)
in MLlib, Intel Analytics kindly published a Spark package with their adapted
version plus a couple of other LDA algorithms:
http://spark-packages.org/package/intel-analytics/TopicModeling
https://github.com/intel-analytics/TopicModeling

It might be worth trying out. Do you know what LDA algorithm VW uses?
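
Also, if a big part of the 4 seconds is per-call Spark job overhead rather than
the variational inference itself, it might be worth batching all four requests
into a single topicDistributions call. A rough sketch (untested, reusing the
names from your snippet below; `inputs` is a hypothetical Seq holding all four
documents at once):

```scala
// Score all documents in one Spark job instead of four.
// `ctx`, `vocabularySize`, `model`, and `Transformers` are as in your code;
// `inputs: Seq[String]` (all four inputs together) is an assumption.
val samples = Transformers.toSparseVectors(vocabularySize, ctx.parallelize(inputs))
val docs = samples.zipWithIndex.map(_.swap) // RDD[(Long, Vector)]
val distributions = model.topicDistributions(docs).collect()
```

That amortizes the job submission and task scheduling cost across the four
requests, though it won't reduce the time spent in digamma/log itself.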

Pedro


On Tue, Sep 22, 2015 at 1:54 AM, Marko Asplund <marko.aspl...@gmail.com>
wrote:

> Hi,
>
> I did some profiling for my LDA prototype code that requests topic
> distributions from a model.
> According to Java Mission Control more than 80 % of execution time during
> sample interval is spent in the following methods:
>
> org.apache.commons.math3.util.FastMath.log(double); count: 337; 47.07%
> org.apache.commons.math3.special.Gamma.digamma(double); count: 164; 22.91%
> org.apache.commons.math3.util.FastMath.log(double, double[]); count: 50;
> 6.98%
> java.lang.Double.valueOf(double); count: 31; 4.33%
>
> Is there any way of using the API more optimally?
> Are there any opportunities for optimising the "topicDistributions" code
> path in MLlib?
>
> My code looks like this:
>
> // executed once
> val model = LocalLDAModel.load(ctx, ModelFileName)
>
> // executed four times
> val samples = Transformers.toSparseVectors(vocabularySize,
>   ctx.parallelize(Seq(input))) // fast
> model.topicDistributions(samples.zipWithIndex.map(_.swap))
> // ^ this seems to take about 4 seconds to execute
>
>
> marko
>



-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 208-340-1703
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
