Topic modeling using LDA in MLlib

2015-03-18 Thread heszak
I'm coming from a Hadoop background, but I'm totally new to Apache Spark. I'd
like to do topic modeling using the LDA algorithm on some text files. The
example on the Spark website assumes that the input to LDA is a file
containing word counts. Could someone help me figure out the steps to go
from the actual text documents (the raw content) to the actual topics?
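
To make the gap concrete, here is a minimal sketch of the preprocessing in
question, written in plain Python rather than Spark. It tokenizes raw text,
builds a vocabulary, and produces one word-count vector per document, which is
the shape of input the website example starts from. The tokenizer regex, the
function names, and the sample documents are illustrative assumptions, not
part of MLlib.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic runs only (an assumed, simplistic tokenizer).
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(tokenized_docs):
    # Map each distinct term to a column index, sorted for determinism.
    terms = sorted({t for doc in tokenized_docs for t in doc})
    return {term: i for i, term in enumerate(terms)}

def to_count_vector(tokens, vocab):
    # Dense term-count vector: position i holds the count of term i.
    counts = Counter(tokens)
    vec = [0] * len(vocab)
    for term, n in counts.items():
        if term in vocab:
            vec[vocab[term]] = n
    return vec

# Hypothetical sample documents standing in for the contents of the text files.
docs = [
    "Spark runs on Hadoop clusters",
    "Hadoop stores data, Spark processes data",
]
tokenized = [tokenize(d) for d in docs]
vocab = build_vocabulary(tokenized)
vectors = [to_count_vector(t, vocab) for t in tokenized]
```

Each vector, paired with a document ID, corresponds to one row of word-count
input. In Spark the same steps would presumably be expressed as transformations
over the collection of documents before the counts are handed to LDA.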



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/topic-modeling-using-LDA-in-MLLib-tp22128.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Does the newly released LDA (Latent Dirichlet Allocation) algorithm support n-grams?

2015-03-18 Thread heszak
I would like to know whether the newly released LDA (Latent Dirichlet
Allocation) algorithm supports only unigrams, or whether it also supports
bigrams and trigrams. If it does, could someone explain how to use them?
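
One way to picture the question: LDA works on term counts, so n-grams could in
principle be supplied by treating each bigram or trigram as its own term
before counting. The sketch below (plain Python, with a hypothetical helper
`ngrams` and made-up tokens) shows only that preprocessing step. Whether MLlib
offers anything built in for this is exactly what is being asked.

```python
from collections import Counter

def ngrams(tokens, n):
    # Join each run of n consecutive tokens into a single term.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical token sequence for one document.
tokens = ["latent", "dirichlet", "allocation", "topic", "model"]

# Unigrams, bigrams, and trigrams all become ordinary vocabulary terms.
terms = tokens + ngrams(tokens, 2) + ngrams(tokens, 3)
term_counts = Counter(terms)
```

Under this assumption the model itself would not need to know about n-grams;
it would simply see a larger vocabulary.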



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Does-newly-released-LDA-Latent-Dirichlet-Allocation-algorithm-supports-ngrams-tp22131.html