Topic modeling using LDA in MLlib

2015-03-18 Thread heszak
I'm coming from a Hadoop background but I'm totally new to Apache Spark. I'd like to do topic modeling using the LDA algorithm on some txt files. The example on the Spark website assumes that the input to LDA is a file containing word counts. I wonder if someone could help me figure out the
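
The gap described above, going from raw text to the word-count input that the LDA example expects, can be sketched roughly as follows. This is a minimal illustration in plain Python rather than Spark code, and it is not taken from the thread: the helper names (`tokenize`, `build_vocabulary`, `to_count_vector`), the sample documents, and the simple regex tokenizer are all assumptions made for the example. In a real job the same steps would be applied to an RDD of file contents.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(tokenized_docs):
    """Assign each distinct term a column index; sorting keeps the mapping reproducible."""
    terms = sorted({term for doc in tokenized_docs for term in doc})
    return {term: index for index, term in enumerate(terms)}

def to_count_vector(tokens, vocab):
    """Return a dense list of term counts, one slot per vocabulary entry."""
    vector = [0] * len(vocab)
    for term, count in Counter(tokens).items():
        if term in vocab:
            vector[vocab[term]] = count
    return vector

# Hypothetical stand-ins for the contents of the txt files.
docs = [
    "Spark runs on Hadoop clusters",
    "LDA finds topics; topics are word distributions",
]

tokenized = [tokenize(d) for d in docs]
vocab = build_vocabulary(tokenized)

# One (document id, count vector) pair per document.
corpus = [(doc_id, to_count_vector(tokens, vocab))
          for doc_id, tokens in enumerate(tokenized)]
```

Each entry of `corpus` pairs a document id with a count vector over a shared vocabulary, which is the general shape of input that a word-count-based LDA expects. Stop-word removal and a vocabulary size cap are omitted here and would usually be worth adding.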

Does the newly released LDA (Latent Dirichlet Allocation) algorithm support n-grams?

2015-03-18 Thread heszak
I would like to know whether the newly released LDA (Latent Dirichlet Allocation) algorithm supports only unigrams, or whether it can also handle bigrams and trigrams. If it can, could someone show me how to use them?
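
The preview does not include an answer, but one common approach is worth sketching. An LDA that consumes term-count vectors does not care what a "term" is, so bigrams and trigrams can be generated during preprocessing and counted as ordinary vocabulary entries. The plain-Python sketch below illustrates that idea under this assumption; the function names `ngrams` and `with_ngrams` and the underscore joiner are hypothetical choices for the example, not part of any Spark API.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences, each joined into a single term."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def with_ngrams(tokens, max_n=2):
    """Return unigrams followed by every n-gram up to max_n, as one flat term list."""
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(ngrams(tokens, n))
    return terms

tokens = ["latent", "dirichlet", "allocation"]
terms = with_ngrams(tokens, max_n=2)
# terms == ["latent", "dirichlet", "allocation",
#           "latent_dirichlet", "dirichlet_allocation"]
```

The resulting term list can then be counted and indexed exactly like a list of single words. Note that adding n-grams can enlarge the vocabulary considerably, so filtering out rare terms before building the count vectors is usually advisable.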