Re: Mahout with Storm/Spark
Hi Peyman, good to hear from you. Not sure if anyone has responded to you yet, but the answer to your question is that I am not aware of any benchmarking that was done for Mahout's CVB implementation. Others, please jump in here if you think otherwise.

What has changed in LDA from 0.7 to 0.9?
- 0.7 had LDA with Gibbs Sampling and LDA with CVB.
- LDA with Gibbs Sampling was deprecated in 0.8.
- LDA with Gibbs Sampling was purged in 0.9.

On Sunday, March 9, 2014 11:46 AM, Peyman Faratin peymanfara...@gmail.com wrote:

Hi,

Is there any benchmarking to know the limits of the CVB, and what has changed in LDA across 0.7, 0.8, and 0.9 to address the convergence speed? I would like to use the CVB on a 150k+ corpus but have come across a number of threads that mention its slow convergence. Knowing what has changed in recent versions to address this issue would help decide whether to use Mahout or not (Y!LDA being the other option).

Thank you

On Mar 7, 2014, at 12:36 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

a) Upgrade to the latest Mahout version; please move away from 0.7, a lot of lint has been cleaned up since then.
b) It seems like you are running the old LDA algorithm that was replaced by CVB in later versions; try running your corpus through CVB once you upgrade to a later version of Mahout.

I don't think you need Storm/Spark for that.

On Friday, March 7, 2014 12:21 PM, vineet yadav vineet.yadav.i...@gmail.com wrote:

Hi Ted,

It is Mahout 0.7.

Thanks
Vineet Yadav

On Thu, Mar 6, 2014 at 11:58 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Which version are you using?

On Thu, Mar 6, 2014 at 5:47 AM, vineet yadav vineet.yadav.i...@gmail.com wrote:

Hi,

I am using the Mahout LDA algorithm for topic modeling on a huge number of documents (500k or more). Mahout is taking a lot of time, so I am looking at other alternatives. I found this link (http://www.oracle.com/technetwork/articles/java/micro-1925135.html), where Storm is used with Mallet for real-time topic modeling. I want to know if anyone has tried Storm or Spark with Mahout to speed up the process.

Thanks
Vineet Yadav
Re: Mahout with Storm/Spark
Hi Ted,

It is Mahout 0.7.

Thanks
Vineet Yadav

On Thu, Mar 6, 2014 at 11:58 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Which version are you using?

On Thu, Mar 6, 2014 at 5:47 AM, vineet yadav vineet.yadav.i...@gmail.com wrote:

Hi,

I am using the Mahout LDA algorithm for topic modeling on a huge number of documents (500k or more). Mahout is taking a lot of time, so I am looking at other alternatives. I found this link (http://www.oracle.com/technetwork/articles/java/micro-1925135.html), where Storm is used with Mallet for real-time topic modeling. I want to know if anyone has tried Storm or Spark with Mahout to speed up the process.

Thanks
Vineet Yadav
Re: Mahout with Storm/Spark
a) Upgrade to the latest Mahout version; please move away from 0.7, a lot of lint has been cleaned up since then.
b) It seems like you are running the old LDA algorithm that was replaced by CVB in later versions; try running your corpus through CVB once you upgrade to a later version of Mahout.

I don't think you need Storm/Spark for that.

On Friday, March 7, 2014 12:21 PM, vineet yadav vineet.yadav.i...@gmail.com wrote:

Hi Ted,

It is Mahout 0.7.

Thanks
Vineet Yadav

On Thu, Mar 6, 2014 at 11:58 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Which version are you using?

On Thu, Mar 6, 2014 at 5:47 AM, vineet yadav vineet.yadav.i...@gmail.com wrote:

Hi,

I am using the Mahout LDA algorithm for topic modeling on a huge number of documents (500k or more). Mahout is taking a lot of time, so I am looking at other alternatives. I found this link (http://www.oracle.com/technetwork/articles/java/micro-1925135.html), where Storm is used with Mallet for real-time topic modeling. I want to know if anyone has tried Storm or Spark with Mahout to speed up the process.

Thanks
Vineet Yadav
Re: Mahout with Storm/Spark
Which version are you using?

On Thu, Mar 6, 2014 at 5:47 AM, vineet yadav vineet.yadav.i...@gmail.com wrote:

Hi,

I am using the Mahout LDA algorithm for topic modeling on a huge number of documents (500k or more). Mahout is taking a lot of time, so I am looking at other alternatives. I found this link (http://www.oracle.com/technetwork/articles/java/micro-1925135.html), where Storm is used with Mallet for real-time topic modeling. I want to know if anyone has tried Storm or Spark with Mahout to speed up the process.

Thanks
Vineet Yadav