Re: Mahout with Storm/Spark

2014-03-16 Thread Suneel Marthi
Hi Peyman, 


Good to hear from you. Not sure if anyone has responded to you yet, but the 
answer to your question is that I am not aware of any benchmarking that was 
done for Mahout's CVB implementation. Others, please jump in here if you know 
otherwise.

What has changed in LDA from 0.7 to 0.9?

- 0.7 had LDA with Gibbs sampling and LDA with CVB.
- LDA with Gibbs sampling was deprecated in 0.8.
- LDA with Gibbs sampling was removed in 0.9.




On Sunday, March 9, 2014 11:46 AM, Peyman Faratin peymanfara...@gmail.com 
wrote:
 
Hi 

Is there any benchmarking that shows the limits of CVB, and what has changed 
in LDA from 0.7 to 0.8 to 0.9 to address the convergence speed? I would like 
to use CVB on a 150k+ corpus, but have come across a number of threads that 
mention slow convergence. Knowing what has changed in recent versions to 
address this issue would help me decide whether to use Mahout or not (Y!LDA 
being the other option).

Thank you



Re: Mahout with Storm/Spark

2014-03-07 Thread vineet yadav
Hi Ted,
It is Mahout 0.7.

Thanks
Vineet Yadav





Re: Mahout with Storm/Spark

2014-03-07 Thread Suneel Marthi
a) Upgrade to the latest Mahout version. Please move away from 0.7; a lot of 
lint has been cleaned up since then.

b) It seems you are running the old LDA algorithm, which was replaced by CVB 
in later versions. Try running your corpus through CVB once you upgrade to a 
later version of Mahout. I don't think you need Storm/Spark for that.









Re: Mahout with Storm/Spark

2014-03-06 Thread Ted Dunning
Which version are you using?


On Thu, Mar 6, 2014 at 5:47 AM, vineet yadav vineet.yadav.i...@gmail.com wrote:

 Hi,
 I am using the Mahout LDA algorithm for topic modeling on a huge number of
 documents (500k or more). Mahout is taking a lot of time, so I am looking at
 other alternatives. I found this link
 (http://www.oracle.com/technetwork/articles/java/micro-1925135.html), where
 Storm is used with Mallet for real-time topic modeling. I want to know if
 anyone has tried Storm or Spark with Mahout to speed up the process.

 Thanks
 Vineet Yadav