Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Marco
Hi, I'm new here, so forgive my limited experience with Mahout. We're trying to use Mahout (on our Hadoop cluster) to calculate topics for almost 14,000 documents. I've been following this wiki page (http://goo.gl/DcPVjB) but I'm still getting errors. Here's what I'm doing: 1) creating sequence fi

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Suneel Marthi
matrix that needs to be fed as input to cvb. So the input to cvb needs to be "jojoba/matrix/matrix". Give that a try and let us know. From: Marco To: "user@mahout.apache.org" Sent: Wednesday, July 31, 2013 4:20 AM Subject: Latent Dirichlet A

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Marco
From: Suneel Marthi To: "user@mahout.apache.org"; Marco Sent: Wednesday, 31 July 2013 11:01 Subject: Re: Latent Dirichlet Allocation (cvb) The RowId job creates a matrix (IntWritable, VectorWritable) and a docIndex (IntWritable, Text). So you should be seeing 2 files generated - jojoba/ma
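The layout described in this message can be sketched as a small path helper. This is only an illustration, assuming the RowId job's output directory was "jojoba/matrix" (inferred from the "jojoba/matrix/matrix" path quoted in the thread); the function name `rowid_outputs` is hypothetical and not part of Mahout.

```python
import posixpath


def rowid_outputs(rowid_output_dir):
    """Return the two paths the RowId job is described as writing.

    rowid_output_dir is an assumption for illustration; use whatever
    directory was passed to the rowid job as its output.
    """
    return {
        # (IntWritable, VectorWritable): the matrix that cvb takes as input
        "matrix": posixpath.join(rowid_output_dir, "matrix"),
        # (IntWritable, Text): maps integer row ids back to the original keys
        "docIndex": posixpath.join(rowid_output_dir, "docIndex"),
    }


paths = rowid_outputs("jojoba/matrix")
print(paths["matrix"])    # the path to hand to cvb, not the parent directory
print(paths["docIndex"])
```

The point of the sketch is that cvb should be given the "matrix" file itself, since the parent directory also contains the docIndex file with a different value type.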

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Jake Mannix
ic/term > distributions from jojoba/mt/model-2 to jojoba/to-output" run successfully > but if i now do a vectordump i always get a Java heap space error > > > > > From: Suneel Marthi > To: "user@mahout.apache.org"; Marco < >

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Marco
PSIZE to 1m From: Jake Mannix To: "user@mahout.apache.org"; Marco Cc: Suneel Marthi Sent: Wednesday, 31 July 2013 16:34 Subject: Re: Latent Dirichlet Allocation (cvb) If you're supplying a dictionary file (as you are), I'd suggest not specifying the "

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Jake Mannix
giving it? There may be a bug here. Also, what version of Mahout are you using? > > > > > > From: Jake Mannix > To: "user@mahout.apache.org"; Marco < > zentrop...@yahoo.co.uk> > Cc: Suneel Marthi > Sent: Wednesday 31

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Suneel Marthi
Wednesday, July 31, 2013 10:51 AM Subject: Re: Latent Dirichlet Allocation (cvb) On Wed, Jul 31, 2013 at 7:44 AM, Marco wrote: > ok. i'll rerun it without that nt (which i supposed was NOT optional). > Well, it's not optional if you don't supply a dictionary (which i
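The rule discussed here (the nt option is only needed when no dictionary is supplied) can be sketched as a small argument builder. This is a hedged illustration, not Mahout code: `build_cvb_args` is a hypothetical helper, the short flags `-i`, `-o`, `-k`, `-dict` and `-nt` are given as recalled from the Mahout 0.8 cvb driver and should be checked against `mahout cvb --help` for your version, and the dictionary path, output path and topic count below are made-up values.

```python
def build_cvb_args(matrix_path, output_path, num_topics,
                   dictionary_path=None, num_terms=None):
    """Build an argument list for `mahout cvb` (flags are assumptions, see lead-in)."""
    args = ["mahout", "cvb",
            "-i", matrix_path,
            "-o", output_path,
            "-k", str(num_topics)]
    if dictionary_path is not None:
        # With a dictionary, leave -nt out, as suggested in the thread
        # (any num_terms passed in is deliberately ignored).
        args += ["-dict", dictionary_path]
    elif num_terms is not None:
        # Without a dictionary, the number of terms must be given explicitly.
        args += ["-nt", str(num_terms)]
    else:
        raise ValueError("supply either dictionary_path or num_terms")
    return args


# Hypothetical paths and topic count, for illustration only.
print(" ".join(build_cvb_args("jojoba/matrix/matrix", "jojoba/cvb-out", 20,
                              dictionary_path="jojoba/dictionary.file-0")))
```

Building the list rather than a single string keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting concerns.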

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Marco
; ; Marco Sent: Wednesday, 31 July 2013 16:51 Subject: Re: Latent Dirichlet Allocation (cvb) On Wed, Jul 31, 2013 at 7:44 AM, Marco wrote: > ok. i'll rerun it without that nt (which i supposed was NOT optional). > Well, it's not optional if you don't supply a dictionar

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Suneel Marthi
Please work off of Mahout 0.8; there are a lot of fixes and improvements that went into CVB0 in this release. Correct me here, Jake? From: Marco To: "user@mahout.apache.org" Sent: Wednesday, July 31, 2013 11:01 AM Subject: Re: Latent Dirichlet Allo

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Marco
already looked there. no cvb example or vectordump :( From: Suneel Marthi To: "user@mahout.apache.org"; Marco Sent: Wednesday, 31 July 2013 16:55 Subject: Re: Latent Dirichlet Allocation (cvb) @Marco, look at examples/bin/cluster-reut

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Jake Mannix
> Ah, that's a vectordump bug in 0.7, fixed in 0.8, sorry about that. > > > > > > From: Jake Mannix > To: "user@mahout.apache.org"; Marco < > zentrop...@yahoo.co.uk> > Sent: Wednesday, 31 July 2013 16:51 > O

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Suneel Marthi
CVB was added to cluster-reuters.sh in 0.8; you wouldn't see it in 0.7. I suggest that you work off of 0.8. From: Marco To: "user@mahout.apache.org" ; Suneel Marthi Sent: Wednesday, July 31, 2013 11:05 AM Subject: Re: Latent Dirichlet

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Marco
Wednesday, 31 July 2013 17:07 Subject: Re: Latent Dirichlet Allocation (cvb) CVB was added to cluster-reuters.sh in 0.8; you wouldn't see it in 0.7. I suggest that you work off of 0.8. From: Marco To: "user@mahout.apache.org" ; Suneel Marthi Sent

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Ted Dunning
On Wed, Jul 31, 2013 at 8:33 AM, Marco wrote: > will check out if cloudera supports mahout 0.8. > Don't worry about Cloudera support. Mahout support is better. :-)

Re: Latent Dirichlet Allocation (cvb)

2013-07-31 Thread Sean Owen
FWIW I know Mahout 0.8 works fine with CDH4 (the "mr1" version of course) and is what CDH5 will include. Should be no problems there. On Wed, Jul 31, 2013 at 4:33 PM, Marco wrote: > great. at least i know what's wrong :) > > will check out if cloudera supports mahout 0.8. > > meanwhile we'll drop