Nope. I couldn't even log in to the master. It asked me to accept the SSH
key, but then threw a message that the channel was timing out. After that I
just left it and went with Cloudera's testing repo.

On Mon, Mar 1, 2010 at 11:20 AM, Sirota, Peter <sir...@amazon.com> wrote:

> Hi Robin,
>
> Did you try to rerun this on EMR?
>
> Sent from my phone
>
> On Feb 28, 2010, at 9:38 PM, "Robin Anil" <robin.a...@gmail.com> wrote:
>
> > On Cloudera, 8-node c1.medium, on 6 GB compressed (26 GB uncompressed
> > Wikipedia):
> >
> > 32 mappers and 10 reducers (8 nodes, dunno why it is limited to 10), as
> > compared to 16 mappers and 8 reducers (4 nodes)
> >
> > org.apache.mahout.text.SparseVectorsFromSequenceFiles -i wikipedia/ -o
> > wikipedia-unigram/ -a org.apache.mahout.analysis.WikipediaAnalyzer
> > -chunk 512 -wt tfidf -md 3 -x 99 -ml 1 -ng 1 -w -s 3
> >
> > Dictionary size: 78 MB
> >
> > tokenizing step             9 min
> > word count                  8:30 min
> > 1-pass tf vectorization     20 min
> > df counting                 8 min
> > 1-pass tfidf vectorization  9 min
> >
> > total                       57 min
> >
> > That's linear scaling :)
>
