Nope. I couldn't even log in to the master. It asked me to accept the SSH key, but then threw a message that the channel was timing out. After that I just left it and went with Cloudera's testing repo.

On Mon, Mar 1, 2010 at 11:20 AM, Sirota, Peter <sir...@amazon.com> wrote:

> Hi Robin,
>
> Did you try to rerun this on EMR?
>
> Sent from my phone
>
> On Feb 28, 2010, at 9:38 PM, "Robin Anil" <robin.a...@gmail.com> wrote:
>
> > On Cloudera, 8 node c1.medium, on 6 GB compressed (26 GB uncompressed)
> > Wikipedia
> >
> > 32 mappers, 10 reducers (8 nodes, dunno why it is limited to 10) as
> > compared to 16 mappers and 8 reducers (4 nodes)
> >
> > org.apache.mahout.text.SparseVectorsFromSequenceFiles -i wikipedia/ -o
> > wikipedia-unigram/ -a org.apache.mahout.analysis.WikipediaAnalyzer
> > -chunk 512 -wt tfidf -md 3 -x 99 -ml 1 -ng 1 -w -s 3
> >
> > Dictionary size 78MB
> >
> > tokenizing step 9 min
> > word count 8:30 min
> > 1pass tf vectorization 20 min
> > df counting 8 min
> > 1pass tfidf vectorization 9 min
> >
> > total 57 min
> >
> > That's linear scaling :)