RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Stuti Awasthi
Hey Sean, thanks for the response. The MatrixMultiplicationJob help shows the usage as: usage: [Generic Options] [Job-Specific Options]. Here generic options can be provided with -D. Hence I tried command-line -D options, but they do not seem to have any effect. It is also suggested in

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Stuti Awasthi
Hi, I tried calling it programmatically as well but am facing the same issue: only a single MapTask runs, and it spills the map output continuously. Hence I'm not able to generate the output for a large matrix multiplication. Code snippet: DistributedRowMatrix a = new DistributedRowMatrix(new Pa

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Sean Owen
Why do you need multiple mappers? Is one too slow? Many are not necessarily faster for small input. On Jan 16, 2013 10:46 AM, "Stuti Awasthi" wrote: > Hi, > I tried to call programmatically also but facing same issue : Only single > MapTask is running and that too spilling the map output continuo

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Stuti Awasthi
The issue is that my matrix currently has dimension (100x100k); later it can be (1Mx10M) or bigger. Even now, running with a single mapper for (100x100k), the job is not able to complete. As I mentioned, the map task just proceeded to 0.99% and started spilling the map output. Henc

Re: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Ashish
MatrixMultiplicationJob internally sets the InputFormat to CompositeInputFormat: JobConf conf = new JobConf(initialConf, MatrixMultiplicationJob.class); conf.setInputFormat(CompositeInputFormat.class); and AFAIK, CompositeInputFormat ignores the splits. See this http://stackoverflow.com/questions/8654

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Stuti Awasthi
Thanks Ashish, So according to the link, if one uses CompositeInputFormat then it takes the entire file as input to a mapper without considering InputSplits/block size. If I understand it correctly, it is asking to break [Original Input File]->[file1,file2,]. So if my file is

Re: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Ashish
I am afraid I don't know the answer. I need to experiment a bit more. I have not used CompositeInputFormat, so I cannot comment. Probably someone else on the ML (mailing list) would be able to guide here. On Wed, Jan 16, 2013 at 6:01 PM, Stuti Awasthi wrote: > Thanks Ashish, > > So according to the

Re: Test multiple similarities using the same data

2013-01-16 Thread Sean Owen
You can try resetting all the random seeds with RandomUtils.useTestSeed() On Jan 16, 2013 4:01 PM, "Zia mel" wrote: > Hi > > How to evaluate a recommender using different similarities ? Once we call > evaluator.evaluate(recommenderBuilder,..) > it will decide the training and test data for that r
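The point of fixing the seed can be illustrated without Mahout. The following is only a sketch using java.util.Random, not Mahout's evaluator or RandomUtils; the class and method names (SeededSplitSketch, trainingSet) are hypothetical. It shows that when the random source starts from the same seed, the same training subset is chosen on every run, so different similarities would be compared on identical data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration: a seeded shuffle yields a reproducible training split.
public class SeededSplitSketch {

    /** Shuffles the ids with the given seed and returns the leading trainingFraction of them. */
    static List<Integer> trainingSet(List<Integer> ids, double trainingFraction, long seed) {
        List<Integer> shuffled = new ArrayList<>(ids);
        // Same seed -> same permutation -> same training subset.
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainingFraction);
        return new ArrayList<>(shuffled.subList(0, cut));
    }

    public static void main(String[] args) {
        List<Integer> userIds = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            userIds.add(i);
        }
        // Two "evaluation runs" with the same seed see the same training users.
        System.out.println(trainingSet(userIds, 0.8, 42L));
        System.out.println(trainingSet(userIds, 0.8, 42L));
    }
}
```

In Mahout itself the seed is controlled through RandomUtils rather than passed in explicitly, so the sketch only mirrors the idea, not the mechanism.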

Recommend to a group of users

2013-01-16 Thread Zia mel
Hi, can we use Mahout to recommend to a group of users that share similar interests? Maybe with some clustering or similar. Thanks

Re: Recommend to a group of users

2013-01-16 Thread Sean Owen
Not really directly, no. You can make N individual recommendations and combine them, and there are many ways to do that. You can blindly rank them on their absolute scores. You can interleave rankings so each gets every Nth slot in the recommendation. A popular metric is to rank by least-aversion -
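Two of the combination strategies mentioned above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Mahout code: the class and method names are hypothetical, each user's recommendations are assumed to be available as a map from item to estimated score (or as a ranked list), and "least-aversion" is interpreted here as ranking items by the lowest score any group member gives them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of combining per-user recommendations for a group.
public class GroupRecommendationSketch {

    /** Lowest score any group member assigns to the item. */
    static double minScore(String item, List<Map<String, Double>> scoresPerUser) {
        double min = Double.POSITIVE_INFINITY;
        for (Map<String, Double> scores : scoresPerUser) {
            min = Math.min(min, scores.get(item));
        }
        return min;
    }

    /** Ranks items scored by every member by their minimum score, highest first. */
    static List<String> leastAversion(List<Map<String, Double>> scoresPerUser) {
        if (scoresPerUser.isEmpty()) {
            return new ArrayList<>();
        }
        // Only items that every member has a score for are candidates.
        Set<String> candidates = new LinkedHashSet<>(scoresPerUser.get(0).keySet());
        for (Map<String, Double> scores : scoresPerUser) {
            candidates.retainAll(scores.keySet());
        }
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator
                .comparingDouble((String item) -> minScore(item, scoresPerUser))
                .reversed());
        return ranked;
    }

    /** Round-robin over each member's ranked list, skipping items already taken. */
    static List<String> interleave(List<List<String>> rankedPerUser, int howMany) {
        Set<String> result = new LinkedHashSet<>();
        int longest = 0;
        for (List<String> ranked : rankedPerUser) {
            longest = Math.max(longest, ranked.size());
        }
        for (int pos = 0; pos < longest && result.size() < howMany; pos++) {
            for (List<String> ranked : rankedPerUser) {
                if (pos < ranked.size() && result.size() < howMany) {
                    result.add(ranked.get(pos));
                }
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<Map<String, Double>> scores = List.of(
                Map.of("a", 5.0, "b", 3.0, "c", 1.0),
                Map.of("a", 2.0, "b", 4.0, "c", 4.5));
        System.out.println(leastAversion(scores));
        System.out.println(interleave(
                List.of(List.of("a", "b", "c"), List.of("c", "a", "d")), 4));
    }
}
```

Because duplicates are skipped in interleave, a member does not always get exactly every Nth slot; whether to skip or to advance to that member's next unseen item is a design choice left open here.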

Re: Which ML Algorithms i can run without hadoop..

2013-01-16 Thread Ted Dunning
Logistic regression is a good place to start. The Mahout implementation stands alone without Hadoop. Look for OnlineLogisticRegression. On Mon, Jan 14, 2013 at 10:23 PM, VIGNESH S wrote: > Hi, > > I am looking for a light weight library for email classification.. > > can anyone help me like wh

Re: Which ML Algorithms i can run without hadoop..

2013-01-16 Thread VIGNESH S
Hi Ted, Thanks .. On Thu, Jan 17, 2013 at 12:41 AM, Ted Dunning wrote: > Logistic regression is a good place to start. > > The Mahout implementation stands alone without Hadoop. Look for > OnlineLogisticRegression. > > On Mon, Jan 14, 2013 at 10:23 PM, VIGNESH S wrote: > >> Hi, >> >> I am look