Re: Some data sets with class imbalance

2012-03-09 Thread Nick Pentreath
For binary classification, any click-through data (like online ad click-through data) is extremely unbalanced, on the order of 0.5% positive examples. Yahoo has some large data sets of this nature that can be downloaded free for research purposes from Yahoo Research (I think it's

Re: How/where to run DisplayKMeans example

2012-03-09 Thread Sean Owen
This means you are running on a headless machine without a monitor. The program needs to show a window with graphics but can't. On Mar 9, 2012 6:48 AM, rahul raghavendhra rahulraghavendh...@gmail.com wrote: hi Lance, i tried as u said, but now i got a new exception Exception in thread main
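A program can detect the headless condition described above before it tries to open a window. The following is a minimal sketch using the standard `java.awt.GraphicsEnvironment.isHeadless()` call; the class name `HeadlessCheck` and the method `describeDisplay` are hypothetical and are not part of the DisplayKMeans example.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical helper, not part of Mahout: reports whether AWT/Swing
// windows can be opened in the current environment.
class HeadlessCheck {

    /** Returns a short description of the display situation. */
    static String describeDisplay() {
        // isHeadless() is true when no display, keyboard, or mouse is
        // available, for example on a server reached over plain ssh.
        return GraphicsEnvironment.isHeadless()
                ? "headless: no display available, windows cannot be opened"
                : "display available: windows can be opened";
    }

    public static void main(String[] args) {
        System.out.println(describeDisplay());
    }
}
```

Running a check like this on the target machine first shows whether a graphical example has any chance of opening its window there.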

Re: R: Using recommenders with String identifiers

2012-03-09 Thread Sean Owen
In this case, the code in question is the non-distributed code rather than Hadoop. But yes, I agree it will perhaps make a bigger difference on Hadoop. All of the Hadoop stuff uses integer keys. On Fri, Mar 9, 2012 at 2:10 AM, Paritosh Ranjan pran...@xebia.com wrote: Are these identifiers used as

Re: How to find the k most similar docs

2012-03-09 Thread Suneel Marthi
Pat, MatrixDump expects an input file of <Text, MatrixWritable>. The matrix that gets created from RowIdJob is <IntWritable, VectorWritable>, so you cannot run MatrixDump to see the contents of the matrix. You need to use seqdumper as you had done. From:

Re: How to find the k most similar docs

2012-03-09 Thread Pat Ferrel
I assume that the other matrix operations will consume and produce <Text, MatrixWritable>? If so, how do you create <Text, MatrixWritable> from the rowid output of <IntWritable, VectorWritable>? Also, while we are at it, how do you use vectordump? If you do bin/mahout vectordump --help you get some

Re: can I visualize clusters using the mahout binary?

2012-03-09 Thread Dmitriy Lyubimov
I plan to enable some degree of a mixed environment between R and Mahout, but it will probably take several months before I get meaningful coverage of the stuff Mahout produces. On Wed, Feb 1, 2012 at 7:36 PM, Daniel Quach danqu...@cs.ucla.edu wrote: I just ran k-means over a set of data and I

Re: How to find the k most similar docs

2012-03-09 Thread Lance Norskog
No, the matrix multiplication operations all (probably) take int,vector where int is the row number. There has to be a universally unique row number. If there is no row number associated with a row in a distributed matrix op, how can the reducers know which rows they have? Rows do not necessarily
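The point about row numbers can be illustrated with a small in-memory sketch: when every row carries its own integer key, rows can arrive in any order and each result is still attributable to the right row. This is not Mahout's distributed implementation; the class `KeyedRows` and the method `times` are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Mahout code: a matrix stored as
// (row number -> row values) multiplied by a dense vector.
class KeyedRows {

    /** Returns (row number -> dot product of that row with v). */
    static Map<Integer, Double> times(Map<Integer, double[]> rows, double[] v) {
        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, double[]> entry : rows.entrySet()) {
            double[] row = entry.getValue();
            double dot = 0.0;
            for (int i = 0; i < row.length; i++) {
                dot += row[i] * v[i];
            }
            // The key travels with the row, so the order in which rows
            // are processed does not matter.
            result.put(entry.getKey(), dot);
        }
        return result;
    }
}
```

Without the key, a worker that receives an arbitrary subset of rows would have no way to say which entries of the result it had computed.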

How to integrate RecommenderJob's recommendations in a web application?

2012-03-09 Thread manish dunani
Hi, I have already run RecommenderJob on a Hadoop cluster. The output of RecommenderJob in HDFS looks like this: user_id [item_id:score]. I have it in a file, but I have no idea how to integrate recommendations like this into a web application in the form user_id item_id. Does anyone have an idea about it?
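One possible approach is to parse the job output once and keep it in a lookup structure that the web application queries by user ID. The sketch below assumes each output line has the form user_id, whitespace, then a bracketed, comma-separated list of item_id:score pairs, for example `42	[101:4.5,102:3.9]`; that exact layout is an assumption based on the format quoted in the question. The class `RecommendationIndex` and its methods are hypothetical and not part of Mahout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory index over RecommenderJob output lines.
class RecommendationIndex {

    /** One recommended item with its estimated score. */
    static final class ScoredItem {
        final long itemId;
        final double score;

        ScoredItem(long itemId, double score) {
            this.itemId = itemId;
            this.score = score;
        }
    }

    private final Map<Long, List<ScoredItem>> byUser = new HashMap<>();

    /** Parses one line (assumed format: "42\t[101:4.5,102:3.9]") and stores it. */
    void addLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return; // skip blank lines
        }
        String[] parts = trimmed.split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        long userId = Long.parseLong(parts[0]);
        // Drop the surrounding brackets, then split into item:score pairs.
        String body = parts[1].replace("[", "").replace("]", "").trim();
        List<ScoredItem> items = new ArrayList<>();
        for (String pair : body.split(",")) {
            String p = pair.trim();
            if (p.isEmpty()) {
                continue;
            }
            int colon = p.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Unexpected pair: " + p);
            }
            items.add(new ScoredItem(
                    Long.parseLong(p.substring(0, colon)),
                    Double.parseDouble(p.substring(colon + 1))));
        }
        byUser.put(userId, items);
    }

    /** Returns the stored recommendations for a user, or an empty list. */
    List<ScoredItem> recommendationsFor(long userId) {
        return byUser.getOrDefault(userId, Collections.emptyList());
    }
}
```

A web application could fill such an index at startup from the file copied out of HDFS and call `recommendationsFor` per request. For larger data, loading the same parsed pairs into a database table keyed by user ID would be the analogous design.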