Aurora, did you see my last reply on the list?
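
In the meantime, here is a rough sketch of the "run n instances" idea
from my reply quoted below, using the plain, non-Hadoop Taste API.
Treat it as an untested illustration: the class names are from the
current org.apache.mahout.cf.taste packages (adjust if your checkout
still has the older object-ID API), and ratings.txt and the
userID-modulo split are just placeholders for whatever input and
partitioning you would actually use.

import java.io.File;

import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

/**
 * Computes recommendations for 1/nth of all users. Run n copies, one
 * per machine, with partition = 0..n-1. Note that each instance still
 * loads the whole data model, so per-machine RAM remains the limit.
 */
public final class PartitionedRecommenderRunner {

  public static void main(String[] args) throws Exception {
    int numPartitions = Integer.parseInt(args[0]); // n
    int partition = Integer.parseInt(args[1]);     // which 1/nth to compute

    DataModel model = new FileDataModel(new File("ratings.txt"));
    Recommender recommender = new SlopeOneRecommender(model);

    LongPrimitiveIterator userIDs = model.getUserIDs();
    while (userIDs.hasNext()) {
      long userID = userIDs.nextLong();
      if (userID % numPartitions != partition) {
        continue; // some other instance handles this user
      }
      for (RecommendedItem rec : recommender.recommend(userID, 10)) {
        System.out.println(userID + "\t" + rec.getItemID() + "\t" + rec.getValue());
      }
    }
  }
}

This is essentially what RecommenderJob in
org.apache.mahout.cf.taste.hadoop should do for you on Hadoop, once
the 0.20 API issues are sorted out.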
On Wed, Jul 22, 2009 at 9:29 AM, Sean Owen<[email protected]> wrote:
> Yes, there are a few components here -- a few different purposes, all
> built around the core library, which isn't specific to Hadoop or an
> HTTP server. You've seen some of the components that adapt the core
> to these contexts. There are also components that can evaluate or
> load test the code.
>
> The only piece you are interested in, then, is really the Hadoop
> integration -- see org.apache.mahout.cf.taste.hadoop. There you will
> find RecommenderJob, which should be able to launch a
> pseudo-distributed recommender job. I say pseudo since these
> algorithms are not in general distributable, but one can of course
> run n instances of a recommender to compute 1/nth of all
> recommendations each. That is nice, though it means, say, the amount
> of RAM the jobs consume is still limited by the size of each machine.
>
> I just recently rewrote this package to be compatible with Hadoop
> 0.20's new APIs. I do not know that it works, and have some reason to
> believe there are bugs in the API that will prevent it from working.
> So this piece is currently in flux.
>
> If you want to experiment and be a guinea pig for this latest
> revision, I can provide close support to work through the bugs on
> both sides. Or we can talk a bit more about your requirements, to
> figure out whether this is feasible, what the best algorithm is, and
> whether you need Hadoop at all.
>
> How big is 'massive'? Could you say, to an order of magnitude, how
> many users, items, and user-item preferences you have? What is
> generally the nature of the input data, and of the recommendations
> you want out?
>
> On Wed, Jul 22, 2009 at 12:12 AM, Aurora
> Skarra-Gallagher<[email protected]> wrote:
>> Hi,
>>
>> I apologize if I've misunderstood the purpose of the Taste component
>> of Mahout. Our goal was to take a recommendation framework and use
>> our own recommendation algorithm within it. We need to process a
>> massive amount of data, and we wanted that done on our Hadoop grid.
>> I thought that Taste was the right fit for the job. I'm not
>> interested in the HTTP service; I'm interested in the recommendation
>> framework, particularly from a back-end batch perspective. Does that
>> help clarify? Thanks for helping me sort through this.
>>
>> -Aurora
>>
>> On 7/21/09 3:02 PM, "Sean Owen" <[email protected]> wrote:
>>
>> Hmm, there is a lot going on here; it's confusing.
>>
>> Are you trying to run this on Hadoop intentionally? Because the web
>> app example is not intended to run on Hadoop. It's a component
>> intended to serve recommendations over HTTP in real time. It also
>> appears you are running an evaluation rather than a web app serving
>> requests. I realize you're trying to run this without Jetty, but
>> that's kind of like trying to run a web app without a web server.
>>
>> I think you'd have to clarify what you are trying to do, and then
>> what you are doing right now, before anyone can begin to assist.
>>
>> On Tue, Jul 21, 2009 at 9:20 PM, Aurora
>> Skarra-Gallagher<[email protected]> wrote:
>>> Hi,
>>>
>>> I'm trying to run the Taste web example without using Jetty. Our
>>> gateways aren't meant to be used as webservers.
>>> By poking around, I found that the following command worked:
>>>
>>> hadoop --config ~/hod-clusters/test jar
>>> /x/mahout-current/examples/target/mahout-examples-0.2-SNAPSHOT.job
>>> org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner
>>>
>>> The output is:
>>>
>>> 09/07/21 19:59:21 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
>>> 09/07/21 19:59:21 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.9 of GroupLensDataModel
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Reading file info...
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 100000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 200000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 300000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 400000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 500000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 600000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 700000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 800000 lines
>>> 09/07/21 19:59:23 INFO file.FileDataModel: Processed 900000 lines
>>> 09/07/21 19:59:23 INFO file.FileDataModel: Processed 1000000 lines
>>> 09/07/21 19:59:23 INFO file.FileDataModel: Read lines: 1000209
>>> 09/07/21 19:59:30 INFO slopeone.MemoryDiffStorage: Building average diffs...
>>> 09/07/21 19:59:42 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7035965559003973
>>> 09/07/21 19:59:42 INFO grouplens.GroupLensRecommenderEvaluatorRunner: 0.7035965559003973
>>>
>>> The job appears to write data to /tmp/ratings.txt and /tmp/movies.txt.
>>> I'm not sure if this is the correct way to run this example. I have a
>>> few questions:
>>>
>>> 1. Is the output file /tmp/ratings.txt? If so, how do I interpret it?
>>> 2. What does the Evaluation result mean?
>>> 3. Is it even running on HDFS?
>>> 4. Is it a map-reduce job?
>>>
>>> Any pointers on how to run this as a standalone job would be helpful.
>>>
>>> Thanks,
>>> Aurora
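
P.S. On the questions at the end of the thread: the GroupLens runner
is a plain single-JVM Java program, not a map-reduce job, which is why
it reads and writes local files under /tmp rather than anything on
HDFS; /tmp/ratings.txt is the input it translates from the GroupLens
data, not output. The evaluation result is the average absolute
difference between predicted and actual preference values, so lower is
better: about 0.70 on a 1-to-5 rating scale means predictions are off
by roughly 0.7 on average. Below is a minimal standalone sketch of the
same kind of evaluation over a FileDataModel-format ratings.txt --
untested, and the evaluate() signature has been changing between
releases, so check the RecommenderEvaluator interface in your
checkout:

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public final class StandaloneEvaluationExample {

  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.txt"));

    // Builds a fresh recommender over each training split; slope-one
    // here to match the MemoryDiffStorage lines in the log above.
    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
        return new SlopeOneRecommender(trainingModel);
      }
    };

    // Train on 90% of each user's preferences, test on the rest; the
    // returned score is the average absolute difference between
    // predicted and actual preference values (lower is better).
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    double score = evaluator.evaluate(builder, model, 0.9, 1.0);
    System.out.println(score);
  }
}

No Hadoop involved at all -- plain java with the Taste jars on the
classpath runs it.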
