Re: Cosine Similarity and LogLikelihood not helpful for implicit feedback!
Here is a paper that includes an analysis of voting patterns using LDA: http://arxiv.org/pdf/math/0604410.pdf

On Tue, Sep 30, 2014 at 7:04 PM, Parimi Rohit rohit.par...@gmail.com wrote:

Ted, I know LDA can be used to model text data but I have never used it in this setting. Can you please give me some pointers on how I can apply it here? Thanks, Rohit

On Tue, Sep 30, 2014 at 4:33 PM, Ted Dunning ted.dunn...@gmail.com wrote:

This is an incredibly tiny dataset. If you delete singletons, it is likely to get significantly smaller. I think that something like LDA might work much better for you. It was designed to work on small data like this.

On Tue, Sep 30, 2014 at 11:13 AM, Parimi Rohit rohit.par...@gmail.com wrote:

Ted, thanks for your response. Here is the information about the approach and the datasets:

I am using the ItemSimilarityJob and passing it (itemID, userID, prefCount) tuples as input to compute user-user similarity using LLR. I read about this approach in an answer to one of the Stack Overflow questions on calculating user similarity with Mahout.

Stats for the datasets:

Coauthor dataset: users = 29189, items = 140091, averageItemsClicked = 15.808660796875536
Conference dataset: users = 29189, items = 2393, averageItemsClicked = 7.265099866388023
Reference dataset: users = 29189, items = 201570, averageItemsClicked = 61.08564870327863

By scale, did you mean rating scale? If so, I am using preference counts, not ratings. Thanks, Rohit

On Tue, Sep 30, 2014 at 12:08 AM, Ted Dunning ted.dunn...@gmail.com wrote:

How are you using LLR to compute user similarity? It is normally used to compute item similarity. Also, what is your scale? How many users? How many items? How many actions per user?

On Mon, Sep 29, 2014 at 6:24 PM, Parimi Rohit rohit.par...@gmail.com wrote:

Hi, I am exploring a random-walk based algorithm for recommender systems which works by propagating the item preferences for users over the user-user graph. To do this, I have to compute user-user similarity and form a neighborhood. I have tried the following three simple techniques to compute the score between two users and find the neighborhood:

1. Score = (items common to users A and B) / (items preferred by A + items preferred by B)
2. Scoring based on Mahout's Cosine similarity
3. Scoring based on Mahout's LogLikelihood similarity

My understanding is that similarity based on LogLikelihood is more robust; however, I get better results using the naive approach (technique 1 above). The problems I am addressing are collaborator recommendation, conference recommendation and reference recommendation, and the data is implicit feedback. So my question is: are there cases where the cosine similarity and loglikelihood metrics fail to capture similarity? For the problems stated above, users collaborate with only a few other users (based on area of interest), publish in only a few conferences (again based on area of interest), and refer to publications in a specific domain, so the preference counts are fairly small compared to other domains (music/video etc.). Secondly, for CosineSimilarity, should I treat the preferences as boolean or use the counts? (I think the loglikelihood metric does not take the preference counts into account; correct me if I am wrong.) Any insight into this is much appreciated. Thanks, Rohit

p.s. Ted, Pat: I am following the discussion on the thread "LogLikelihoodSimilarity Calculation" and your answers helped me a lot to understand how it works, and made me wonder why things are different in my case.
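To make the comparison concrete, here is a minimal sketch of technique 1 from the list above; the class and method names are illustrative, not part of Mahout. With set-valued (boolean) preferences this score ignores counts entirely, and, as noted in the question, Mahout's LogLikelihood similarity likewise works only from co-occurrence counts rather than preference values.

import java.util.HashSet;
import java.util.Set;

/**
 * Minimal sketch of "technique 1": score two users by the number of items
 * they share, normalized by the sum of their preference-list sizes.
 * With boolean (set) preferences this is closely related to the
 * Dice/Jaccard family of overlap measures.
 */
public class NaiveUserSimilarity {

  /** Returns a score in [0, 0.5]; higher means more overlap. */
  public static double score(Set<Long> itemsOfA, Set<Long> itemsOfB) {
    if (itemsOfA.isEmpty() || itemsOfB.isEmpty()) {
      return 0.0;
    }
    Set<Long> common = new HashSet<>(itemsOfA);
    common.retainAll(itemsOfB);
    return (double) common.size() / (itemsOfA.size() + itemsOfB.size());
  }

  public static void main(String[] args) {
    Set<Long> a = new HashSet<>(Set.of(1L, 2L, 3L, 4L));
    Set<Long> b = new HashSet<>(Set.of(3L, 4L, 5L));
    System.out.println(score(a, b)); // 2 / (4 + 3) = ~0.286
  }
}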
Re: word weights using BM25
Hey guys, I think it is fair to give you some feedback. I managed to implement a BM25+ (http://en.wikipedia.org/wiki/Okapi_BM25) term score on Mahout. It was straightforward using the current TFIDF implementation as an example. Basically, I implemented the interface org.apache.mahout.vectorizer.Weight and created a BM25Converter and a BM25PartialVectorReducer, analogous to TFIDFConverter (https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/tfidf/TFIDFConverter.html) and TFIDFPartialVectorReducer (https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/tfidf/TFIDFPartialVectorReducer.html) respectively.

cheers
Arian

Arian Pasquali
http://about.me/arianpasquali

2014-09-24 14:14 GMT+01:00 Arian Pasquali ar...@arianpasquali.com:

Yes, I'm studying his work (http://nlp.uned.es/~jperezi/Lucene-BM25/) and the current Mahout TFIDF code, trying to understand how I would port that to MR. I'll try to share something if I succeed.

Arian Pasquali
http://about.me/arianpasquali

2014-09-24 5:12 GMT+01:00 Suneel Marthi suneel.mar...@gmail.com:

Lucene 4.x supports Okapi BM25, so it should be easy to implement.

On Tue, Sep 23, 2014 at 11:57 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Should be pretty easy. I haven't heard of anyone doing it.

Sent from my iPhone

On Sep 23, 2014, at 18:53, Arian Pasquali ar...@arianpasquali.com wrote:

Hi, I was wondering if it would be possible to support BM25 term weighting by extending Mahout's TF-IDF implementation. I was curious to know if anyone here has already tried to do so. If not, what would be your suggestion for such an implementation in Mahout?

Arian Pasquali
http://about.me/arianpasquali
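For readers who want to try the same thing, here is a hedged sketch of what such a Weight implementation might look like. It assumes the org.apache.mahout.vectorizer.Weight interface exposes a calculate(tf, df, length, numDocs) method, as the TFIDF implementation does; the average document length is not part of that interface, so this sketch takes it as a constructor argument (e.g. computed in a prior pass). Class name and parameter defaults are illustrative, not Arian's actual code.

import org.apache.mahout.vectorizer.Weight;

/**
 * Hypothetical BM25+ weight, modeled on Mahout's TFIDF Weight implementation.
 * Assumes Weight.calculate(int tf, int df, int length, int numDocs); since the
 * average document length is not available through that interface, it is
 * supplied at construction time.
 */
public class BM25 implements Weight {

  private final double k1;           // term-frequency saturation, typically ~1.2
  private final double b;            // length normalization, typically ~0.75
  private final double delta;        // BM25+ lower bound, typically ~1.0
  private final double avgDocLength; // computed over the corpus in a prior pass

  public BM25(double k1, double b, double delta, double avgDocLength) {
    this.k1 = k1;
    this.b = b;
    this.delta = delta;
    this.avgDocLength = avgDocLength;
  }

  @Override
  public double calculate(int tf, int df, int length, int numDocs) {
    // Standard BM25 idf; the +1 inside the log keeps the value non-negative.
    double idf = Math.log(1.0 + (numDocs - df + 0.5) / (df + 0.5));
    double norm = k1 * (1.0 - b + b * (length / avgDocLength));
    // BM25+ adds delta so long documents still get credit for a match.
    return idf * ((tf * (k1 + 1.0)) / (tf + norm) + delta);
  }
}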
Re: word weights using BM25
How did you implement BM25PartialVectorReducer and BM25Converter? The present implementations of TFIDFConverter and the reducer are MapReduce, and Mahout is not accepting any new MapReduce code.

On Wed, Oct 1, 2014 at 7:18 AM, Arian Pasquali ar...@arianpasquali.com wrote:

Hey guys, I think it is fair to give you some feedback. I managed to implement a BM25+ term score on Mahout. It was straightforward using the current TFIDF implementation as an example.
Re: word weights using BM25
Thanks so much for the feedback. Glad to hear it was straightforward. But the important question is how did BM25 work for you?

On Wed, Oct 1, 2014 at 6:18 AM, Arian Pasquali ar...@arianpasquali.com wrote:

Hey guys, I think it is fair to give you some feedback. I managed to implement a BM25+ term score on Mahout. It was straightforward using the current TFIDF implementation as an example.
Re: word weights using BM25
Hi Ted, my dataset is a collection of documents in German, and I can say that the scores seem better compared to my TFIDF scores. Results make more sense now, especially for my bi-grams.

Arian Pasquali
http://about.me/arianpasquali

2014-10-01 13:09 GMT+01:00 Ted Dunning ted.dunn...@gmail.com:

Thanks so much for the feedback. Glad to hear it was straightforward. But the important question is how did BM25 work for you?
Re: Cosine Similarity and LogLikelihood not helpful for implicit feedback!
Thanks Ted! Will look into it.

Rohit

On Wed, Oct 1, 2014 at 1:04 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Here is a paper that includes an analysis of voting patterns using LDA: http://arxiv.org/pdf/math/0604410.pdf
Re: word weights using BM25
Yes Suneel, indeed it is done MapReduce-style. What exactly do you mean when you say Mahout is not accepting any new MapReduce code? Do you mean for submitting a patch? I'm sure there might be better ways to implement it, but I'm more interested in the results right now. What would be your suggestion?

best
Arian Pasquali
http://about.me/arianpasquali

2014-10-01 13:10 GMT+01:00 Suneel Marthi smar...@apache.org:

How did you implement BM25PartialVectorReducer and BM25Converter? The present implementations of TFIDFConverter and the reducer are MapReduce, and Mahout is not accepting any new MapReduce code.
Re: word weights using BM25
On Wed, Oct 1, 2014 at 7:52 AM, Arian Pasquali ar...@arianpasquali.com wrote:

My dataset is a collection of documents in German and I can say that the scores seem better compared to my TFIDF scores. Results make more sense now, especially for my bi-grams.

OK. I will take note.
Re: how to get recommendations by using user-user correlation for the given table in this mail
First, I agree with Ted that LLR is better. I've tried all of the similarity methods in Mahout on exactly the same dataset and got far higher cross-validation scores for LLR. You may still use Pearson with Mahout 0.9 and 1.0, but it is not supported in the Mahout 1.0 Spark jobs.

If you have data in tables you need to create single interactions. These will look like:

user1,vendor1,rating
userN,vendorM,rating
...

If you are recommending vendors (not specific services of specific vendors) you need to map your IDs into IDs that the recommender can ingest. You can't tell which of the separate ratings will be used if the same user rated multiple services of the same vendor, so you should determine which rating you want to use as input.

You need to translate your IDs into Mahout IDs. Say you go through all of your users, assign the first one a Mahout ID of integer = 0; the next unique user you see gets Mahout ID = 1, and so on. You need to do this for your items (vendors) as well. So your input to Mahout, formatted as Mahout user ID, Mahout item ID, rating, will look something like this:

0,0,1
0,2000,3
0,4,5
1,3,1
1000,2000,5
...

Then, after you run the Mahout item-based recommender, you will get back a list of recommendations for each user. The key will be an integer equal to the Mahout user ID. The value will be a list of Mahout item IDs with strengths. You will need to map the Mahout IDs back into your application IDs. Since you are recommending vendors, the vendors are items, so map all Mahout item IDs back into your vendor IDs and the Mahout user IDs back into your user IDs.

On Sep 30, 2014, at 6:55 PM, vinayakb malagatti vinayakbmalaga...@gmail.com wrote:

Thank you @Ted, but my guide is suggesting to go with what Pat is suggesting. @Pat, could you please tell me, if I want to recommend vendors to the user from the table, how should they be grouped? You also mentioned "your recs will be returned using the same integer IDs so you will have to translate your 'user1' and 'vendor1-service1' into non-negative contiguous integers"; I don't know about this translation, could you please tell me more about it?

Thanks and Regards, Vinayak B

On Tue, Sep 30, 2014 at 10:36 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Yes. But I strongly suggest that you not use Pearson correlation. Use the LLR similarity to compute indicator actions for each vendor, then use a user's history of actions to score vendors. This is not only much simpler than what you are asking for, it will be more accurate. You should also measure additional actions besides ratings.

On Mon, Sep 29, 2014 at 6:56 PM, vinayakb malagatti vinayakbmalaga...@gmail.com wrote:

@Pat and @Ted, thank you so much for the reply. I was looking for the solution Pat suggested: I want to suggest to the user vendors that the user has not yet used, by taking the history of that user and comparing it with other users who have rated the common vendors. If we take the table, user 1 has rated vendor 1, vendor 3 and vendor 4, and user 2 has rated vendor 1, vendor 2 and vendor 3. Common between user 2 and user 1 are vendor 1 and vendor 3. Assume the Pearson correlation between them is nearly 1; hence we can recommend vendor 2, which user 1 has not used, to user 1. Can we do this using Apache Mahout? If yes, could you please give a brief idea how?

Thanks and Regards, Vinayak B

On Tue, Sep 30, 2014 at 2:10 AM, Ted Dunning ted.dunn...@gmail.com wrote:

I would recommend that you look at actions other than ratings as well. Did a user expand and read 1 review? Did they read 3 reviews? Did they mark a rating as useful? Did they ask for contact information? You know your system better than I possibly could, but using other information in addition to ratings is very important for getting the highest quality predictive information. You can start with ratings, but you should push to get other kinds of information as much as possible. Ratings are often given by only a very small number of people. That severely limits how much value you can add with a recommendation engine. At the same time, most people are busy not giving you ratings; they are doing lots of other things that tell you what they are thinking and reacting to. If you don't pay attention to that additional information, you are handicapping yourself severely.

On Mon, Sep 29, 2014 at 9:53 AM, vinayakb malagatti vinayakbmalaga...@gmail.com wrote:

Hi all, I have a table that looks something like this in the DB: rating table https://docs.google.com/spreadsheets/d/1PrShX7X70PqnfIQg0Dfv6mIHtX1k7KSZHTBfTPMv_Do/edit?usp=drive_web

Thanks and Regards, Vinayak B
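The translation Pat describes is just a dictionary built in order of first appearance. A minimal sketch is below; the class and method names are illustrative, not a Mahout API.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper illustrating the ID translation described above:
 * external IDs (e.g. "user1", "vendor1") are assigned contiguous integers
 * starting at 0 in order of first appearance, and a reverse list lets you
 * map Mahout's integer IDs back to application IDs afterwards.
 */
public class IdDictionary {

  private final Map<String, Integer> toMahout = new HashMap<>();
  private final List<String> toExternal = new ArrayList<>();

  /** Returns the Mahout ID for an external ID, assigning a new one if unseen. */
  public int mahoutId(String externalId) {
    Integer id = toMahout.get(externalId);
    if (id == null) {
      id = toExternal.size();
      toMahout.put(externalId, id);
      toExternal.add(externalId);
    }
    return id;
  }

  /** Maps a Mahout ID back to the original external ID. */
  public String externalId(int mahoutId) {
    return toExternal.get(mahoutId);
  }
}

You would keep one dictionary for users and a separate one for vendors, emit userDict.mahoutId(user) and vendorDict.mahoutId(vendor) when writing the input triples, and call vendorDict.externalId(...) on the item IDs that come back in the recommendations.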
Re: how to get recommendations by using user-user correlation for the given table in this mail
Hi Pat, please correct me if I am wrong. If we take table 2 (user 2), he has rated vendor 1 to vendor 3.

1. I am going to assign each user an ID starting from 1 to N.
2. Vendors will have IDs like 601, 602, 603.
3. Services will have IDs like 501, 502, 503.
4. If I combine the vendor and service IDs, they look like 601501, 601502, 601503, ...
5. The input to Mahout will be USER ID, COMBINED ID, RATING.
6. The output from Mahout will be COMBINED IDs for the user, and I then have to split each COMBINED ID back into a vendor ID and a service ID.

Is this the correct flow?

Thanks and Regards, Vinayak B

On Thu, Oct 2, 2014 at 12:23 AM, Pat Ferrel p...@occamsmachete.com wrote:

First, I agree with Ted that LLR is better. I've tried all of the similarity methods in Mahout on exactly the same dataset and got far higher cross-validation scores for LLR.
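As a rough illustration of steps 4-6 in the flow above, the combine/split step could look like the sketch below. It assumes vendor and service IDs really are three digits each, so the arithmetic is reversible; the class and method names are hypothetical.

/**
 * Hypothetical sketch of the combine/split step (steps 4-6 above), assuming
 * three-digit vendor IDs (601, 602, ...) and three-digit service IDs
 * (501, 502, ...), so a combined ID such as 601501 splits back cleanly.
 */
public class CombinedId {

  /** 601 and 501 become 601501. */
  public static long combine(int vendorId, int serviceId) {
    return vendorId * 1000L + serviceId;
  }

  /** 601501 becomes vendor 601. */
  public static int vendorOf(long combinedId) {
    return (int) (combinedId / 1000L);
  }

  /** 601501 becomes service 501. */
  public static int serviceOf(long combinedId) {
    return (int) (combinedId % 1000L);
  }

  public static void main(String[] args) {
    long combined = combine(601, 501);
    System.out.println(combined);            // 601501
    System.out.println(vendorOf(combined));  // 601
    System.out.println(serviceOf(combined)); // 501
  }
}

Note that combined IDs built this way are not contiguous, so per Pat's earlier point they would still need to be translated into non-negative contiguous Mahout integers (for example with a dictionary like the one sketched above), and if you are only recommending vendors you can skip combined IDs entirely and use one Mahout item ID per vendor.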