Re: how to get recommendations by using user-user correlation for the given table in this mail
No. You seem to be describing combined vendor+service recommendations, so you will be creating input of the form:

user ID, combined ID, rating

The way you are creating a combined ID is fine, but it must still be mapped to a Mahout ID. The user IDs and the combined IDs must _each_ be mapped to 0-N. Think of Mahout IDs as row and column numbers in one big input table: row # = Mahout user ID, column # = Mahout item ID. Mapping to and from these IDs is your task.

user ID -> 0..n
combined ID -> 0..m

Then you will have input that Mahout can ingest. For example:

0,0,3
1,0,2
…

The calculated recommendations will use the Mahout IDs, so you must map them back into yours.

On Oct 1, 2014, at 5:58 PM, vinayakb malagatti vinayakbmalaga...@gmail.com wrote:

Hi Pat,

Please correct me if I am wrong. If we take table 2 (user2), he has rated vendor 1 through vendor 3.

1. I am going to assign each user an ID from 1 to N.
2. Vendors will have the IDs 601, 602, 603.
3. Services will have the IDs 501, 502, 503.
4. If I combine the vendor and service IDs, they look like 601501, 601502, 601503, ...
5. The input to Mahout will be USER ID, COMBINED ID, RATING.
6. The output from Mahout will be COMBINED IDs for the user, and I will again have to separate each COMBINED ID into a vendor ID and a service ID.

Is this the correct flow?

Thanks and Regards,
Vinayak B
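The mapping to and from Mahout IDs described in this reply can be sketched as follows. This is only an illustration, assuming the ratings are already available as (user ID, combined ID, rating) tuples. The names `build_index` and `to_app_ids` and the sample rows are hypothetical and not part of Mahout; only the `user,item,rating` line format comes from the thread.

```python
def build_index(raw_ids):
    """Assign each distinct raw ID a contiguous integer (0, 1, 2, ...) in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


# Hypothetical application data: (user ID, combined vendor+service ID, rating).
ratings = [
    ("user1", "601501", 3),
    ("user2", "601501", 2),
    ("user2", "602503", 5),
]

# Users and items are indexed separately, so each gets its own 0..N range.
user_index = build_index(u for u, _, _ in ratings)
item_index = build_index(i for _, i, _ in ratings)

# Lines in "Mahout user ID,Mahout item ID,rating" form.
mahout_lines = [f"{user_index[u]},{item_index[i]},{r}" for u, i, r in ratings]

# Reverse lookups for translating recommendations back into application IDs.
user_lookup = {v: k for k, v in user_index.items()}
item_lookup = {v: k for k, v in item_index.items()}


def to_app_ids(mahout_user, mahout_items):
    """Translate one user's recommended Mahout item IDs back into application IDs."""
    return user_lookup[mahout_user], [item_lookup[i] for i in mahout_items]
```

Both dictionaries have to be kept (for example, saved next to the input file) for as long as the recommendations are in use, because the output is only meaningful together with the mapping that produced the input.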
Re: how to get recommendations by using user-user correlation for the given table in this mail
First, I agree with Ted that LLR is better. I've tried all of the similarity methods in Mahout on exactly the same dataset and got far higher cross-validation scores for LLR. You may still use Pearson with Mahout 0.9 and 1.0, but it is not supported in the Mahout 1.0 Spark jobs.

If you have data in tables, you need to create single interactions. These will look like:

user1,vendor1,rating
userN,vendorM,rating
...

If you are recommending vendors (not specific services of specific vendors), you need to map your IDs into IDs that the recommender can ingest. You can’t tell which of the separate ratings will be used if the same user rated multiple services of the same vendor, so you should determine which rating you want to use as input.

You need to translate your IDs into Mahout IDs. Let’s say you go through all of your users and assign the first one a Mahout ID of integer = 0; the next unique user you see will get Mahout ID = 1, and so on. You need to do this for your Items (vendors) as well.

So your input to Mahout will look something like this. Formatted as Mahout User ID, Mahout Item ID, rating, your input files will contain:

0,0,1
0,2000,3
0,4,5
1,3,1
1000,2000,5
…

Then, after you run the Mahout Item-based recommender, you will get back a list of recommendations for each user. The key will be an integer equal to the Mahout user ID. The value will be a list of Mahout Item IDs with strengths. You will need to map the Mahout IDs back into your application IDs. Since you are recommending vendors, the vendors are items, so map all Mahout Item IDs into your vendor IDs and the Mahout User IDs into your user IDs.

On Sep 30, 2014, at 6:55 PM, vinayakb malagatti vinayakbmalaga...@gmail.com wrote:

Thank you @Ted, but my guide is suggesting that I go with what Pat is suggesting.

@Pat, if I want to recommend vendors to the user from the table, how should they be grouped? You also mentioned: *your recs will be returned using the same integer IDs so you will have to translate your “user1” and “vendor1-service1” into non-negative contiguous integers*. I don't know about this translation; could you please tell me more about it?

Thanks and Regards,
Vinayak B
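The point in this message about a user rating several services of the same vendor can be illustrated with a small sketch that collapses those ratings into one rating per (user, vendor) pair before the input file is written. The rule used here (take the highest rating) is purely an assumption for illustration; the thread only says that you should decide which rating to use. The function name `collapse_to_vendor` and the sample rows are hypothetical.

```python
from collections import defaultdict


def collapse_to_vendor(rows, combine=max):
    """Reduce (user, vendor, service, rating) rows to one rating per (user, vendor).

    `combine` is the rule that picks or computes the single rating;
    max is only a placeholder choice.
    """
    grouped = defaultdict(list)
    for user, vendor, _service, rating in rows:
        grouped[(user, vendor)].append(rating)
    return {key: combine(values) for key, values in grouped.items()}


# Hypothetical rows: user1 rated two different services of vendor1.
rows = [
    ("user1", "vendor1", "service1", 2),
    ("user1", "vendor1", "service2", 4),
    ("user1", "vendor3", "service1", 5),
    ("user2", "vendor1", "service1", 3),
]

vendor_ratings = collapse_to_vendor(rows)
```

Passing a different `combine` function (for example `min`, or a mean) changes the rule without touching the rest of the pipeline.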
Re: how to get recommendations by using user-user correlation for the given table in this mail
Hi Pat,

Please correct me if I am wrong. If we take table 2 (user2), he has rated vendor 1 through vendor 3.

1. I am going to assign each user an ID from 1 to N.
2. Vendors will have the IDs 601, 602, 603.
3. Services will have the IDs 501, 502, 503.
4. If I combine the vendor and service IDs, they look like 601501, 601502, 601503, ...
5. The input to Mahout will be USER ID, COMBINED ID, RATING.
6. The output from Mahout will be COMBINED IDs for the user, and I will again have to separate each COMBINED ID into a vendor ID and a service ID.

Is this the correct flow?

Thanks and Regards,
Vinayak B

On Thu, Oct 2, 2014 at 12:23 AM, Pat Ferrel p...@occamsmachete.com wrote:

First, I agree with Ted that LLR is better. I've tried all of the similarity methods in Mahout on exactly the same dataset and got far higher cross-validation scores for LLR. You may still use Pearson with Mahout 0.9 and 1.0, but it is not supported in the Mahout 1.0 Spark jobs.

If you have data in tables, you need to create single interactions. These will look like:

user1,vendor1,rating
userN,vendorM,rating
...

If you are recommending vendors (not specific services of specific vendors), you need to map your IDs into IDs that the recommender can ingest. You can’t tell which of the separate ratings will be used if the same user rated multiple services of the same vendor, so you should determine which rating you want to use as input.

You need to translate your IDs into Mahout IDs. Let’s say you go through all of your users and assign the first one a Mahout ID of integer = 0; the next unique user you see will get Mahout ID = 1, and so on. You need to do this for your Items (vendors) as well.

So your input to Mahout will look something like this. Formatted as Mahout User ID, Mahout Item ID, rating, your input files will contain:

0,0,1
0,2000,3
0,4,5
1,3,1
1000,2000,5
…

Then, after you run the Mahout Item-based recommender, you will get back a list of recommendations for each user. The key will be an integer equal to the Mahout user ID. The value will be a list of Mahout Item IDs with strengths. You will need to map the Mahout IDs back into your application IDs. Since you are recommending vendors, the vendors are items, so map all Mahout Item IDs into your vendor IDs and the Mahout User IDs into your user IDs.
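Steps 4 and 6 of the numbered flow in this message (building a combined ID such as 601501 and splitting it again) can be sketched as below. It assumes, as in the example IDs, that a service ID never needs more than three digits; `SERVICE_DIGITS`, `combine_ids`, and `split_id` are hypothetical names chosen for this illustration.

```python
SERVICE_DIGITS = 3  # assumption: service IDs such as 501 fit in three digits


def combine_ids(vendor_id, service_id):
    """Pack a vendor ID and a service ID into one integer, e.g. 601 and 501 -> 601501."""
    base = 10 ** SERVICE_DIGITS
    if not 0 <= service_id < base:
        raise ValueError("service ID does not fit in the reserved digits")
    return vendor_id * base + service_id


def split_id(combined_id):
    """Recover (vendor ID, service ID) from a combined ID."""
    return divmod(combined_id, 10 ** SERVICE_DIGITS)
```

A combined ID built this way is still an application ID: as explained elsewhere in the thread, it has to be mapped to a contiguous Mahout item ID before it is used as input, and mapped back before it is split.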
Re: how to get recommendations by using user-user correlation for the given table in this mail
Thank you @Ted, but my guide is suggesting that I go with what Pat is suggesting.

@Pat, if I want to recommend vendors to the user from the table, how should they be grouped? You also mentioned: *your recs will be returned using the same integer IDs so you will have to translate your “user1” and “vendor1-service1” into non-negative contiguous integers*. I don't know about this translation; could you please tell me more about it?

Thanks and Regards,
Vinayak B

On Tue, Sep 30, 2014 at 10:36 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Yes. But I strongly suggest that you not use Pearson Correlation. Use the LLR similarity to compute indicator actions for each vendor. Then use a user's history of actions to score vendors. This is not only much simpler than what you are asking for, it will be more accurate.

You should also measure additional actions besides ratings.

On Mon, Sep 29, 2014 at 6:56 PM, vinayakb malagatti vinayakbmalaga...@gmail.com wrote:

@Pat and @Ted, thank you so much for the reply. I was looking for the solution Pat suggested. Here I want to suggest to a user vendors that the user has not yet used, by taking that user's history and comparing it with other users who have rated vendors in common.

If we take the table:

- User 1 has rated Vendor 1, Vendor 3 and Vendor 4, and User 2 has rated Vendor 1, Vendor 2 and Vendor 3.
- Common between User 2 and User 1 are Vendor 1 and Vendor 3.
- Assume the Pearson correlation between them is nearly 1; then we can recommend Vendor 2 to User 1, which User 1 has not used.

Can we do this using Apache Mahout? If yes, could you please give me a brief idea of how?

Thanks and Regards,
Vinayak B
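For readers unfamiliar with the LLR similarity Ted recommends, the following is a rough sketch of the log-likelihood ratio (G²) statistic on a 2x2 co-occurrence table. It is a standalone illustration, not a call into Mahout. The interpretation of the counts is an assumption for this example: `k11` is the number of users who interacted with both vendors, `k12` and `k21` count users who interacted with only one of them, and `k22` counts users who interacted with neither.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table of co-occurrence counts."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))
```

A score near zero means the two vendors co-occur about as often as chance would predict; larger scores indicate an association that is unlikely to be accidental, which is what makes a vendor useful as an indicator for another.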
how to get recommendations by using user-user correlation for the given table in this mail
Hi all,

I have a table in my DB that looks something like this: rating table https://docs.google.com/spreadsheets/d/1PrShX7X70PqnfIQg0Dfv6mIHtX1k7KSZHTBfTPMv_Do/edit?usp=drive_web

Thanks and Regards,
Vinayak B
Re: how to get recommendations by using user-user correlation for the given table in this mail
You have users, services, and vendors. You should decide what you want to recommend. Service? Vendor? Service of Vendor? Assuming the latter, combine the services and vendors into a single ID space:

vendor1-service1, vendor1-service2 …

Then decide what method you want to use to create recs. We are generally recommending that you use the Hadoop itemsimilarity or spark-itemsimilarity jobs to create an indicator matrix and use a search engine to query for recs. But you could also use the Hadoop-based recommender from Mahout.

The Hadoop MapReduce jobs take input like this:

user, item
0,0
0,10

Your recs will be returned using the same integer IDs, so you will have to translate your “user1” and “vendor1-service1” into non-negative contiguous integers.

If you use spark-itemsimilarity you can use your string IDs:

user, item
user1,vendor1-service1
user1000,vendor10-service1
...

To use a search engine, have a look at this short book, which describes the process: https://www.mapr.com/practical-machine-learning

On Sep 29, 2014, at 9:53 AM, vinayakb malagatti vinayakbmalaga...@gmail.com wrote:

Hi all,

I have a table in my DB that looks something like this: rating table https://docs.google.com/spreadsheets/d/1PrShX7X70PqnfIQg0Dfv6mIHtX1k7KSZHTBfTPMv_Do/edit?usp=drive_web

Thanks and Regards,
Vinayak B
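The single ID space described in this reply can be produced with a few lines of code. This sketch only builds the `user,item` lines with string IDs; it does not run any Mahout job, and the function name `combined_item_id` and the sample interactions are hypothetical.

```python
def combined_item_id(vendor, service):
    """Join a vendor and a service into one item ID, e.g. 'vendor1-service1'."""
    return f"{vendor}-{service}"


# Hypothetical (user, vendor, service) interactions taken from a ratings table.
interactions = [
    ("user1", "vendor1", "service1"),
    ("user1000", "vendor10", "service1"),
]

# One "user,item" line per interaction, using the string IDs directly.
lines = [f"{user},{combined_item_id(vendor, service)}" for user, vendor, service in interactions]
```

The separator must not occur inside a vendor or service name, otherwise the combined ID cannot be split back unambiguously.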
Re: how to get recommendations by using user-user correlation for the given table in this mail
I would recommend that you look at actions other than ratings as well. Did a user expand and read 1 review? Did they read 3 reviews? Did they mark a rating as useful? Did they ask for contact information?

You know your system better than I possibly could, but using other information in addition to ratings is very important for getting the highest quality predictive information. You can start with ratings, but you should push to get other kinds of information as much as possible.

Ratings are often given by only a very small number of people. That severely limits how much value you can add with a recommendation engine. At the same time that most people are busy not giving you ratings, they are doing lots of other things that tell you what they are thinking and reacting to. If you don't pay attention to that additional information, you are handicapping yourself severely.

On Mon, Sep 29, 2014 at 9:53 AM, vinayakb malagatti vinayakbmalaga...@gmail.com wrote:

Hi all,

I have a table in my DB that looks something like this: rating table https://docs.google.com/spreadsheets/d/1PrShX7X70PqnfIQg0Dfv6mIHtX1k7KSZHTBfTPMv_Do/edit?usp=drive_web

Thanks and Regards,
Vinayak B
Re: how to get recommendations by using user-user correlation for the given table in this mail
@Pat and @Ted, thank you so much for the reply. I was looking for the solution Pat suggested. Here I want to suggest to a user vendors that the user has not yet used, by taking that user's history and comparing it with other users who have rated vendors in common.

If we take the table:

- User 1 has rated Vendor 1, Vendor 3 and Vendor 4, and User 2 has rated Vendor 1, Vendor 2 and Vendor 3.
- Common between User 2 and User 1 are Vendor 1 and Vendor 3.
- Assume the Pearson correlation between them is nearly 1; then we can recommend Vendor 2 to User 1, which User 1 has not used.

Can we do this using Apache Mahout? If yes, could you please give me a brief idea of how?

Thanks and Regards,
Vinayak B

On Tue, Sep 30, 2014 at 2:10 AM, Ted Dunning ted.dunn...@gmail.com wrote:

I would recommend that you look at actions other than ratings as well. Did a user expand and read 1 review? Did they read 3 reviews? Did they mark a rating as useful? Did they ask for contact information?

You know your system better than I possibly could, but using other information in addition to ratings is very important for getting the highest quality predictive information. You can start with ratings, but you should push to get other kinds of information as much as possible.

Ratings are often given by only a very small number of people. That severely limits how much value you can add with a recommendation engine. At the same time that most people are busy not giving you ratings, they are doing lots of other things that tell you what they are thinking and reacting to. If you don't pay attention to that additional information, you are handicapping yourself severely.
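The user-user idea in the question (User 1 and User 2 share Vendor 1 and Vendor 3, so Vendor 2 is a candidate for User 1) can be made concrete with a small standalone sketch. The rating values are invented for illustration, since the spreadsheet is not reproduced in the thread, and `pearson`, `recommend`, and the 0.9 threshold are hypothetical. It is not Mahout code, and the replies in this thread advise LLR over Pearson.

```python
import math


def pearson(a, b):
    """Pearson correlation over the items rated by both users (0.0 if undefined)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[item] for item in common]
    ys = [b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


# Invented ratings that follow the pattern described in the message.
ratings = {
    "user1": {"vendor1": 4, "vendor3": 2, "vendor4": 5},
    "user2": {"vendor1": 5, "vendor2": 4, "vendor3": 3},
}


def recommend(target, neighbour, threshold=0.9):
    """Suggest vendors the neighbour rated and the target has not, if the two correlate strongly."""
    if pearson(ratings[target], ratings[neighbour]) < threshold:
        return []
    return sorted(set(ratings[neighbour]) - set(ratings[target]))
```

With only two vendors in common, the correlation can only be +1, -1, or undefined, which is one practical reason a correlation computed on so little overlap carries little information.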