Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-10-02 Thread Pat Ferrel
No. You seem to be describing combined vendor+service recommendations, so you 
will be creating input of the form

user ID, combined ID, rating

The way you are creating a combined ID is fine, but it must still be mapped to a 
Mahout ID. The user and combined IDs must _each_ be mapped to 0-N. Think of 
Mahout IDs as row and column numbers in one big input table: row # = Mahout user 
ID, column # = Mahout item ID. Mapping to and from these IDs is your task. 

user ID — 0..n
combined ID — 0..m

Then you will have input that Mahout can ingest. For example

0,0,3
1,0,2
…

The calculated recommendations will use the Mahout IDs so you must map them 
back into yours.
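
The mapping described above could be sketched roughly as follows. This is only an illustration, not part of Mahout: the helper `build_index`, the sample rows, and the choice of a (vendor, service) tuple as the combined ID are all assumptions made for the example.

```python
def build_index(values):
    """Assign each distinct value a dense integer ID (0, 1, 2, ...) in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index

# Hypothetical application data: (user, vendor ID, service ID, rating).
rows = [
    ("user1", 601, 501, 3),
    ("user2", 601, 501, 2),
    ("user2", 602, 503, 5),
]

# The combined ID here is simply the (vendor, service) pair.
combined = [(user, (vendor, service), rating) for user, vendor, service, rating in rows]

# Users and combined IDs are each mapped into their own 0..N space.
user_index = build_index(user for user, _, _ in combined)
item_index = build_index(item for _, item, _ in combined)

# Lines in the "Mahout user ID,Mahout item ID,rating" form.
lines = [f"{user_index[u]},{item_index[i]},{r}" for u, i, r in combined]

# Reverse lookups for translating recommendations back afterwards.
user_lookup = {mahout_id: user for user, mahout_id in user_index.items()}
item_lookup = {mahout_id: item for item, mahout_id in item_index.items()}
```

Keeping the reverse lookups next to the forward ones means a recommended Mahout item ID can be turned straight back into its vendor and service IDs.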

 

Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-10-01 Thread Pat Ferrel
First, I agree with Ted that LLR is better. I've tried all of the similarity 
methods in Mahout on exactly the same dataset and got far higher 
cross-validation scores for LLR. You may still use Pearson with Mahout 0.9 and 
1.0, but it is not supported in the Mahout 1.0 Spark jobs. 
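
For readers unfamiliar with LLR (the log-likelihood ratio), here is a minimal sketch of the score for a 2x2 cooccurrence table, written in the entropy-based form. The function names and the example counts are assumptions for illustration; this is not Mahout's API.

```python
import math

def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of counts.

    k11: users who interacted with both items
    k12: users who interacted with item A but not B
    k21: users who interacted with item B but not A
    k22: users who interacted with neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))

independent = llr(10, 10, 10, 10)   # no association between the two items
associated = llr(10, 0, 0, 10)      # the two items always occur together
```

A score near zero means the cooccurrence is about what chance would produce; larger scores indicate a stronger association. Only the counts matter here, so the rating values themselves are not used.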

If you have data in tables you need to create single interactions. These will 
look like:

user1,vendor1,rating
userN,vendorM,rating
...

If you are recommending vendors (not specific services of specific vendors), you 
need to map your IDs into IDs that the recommender can ingest. If the same user 
rated multiple services of the same vendor, you can’t tell which of the separate 
ratings will be used, so you should decide which rating you want to use as 
input. 
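
One way to make that choice explicit is to collapse the per-service ratings into a single rating per user and vendor before building the input. The sketch below is an assumption-laden illustration: the sample rows are made up, and taking the maximum is just one possible rule (a mean or the most recent rating would be equally valid choices).

```python
from collections import defaultdict

# Hypothetical rows: (user, vendor, service, rating).
service_ratings = [
    ("user1", "vendor1", "service1", 3),
    ("user1", "vendor1", "service2", 5),
    ("user2", "vendor1", "service1", 2),
]

def collapse(rows, combine=max):
    """Reduce multiple service ratings to one rating per (user, vendor)."""
    grouped = defaultdict(list)
    for user, vendor, _service, rating in rows:
        grouped[(user, vendor)].append(rating)
    return {key: combine(ratings) for key, ratings in grouped.items()}

def mean(values):
    return sum(values) / len(values)

vendor_ratings_max = collapse(service_ratings)
vendor_ratings_mean = collapse(service_ratings, combine=mean)
```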

You need to translate your IDs into Mahout IDs. Let’s say you go through all of 
your users: assign the first one a Mahout ID of integer 0, then the next 
unique user you see gets Mahout ID 1, and so on. You need to do this for 
your items (vendors) as well. Formatted as Mahout user ID, Mahout item ID, 
rating, your input files will contain something like this:

0,0,1
0,2000,3
0,4,5
1,3,1
1000,2000,5
…

Then, after you run the Mahout item-based recommender, you will get back a list 
of recommendations for each user. The key will be an integer equal to the 
Mahout user ID. The value will be a list of Mahout item IDs with strengths. You 
will need to map the Mahout IDs back into your application IDs. Since you are 
recommending vendors, the vendors are the items, so map all Mahout item IDs into 
your vendor IDs and the Mahout user IDs into your user IDs.
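
Mapping the results back could look roughly like the sketch below. It assumes each output line has the form `userID<TAB>[itemID:strength,itemID:strength,...]`; check the actual output of your job before relying on that. The lookup tables and the sample line are hypothetical.

```python
def parse_recs(line):
    """Parse one line of the assumed form 'userID<TAB>[itemID:strength,...]'."""
    user_part, items_part = line.strip().split("\t")
    items = []
    for pair in items_part.strip("[]").split(","):
        if pair:  # skip the empty string produced by an empty list "[]"
            item_id, strength = pair.split(":")
            items.append((int(item_id), float(strength)))
    return int(user_part), items

# Hypothetical reverse lookups built when the input was created.
user_lookup = {0: "user1", 1: "user2"}
vendor_lookup = {0: "vendor1", 1: "vendor2", 2: "vendor3"}

mahout_user, recs = parse_recs("1\t[2:4.5,0:3.0]")
translated = (
    user_lookup[mahout_user],
    [(vendor_lookup[item_id], strength) for item_id, strength in recs],
)
```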


Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-10-01 Thread vinayakb malagatti
Hi Pat,

Please correct me if I am wrong. If we take table 2 (user2), he has rated
vendor 1 through vendor 3.

   1. I am going to assign each user an ID from 1 to N.
   2. Vendors will have the IDs 601, 602, 603.
   3. Services will have the IDs 501, 502, 503.
   4. If I combine the vendor and service IDs, they look like
   601501, 601502, 601503..
   5. The input to Mahout will be USER ID, COMBINED ID, RATING.
   6. The output from Mahout will be COMBINED IDs for the user, and I will
   again have to separate each COMBINED ID into a vendor ID and a service ID.

Is this the correct flow?


Thanks and Regards,
Vinayak B



Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-09-30 Thread vinayakb malagatti
Thank you @Ted, but my guide is suggesting that I go with what Pat is
suggesting. @Pat, could you please tell me how the data should be grouped if I
want to recommend vendors to the user from the table? You also mentioned *your
recs will be returned using the same integer IDs so you will have to
translate your “user1” and “vendor1-service1” into non-negative contiguous
integers*. I don't know about this translation; could you please tell me more
about it?

Thanks and Regards,
Vinayak B


On Tue, Sep 30, 2014 at 10:36 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 Yes.  But I strongly suggest that you not use Pearson Correlation.

 Use the LLR similarity to compute indicator actions for each vendor.  Then
 use a user's history of actions to score vendors.  This is not only much
 simpler than what you are asking for, it will be more accurate.

 You should also measure additional actions besides ratings.




how to get recommendations by using user-user correlation for the given table in this mail

2014-09-29 Thread vinayakb malagatti
Hi all,

I have a table in the DB that looks something like this:

rating table
https://docs.google.com/spreadsheets/d/1PrShX7X70PqnfIQg0Dfv6mIHtX1k7KSZHTBfTPMv_Do/edit?usp=drive_web

Thanks and Regards,
Vinayak B


Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-09-29 Thread Pat Ferrel
You have users, services, and vendors. You should decide what you want to 
recommend. Service? Vendor? Service of Vendor?

Assuming the latter, combine the services and vendors into a single ID space: 
vendor1-service1, vendor1-service2 …

Then decide which method you want to use to create recs. We generally recommend 
that you use the Hadoop “itemsimilarity” or “spark-itemsimilarity” jobs to create 
an indicator matrix and use a search engine to query for recs. But you could also 
use the Hadoop-based recommender from Mahout.

The Hadoop MapReduce jobs take input like this:
user, item
0,0
0,10

Your recs will be returned using the same integer IDs, so you will have to 
translate your “user1” and “vendor1-service1” into non-negative contiguous 
integers.

If you use spark-itemsimilarity you can use your string IDs:
user, item
user1,vendor1-service1
user1000,vendor10-service1
...

To use a search engine have a look at this short book, which describes the 
process: https://www.mapr.com/practical-machine-learning


Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-09-29 Thread Ted Dunning
I would recommend that you look at actions other than ratings as well.

Did a user expand and read 1 review?  Did they read 3 reviews?

Did they mark a rating as useful?

Did they ask for contact information?

You know your system better than I possibly could, but using other
information in addition to ratings is very important for getting the
highest quality predictive information.

You can start with ratings, but you should push to get other kinds of
information as much as possible.  Ratings are often given by only a very
small number of people.  That severely limits how much value you can add
with a recommendation engine.  At the same time most people are busy not
giving you ratings, they are doing lots of other things that tell you what
they are thinking and reacting to.  If you don't pay attention to that
additional information, you are handicapping yourself severely.



Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-09-29 Thread vinayakb malagatti
@Pat and @Ted, thank you so much for the reply. I was looking for the
solution Pat suggested. Here I want to suggest to a user the vendors that he
has not yet used, by taking the history of that user and comparing it with
other users who have rated the same vendors. If we take the table:

   - User 1 has rated Vendor 1, Vendor 3 and Vendor 4, and User 2
   has rated Vendor 1, Vendor 2 and Vendor 3.
   - Common between User 2 and User 1 are Vendor 1 and Vendor 3.
   - Assume the Pearson correlation between them is nearly 1; hence we
   can recommend Vendor 2 to User 1, who has not used it.

Can we do it like this using Apache Mahout? If yes, could you please give
me a brief idea of how?
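
The idea in the list above can be sketched in plain Python as follows. This is not Mahout code, and the rating values are made up because the actual table is not reproduced in this thread; only the pattern of who rated which vendor follows the description.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated, or None if undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return None
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a user gave identical ratings, so correlation is undefined
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical rating values.
user1 = {"vendor1": 4, "vendor3": 2, "vendor4": 5}
user2 = {"vendor1": 5, "vendor2": 4, "vendor3": 3}

similarity = pearson(user1, user2)

# Vendors the similar user rated that user1 has not rated yet.
candidates = sorted(set(user2) - set(user1))
```

Note that with only two vendors in common, the Pearson correlation can only come out as +1, -1, or undefined, so a value of "nearly 1" on such a small overlap carries little information.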

Thanks and Regards,
Vinayak B

