Re: Is Mahout the right tool to recommend cross sales?
This sounds like just a most-similar-items problem. That's good news because that's simpler. The only question is how you want to compute item-item similarities. That could be based on user-item interactions. If you're on Hadoop, try the RowSimilarityJob (where you will need rows to be items, columns the users).

On Thu, Apr 11, 2013 at 6:11 PM, Billy b...@ntlworld.com wrote:

I am very new to Mahout and have only read up to chapter 5 of 'MIA', but after reading about the various user-centric and item-centric recommenders, they all still seem to need a userId, so I am unsure whether Mahout can help with a fairly common recommendation.

My requirement is to produce 'n' item recommendations based on a chosen item. E.g. if I've added item #1 to my order, then based on all the other items in all the other orders for this site, what are the likely items that I may also want to add to my order, based on the item-to-item relationship in the history of orders of this site? Most probably using the most popular relationship between the item I have chosen and all the items in all the other orders.

My data is not 'user' specific, and I don't think it should be; it is more order specific, as it's the pattern of items in each order that should determine the recommendation. I have no preference values, so merely boolean preferences will be used.

If Mahout can perform these calculations, then how must I present the data? Will I need to shape the data in some way to feed it into Mahout? (I am currently versed in using Hadoop via AWS EMR using Java.)

Thanks for the advice in advance,
Billy
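On the open question of how to compute item-item similarities from boolean data: one common choice is the Tanimoto (Jaccard) coefficient, the number of orders containing both items divided by the number containing either. The sketch below is plain Java written for illustration, not Mahout's implementation; the class name, method name and order IDs are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TanimotoSketch {

    /**
     * Tanimoto (Jaccard) coefficient for boolean data:
     * orders containing both items / orders containing either item.
     */
    static double tanimoto(Set<Long> ordersWithA, Set<Long> ordersWithB) {
        if (ordersWithA.isEmpty() && ordersWithB.isEmpty()) {
            return 0.0; // no evidence at all, so treat the items as unrelated
        }
        Set<Long> both = new HashSet<>(ordersWithA);
        both.retainAll(ordersWithB); // intersection
        int either = ordersWithA.size() + ordersWithB.size() - both.size(); // union size
        return (double) both.size() / either;
    }

    public static void main(String[] args) {
        // Hypothetical order IDs in which each item appears.
        Set<Long> item1 = new HashSet<>(Arrays.asList(100L, 101L, 102L, 103L));
        Set<Long> item9 = new HashSet<>(Arrays.asList(101L, 102L, 103L, 104L));
        // 3 shared orders out of 5 distinct orders.
        System.out.println(tanimoto(item1, item9));
    }
}
```

Unlike a raw co-occurrence count, this measure discounts items that appear in nearly every order, which may or may not be what a cross-sell list wants.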
Re: Is Mahout the right tool to recommend cross sales?
Or you may want to look at recording purchases by user ID. Then use the standard recommender to train on (userID, itemID, boolean). Then query the trained recommender thus: recommender.mostSimilarItems(long itemID, int howMany). This does what you want but uses more data than just which items were purchased together, which sounds like a shopping-cart recommender.

On Apr 11, 2013, at 10:28 AM, Sean Owen sro...@gmail.com wrote:

This sounds like just a most-similar-items problem. That's good news because that's simpler. The only question is how you want to compute item-item similarities. That could be based on user-item interactions. If you're on Hadoop, try the RowSimilarityJob (where you will need rows to be items, columns the users).
Re: Is Mahout the right tool to recommend cross sales?
Use ItemSimilarityJob instead of RowSimilarityJob; it's the easy-to-use wrapper around that :)

On 11.04.2013 19:28, Sean Owen wrote:

This sounds like just a most-similar-items problem. That's good news because that's simpler. The only question is how you want to compute item-item similarities. That could be based on user-item interactions. If you're on Hadoop, try the RowSimilarityJob (where you will need rows to be items, columns the users).
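On the question of how to present the data: a minimal sketch, assuming the job reads comma-separated text lines of the form userID,itemID with an optional preference column (worth confirming against the Mahout version in use), and assuming the order ID is written into the user ID column. The class and method names and the sample orders are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderCsvSketch {

    /** Emits one "orderID,itemID" line per item in each order. */
    static List<String> toCsvLines(Map<Long, List<Long>> itemsByOrder) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> order : itemsByOrder.entrySet()) {
            for (Long itemId : order.getValue()) {
                // No preference column: the presence of the line is the boolean preference.
                lines.add(order.getKey() + "," + itemId);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical orders; LinkedHashMap keeps them in insertion order.
        Map<Long, List<Long>> orders = new LinkedHashMap<>();
        orders.put(1001L, Arrays.asList(1L, 9L));
        orders.put(1002L, Arrays.asList(1L, 4L, 9L));
        toCsvLines(orders).forEach(System.out::println);
    }
}
```

Each output line would then go into the text file handed to the similarity job.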
Re: Is Mahout the right tool to recommend cross sales?
You can try treating your orders as the 'users'. Then just compute item-item similarities per usual.

On Thu, Apr 11, 2013 at 7:59 PM, Billy b...@ntlworld.com wrote:

Thanks for replying. I don't have users (well, I do :-) ), but in this case they should not influence the recommendations; these need to be based on the relationship between items ordered with other items in the same order.

E.g. if item #1 has been ordered with item #4 22 times and item #1 has been ordered with item #9 57 times, then if I added item #1 to my order both would be recommended, but item #9 would be recommended above item #4 purely because the relationship between item #1 and item #9 is stronger than the relationship with item #4.

What I don't want is this: if a user ordered items #A, #B and #C separately at some point in their order history, then recommend #A and #C to other users who order #B. I still don't want this even if the items are similar and/or the users are similar.

Cheers
Billy
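The ranking described in this exchange (item #9 above item #4 because it was ordered with item #1 more often) amounts to co-occurrence counting with orders standing in for users. The following is a plain-Java sketch of that idea only; it does not use Mahout, and the class name, method names and sample orders are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CoOccurrenceSketch {

    /** Counts, for the chosen item, how many orders also contain each other item. */
    static Map<Long, Integer> coOccurrenceCounts(List<Set<Long>> orders, long chosenItem) {
        Map<Long, Integer> counts = new HashMap<>();
        for (Set<Long> order : orders) {
            if (!order.contains(chosenItem)) {
                continue; // only orders containing the chosen item contribute
            }
            for (Long other : order) {
                if (other != chosenItem) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Returns up to howMany item IDs, most frequently co-ordered first. */
    static List<Long> topCoOccurring(List<Set<Long>> orders, long chosenItem, int howMany) {
        Map<Long, Integer> counts = coOccurrenceCounts(orders, chosenItem);
        List<Long> ranked = new ArrayList<>(counts.keySet());
        // Highest count first; ties broken by item ID so the result is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        // Hypothetical orders: item 1 appears with item 9 three times, with item 4 once.
        List<Set<Long>> orders = new ArrayList<>();
        orders.add(new HashSet<>(Arrays.asList(1L, 9L)));
        orders.add(new HashSet<>(Arrays.asList(1L, 4L, 9L)));
        orders.add(new HashSet<>(Arrays.asList(1L, 9L)));
        orders.add(new HashSet<>(Arrays.asList(4L, 7L)));
        System.out.println(topCoOccurring(orders, 1L, 2));
    }
}
```

Because only items sharing an order are counted, purchases a customer made separately over time never link items together, which matches the stated requirement.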
Re: Is Mahout the right tool to recommend cross sales?
Actually, making this user based is a really good thing because you get recommendations from one session to the next. These may be much more valuable for cross-sell than things in the same order.

On Thu, Apr 11, 2013 at 12:50 PM, Sean Owen sro...@gmail.com wrote:

You can try treating your orders as the 'users'. Then just compute item-item similarities per usual.
Re: Is Mahout the right tool to recommend cross sales?
As in the example data 'intro.csv' in MIA, it has users 1-5, so if I ask for recommendations for user 1 this works, but if I ask for recommendations for user 6 (a new user yet to be added to the data model) I get no recommendations. So if I substitute orders for users then again I will get no recommendations, which I sort of understand. Do I need to inject my new active order, along with its attached item(s), into the data model first and then ask for recommendations for the order by offering up the new orderId? Or is there a way of merely offering up an item and then getting recommendations based merely on that item, using the data already stored and the relationships with my item?

My assumptions:

#1 I am assuming the data model is a static island of data that has been processed (flattened) overnight (most probably by a Hadoop process) due to the size of this data, rather than a living document that is updated as soon as new data is available.

#2 I'm also assuming that instead of reading in the data model and providing recommendations on the fly, I will have to run through every item in my catalogue, find the top 5 recommended items that are ordered with each item (most probably via a Hadoop process), and then store this output in DynamoDB or Lucene for quick access.

Sorry for all the questions, but it is such an interesting subject.

On 11 April 2013 22:04, Ted Dunning ted.dunn...@gmail.com wrote:

Actually, making this user based is a really good thing because you get recommendations from one session to the next. These may be much more valuable for cross-sell than things in the same order.
Re: Is Mahout the right tool to recommend cross sales?
You can actually create a user #6 for your new order. Or you can use the anonymous user function of the library, although it's hacky.

We may be mixing up terms here. DataModel is a class that has nothing to do with Hadoop. Hadoop in turn has no part in real-time anything, like recommending to a brand-new user. However, it could build an offline model of item-item similarities, and you could do something like a most-similar-items computation for a given new basket of goods. That is effectively what the anonymous user function is doing anyway.

You can precompute all recommendations for all items, but that's a lot of work! It's easy to get away with it with a thousand items, but with a million this may be infeasibly slow.

On Thu, Apr 11, 2013 at 10:38 PM, Billy b...@ntlworld.com wrote:

As in the example data 'intro.csv' in MIA, it has users 1-5, so if I ask for recommendations for user 1 this works, but if I ask for recommendations for user 6 (a new user yet to be added to the data model) I get no recommendations. So if I substitute orders for users then again I will get no recommendations, which I sort of understand. Do I need to inject my new active order, along with its attached item(s), into the data model first and then ask for recommendations for the order by offering up the new orderId? Or is there a way of merely offering up an item and then getting recommendations based merely on that item, using the data already stored and the relationships with my item?

My assumptions:

#1 I am assuming the data model is a static island of data that has been processed (flattened) overnight (most probably by a Hadoop process) due to the size of this data, rather than a living document that is updated as soon as new data is available.

#2 I'm also assuming that instead of reading in the data model and providing recommendations on the fly, I will have to run through every item in my catalogue, find the top 5 recommended items that are ordered with each item (most probably via a Hadoop process), and then store this output in DynamoDB or Lucene for quick access.

Sorry for all the questions, but it is such an interesting subject.
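The idea of building item-item similarities offline and then doing a most-similar-items computation for a new basket can be sketched as below. This is an assumption-laden illustration in plain Java, not the library's anonymous-user code: the similarity table is taken to be a precomputed in-memory map, candidates are scored by summing their similarity to each basket item, and every name and number is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BasketScoringSketch {

    /**
     * Scores every candidate by summing its similarity to each item in the basket,
     * then returns up to howMany candidates, best first.
     * similarItems maps itemID -> (similar itemID -> similarity), e.g. loaded from an offline job.
     */
    static List<Long> recommendForBasket(Map<Long, Map<Long, Double>> similarItems,
                                         Set<Long> basket, int howMany) {
        Map<Long, Double> scores = new HashMap<>();
        for (Long inBasket : basket) {
            Map<Long, Double> neighbours =
                similarItems.getOrDefault(inBasket, Collections.<Long, Double>emptyMap());
            for (Map.Entry<Long, Double> e : neighbours.entrySet()) {
                if (basket.contains(e.getKey())) {
                    continue; // never recommend something already in the basket
                }
                scores.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Highest total score first; ties broken by item ID for determinism.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        // Hypothetical precomputed similarities.
        Map<Long, Double> for1 = new HashMap<>();
        for1.put(9L, 0.6);
        for1.put(4L, 0.2);
        Map<Long, Double> for2 = new HashMap<>();
        for2.put(9L, 0.3);
        for2.put(7L, 0.5);
        for2.put(1L, 0.9);
        Map<Long, Map<Long, Double>> table = new HashMap<>();
        table.put(1L, for1);
        table.put(2L, for2);

        Set<Long> basket = new HashSet<>(Arrays.asList(1L, 2L));
        System.out.println(recommendForBasket(table, basket, 2));
    }
}
```

With a table like this the expensive work stays in the offline job, and serving a new order needs only a few map lookups, so the order never has to be added to the data model.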
Re: Is Mahout the right tool to recommend cross sales?
Do you not have a user ID? No matter (though if you do, I'd use it): you can use the item ID as a surrogate for a user ID in the recommender. And there will be no filtering if you ask for recommender.mostSimilarItems(long itemID, int howMany), which has no user ID in the call and so will not filter. Since the recommender doesn't know you are using item IDs for user IDs, this should work fine.

This allows you to use the in-memory version of the recommender as it is described in MiA. The Row and ItemSimilarityJobs are MapReduce jobs and will produce results for all items in a batch. This is fine and will produce much the same results, but you will have to code up the query part yourself as a runtime/live/service component. Using the in-memory recommender gives you a query interface to call whenever you are showing a page to the user.

Using the user ID will allow you to make and blend in user-based recommendations, which are calculated based on individual user history. They may not be your primary recommendations, but when you don't have enough item similarities, you can fall back on or blend in user recommendations.

On Apr 11, 2013, at 2:42 PM, Sean Owen sro...@gmail.com wrote:

You can actually create a user #6 for your new order. Or you can use the anonymous user function of the library, although it's hacky.

We may be mixing up terms here. DataModel is a class that has nothing to do with Hadoop. Hadoop in turn has no part in real-time anything, like recommending to a brand-new user. However, it could build an offline model of item-item similarities, and you could do something like a most-similar-items computation for a given new basket of goods. That is effectively what the anonymous user function is doing anyway.

You can precompute all recommendations for all items, but that's a lot of work! It's easy to get away with it with a thousand items, but with a million this may be infeasibly slow.
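The fall-back-or-blend suggestion can be illustrated with a small merge step. This is a sketch under the assumption that both recommenders have already returned ranked lists of item IDs; it simply prefers item-similarity results and tops the list up with user-based ones. The class and method names are hypothetical, and a real blend might weight scores rather than concatenate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FallbackBlendSketch {

    /**
     * Takes item-similarity results first, then fills any remaining slots
     * with user-based results, skipping duplicates and preserving rank order.
     */
    static List<Long> blend(List<Long> itemBased, List<Long> userBased, int howMany) {
        Set<Long> merged = new LinkedHashSet<>(); // keeps insertion order, drops repeats
        for (Long id : itemBased) {
            if (merged.size() < howMany) {
                merged.add(id);
            }
        }
        for (Long id : userBased) {
            if (merged.size() < howMany) {
                merged.add(id);
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Hypothetical result lists: only two item-similarity hits, so two slots are topped up.
        List<Long> itemBased = Arrays.asList(9L, 4L);
        List<Long> userBased = Arrays.asList(4L, 7L, 3L);
        System.out.println(blend(itemBased, userBased, 4));
    }
}
```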
Re: Is Mahout the right tool to recommend cross sales?
You can also use the new MultithreadedBatchItemSimilarities class to efficiently precompute item similarities on a single machine without having to go to MapReduce.

On 12.04.2013 00:54, Pat Ferrel wrote:

Do you not have a user ID? No matter (though if you do, I'd use it): you can use the item ID as a surrogate for a user ID in the recommender. And there will be no filtering if you ask for recommender.mostSimilarItems(long itemID, int howMany), which has no user ID in the call and so will not filter. Since the recommender doesn't know you are using item IDs for user IDs, this should work fine.

This allows you to use the in-memory version of the recommender as it is described in MiA. The Row and ItemSimilarityJobs are MapReduce jobs and will produce results for all items in a batch. This is fine and will produce much the same results, but you will have to code up the query part yourself as a runtime/live/service component. Using the in-memory recommender gives you a query interface to call whenever you are showing a page to the user.

Using the user ID will allow you to make and blend in user-based recommendations, which are calculated based on individual user history. They may not be your primary recommendations, but when you don't have enough item similarities, you can fall back on or blend in user recommendations.