Re: Thoughts and Questions on a e-Commerce Recommender System and Expired Items

2013-11-06 Thread Sigbjørn Dybdahl
Hi,

Could this be due to a small number of items viewed/liked/purchased per
user?

Correct me if I'm wrong, but this would make the total recommendation space
sparse, thus making it hard to find good recommendations (i.e.
recommendations which are relevant and not obsolete). If so, it might be
worth considering simpler approaches than CF.


Sigbjørn


On 22 October 2013 12:43, Cassio Melo melo.cas...@gmail.com wrote:

 Hi all,

 I have a product recommendation use case for an e-commerce site and I've
 been playing around with Mahout's CF capabilities lately. I reached a point
 where I should ask for feedback from the community on my approach. I'm
 struggling to get ANY recommendation.

 Here's what I've done so far:

 1) Collected data in this format:

 USER_ID,PRODUCT_ID,VIEW,LIKE,FAVORITE,PURCHASE
 0001,,1,0,1,1

 2) Built preferences for each user based on a computed score [0,1] for the
 attributes (view,like,favorite,purchase)

 3) Implemented a custom ItemSimilarity class to boost products in the same
 type, from the same manufacturer, with similar prices, etc

 4) Implemented a custom IDRescorer to filter out products that are no
 longer available.

 5) Implemented a Recommender subclass that simply instantiates a
 GenericItemBasedRecommender with my custom similarity class: recommender = new
 GenericItemBasedRecommender(myModel, new ProductSimilarity());

 It happens that most of the product/preferences data is historical and
 most products are no longer available. Because of this I get zero
 recommendations when the IDRescorer's filter is activated (yes, I checked
 the isFiltered method; it is returning true for expired products and false
 otherwise, as expected) even though there are lots of valid products.

 Here's the overridden method in the IDRescorer's class:

 public boolean isFiltered(long id) {
   Product a = PreferencesDataModel.lookupProduct(id);
   return !a.isActive(); // filter out expired products
 }

 I see it as beneficial to feed the engine with historical data on expired
 products and filtering them for recommendation, but getting zero
 recommendations made me rethink this approach (I also tried different
 similarity metrics including UserSim). What do you guys think?




-- 
Sigbjørn DYBDAHL | 55 | fifty-five.com http://www.fifty-five.com/ | 4,
place de l'Opéra, 75002 Paris | 01 76 21 91 32


Re: Thoughts and Questions on a e-Commerce Recommender System and Expired Items

2013-11-06 Thread Gokhan Capan
Cassio,

I would implement a CandidateItemsStrategy that returns products that are
available now. A neighborhood based recommender would iterate over those
products, and rank them based on the similarity measure you provide.

If the DataModel of your recommender does not contain most of your
available items, you may want to overcome this cold-start challenge by
making your similarity take content features into account.
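
A minimal sketch of that idea in plain Java, assuming a catalog of item IDs, an availability check, and a similarity function. Every name here (AvailableCandidates, Similarity, candidates, recommend) is hypothetical and only illustrates the logic; in Mahout the candidate selection would sit behind CandidateItemsStrategy, whose exact method signature depends on the release and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.LongPredicate;

// Hypothetical sketch: none of these names come from Mahout or from the thread.
class AvailableCandidates {

    /** Stand-in for a custom item similarity such as ProductSimilarity. */
    interface Similarity {
        double between(long itemA, long itemB);
    }

    /** Candidates are catalog items that are active and not already preferred. */
    static List<Long> candidates(Collection<Long> catalog, Set<Long> preferred,
                                 LongPredicate isActive) {
        List<Long> out = new ArrayList<>();
        for (long id : catalog) {
            if (isActive.test(id) && !preferred.contains(id)) {
                out.add(id);
            }
        }
        return out;
    }

    /** Scores each candidate by its summed similarity to the preferred items. */
    static List<Long> recommend(Collection<Long> catalog, Set<Long> preferred,
                                LongPredicate isActive, Similarity sim, int howMany) {
        Map<Long, Double> score = new HashMap<>();
        for (long candidate : candidates(catalog, preferred, isActive)) {
            double total = 0.0;
            for (long liked : preferred) {
                total += sim.between(candidate, liked);
            }
            score.put(candidate, total);
        }
        List<Long> ranked = new ArrayList<>(score.keySet());
        // Highest score first; ties broken by item ID so the order is deterministic.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

Because expired products never enter the candidate set, they cannot crowd out available ones, yet they can still appear among the preferred items and contribute to the scores.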

Hope that helps,
Best



Gokhan




Re: Thoughts and Questions on a e-Commerce Recommender System and Expired Items

2013-11-06 Thread Cassio Melo
Good idea Gokhan, thanks!

@Sigbjørn: Thanks for the feedback. In fact we have plenty of [implicit]
preference data for each user, e.g. product views. What I found out is that
data from a certain point in time were very noisy and inconsistent; when I
started fetching from that point on, I got much better results.




Re: Thoughts and Questions on a e-Commerce Recommender System and Expired Items

2013-11-06 Thread Pat Ferrel
If you have a lot of old historical data for products that no longer exist, you
may be getting recommendations from that set. Using the old data is, in
principle, fine and should make recs better. However, you may be running into a
limit on the default number of recs returned, which is something like 100. If
all 100 are old, out-of-stock products, you’ll filter all results out. Have
you checked before the filter? You could try increasing the number returned and
see if more recs get through the filter.
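
To make the suspected failure mode concrete, here is a small plain-Java sketch that contrasts truncating a ranked list before filtering with filtering before truncating. The class and method names are hypothetical, and whether Mahout actually applies the limit before the filter is exactly what the check above is meant to find out; this only illustrates why the order matters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical sketch: illustrates the ordering issue only, not Mahout internals.
class FilterOrder {

    /** Cut the ranking to `limit` first, then drop expired items. */
    static List<Long> truncateThenFilter(List<Long> ranked, LongPredicate isActive,
                                         int limit) {
        List<Long> out = new ArrayList<>();
        for (long id : ranked.subList(0, Math.min(limit, ranked.size()))) {
            if (isActive.test(id)) {
                out.add(id);
            }
        }
        return out;
    }

    /** Walk the full ranking, keep active items, stop once `limit` are found. */
    static List<Long> filterThenTruncate(List<Long> ranked, LongPredicate isActive,
                                         int limit) {
        List<Long> out = new ArrayList<>();
        for (long id : ranked) {
            if (out.size() == limit) {
                break;
            }
            if (isActive.test(id)) {
                out.add(id);
            }
        }
        return out;
    }
}
```

If the top-ranked slots are all expired products, the first method returns an empty list while the second still surfaces the active items further down the ranking. Raising the requested count, as suggested above, narrows the gap between the two.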

From experience with a retail product recommender, we found the item-item 
similarities often produced better results than individual user preferences. 
You can use the API to get recommendations for an item rather than a user if 
you know an item context. So when viewing an item detail page, you can display 
similar items without using user preferences.
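
In Mahout this corresponds to calling mostSimilarItems(itemID, howMany) on an item-based recommender. The plain-Java sketch below shows the same idea without any user data, assuming a catalog of item IDs, a similarity function, and an availability check; SimilarItems, Similarity, and mostSimilar are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical sketch of "similar items for a detail page"; not Mahout code.
class SimilarItems {

    /** Stand-in for a custom item similarity. */
    interface Similarity {
        double between(long itemA, long itemB);
    }

    /** Returns up to howMany active items most similar to itemId, best first. */
    static List<Long> mostSimilar(long itemId, Collection<Long> catalog,
                                  Similarity sim, LongPredicate isActive, int howMany) {
        List<Long> others = new ArrayList<>();
        for (long id : catalog) {
            if (id != itemId && isActive.test(id)) {
                others.add(id);
            }
        }
        // Highest similarity first; ties broken by item ID for a stable order.
        others.sort((a, b) -> {
            int bySim = Double.compare(sim.between(itemId, b), sim.between(itemId, a));
            return bySim != 0 ? bySim : Long.compare(a, b);
        });
        return new ArrayList<>(others.subList(0, Math.min(howMany, others.size())));
    }
}
```

The item being viewed supplies the whole query context, so this works even for anonymous visitors or users with no history.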

As to decaying user preferences, I’d not do that at first, or at least I’d be aware
of the ramifications. Decaying preferences by time before handing them to Mahout has
the effect of removing training data that may be very useful. Unfortunately, Mahout
uses the preference collection as both training data and query data. It would be nice if
the query had some number of the most recent user actions and the training data
contained many more actions from all users. You can do this with the
Solr/Mahout recommender, since training and query data can be separate.
