intro-cooccurrence-spark.mdtext

pat Thu, 02 Oct 2014 14:25:16 -0700

Author: pat
Date: Thu Oct  2 21:23:39 2014
New Revision: 1629072

URL: http://svn.apache.org/r1629072
Log:
CMS commit to mahout by pat


Modified:
    mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1629072&r1=1629071&r2=1629072&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext Thu Oct  2 21:23:39 2014
@@ -321,34 +321,42 @@ Indicators come in 3 types
 The query for recommendations will be a mix of values meant to match one of 
your indicators. The query can be constructed 
 from user history and values derived from context (category being viewed for 
instance) or special precalculated data 
 (popularity rank for instance). This blending of indicators allows for 
creating many flavors or recommendations to fit 
-a very wide variety of circumstances. It allows recommendations to be made for 
items with no usage data and even allows 
-for gracefully degrading recommendations based on how much user history is 
available. 
+a very wide variety of circumstances.
 
 With the right mix of indicators developers can construct a single query that 
works for completely new items and new users 
-while working well for items with lots of interactions and users with many 
recorded actions. In other words adding in content and intrinsic 
-indicators allows developers to create a solution for the "cold-start" problem 
that gracefully improves with more user history
+while working well for items with lots of interactions and users with many 
recorded actions. In other words by adding in content and intrinsic 
+indicators developers can create a solution for the "cold-start" problem that 
gracefully improves with more user history
 and as items have more interactions. It is also possible to create a 
completely content-based recommender that personalizes 
 recommendations.
 
 ##Example with 3 Indicators
 
-You will need to decide how you store user action data so they can be 
processed by the item and row similarity jobs and this is most easily done by 
using text files as described above. The data that is processed by these jobs 
is considered the **training data**. You will need some amount of user history 
in your recs query. It is typical to use the most recent user history but need 
not be exactly what is in the training set, which may include more historical 
data. Keeping the user history for query purposes could be done with a database 
by referencing some history from a users table. In the example above the two 
collaborative filtering actions are "purchase" and "view", but let's also add 
tags (taken from catalog categories or other descriptive metadata). 
+You will need to decide how you store user action data so they can be 
processed by the item and row similarity jobs and 
+this is most easily done by using text files as described above. The data that 
is processed by these jobs is considered the 
+training data. You will need some amount of user history in your recs query. 
It is typical to use the most recent user history 
+but need not be exactly what is in the training set, which may include a 
greater volume of historical data. Keeping the user 
+history for query purposes could be done with a database by storing it in a 
users table. In the example above the two 
+collaborative filtering actions are "purchase" and "view", but let's also add 
tags (taken from catalog categories or other 
+descriptive metadata). 
+
+We will need to create 1 cooccurrence indicator from the primary action 
(purchase) 1 cross-action cooccurrence indicator 
+from the secondary action (view) 
+and 1 content indicator (tags). We'll have to run *spark-itemsimilarity* once 
and *spark-rowsimilarity* once.
 
-We will need to create 1 indicator from the primary action (purchase) 1 
cross-indicator from the secondary action (view) and 1 content-indicator for 
(tags). We'll have to run *spark-itemsimilarity* once and *spark-rowsimilarity* 
once.
-
-We have described how to create the indicator and cross-indicator for purchase 
and view (the [How to use Multiple User 
+We have described how to create the collaborative filtering indicator and 
cross-indicator for purchase and view (the [How to use Multiple User 
 Actions](#multiple-actions) section) but tags will be a slightly different 
process. We want to use the fact that 
 certain items have tags similar to the ones associated with a user's 
purchases. This is not a collaborative filtering indicator 
-but rather a "content" or "metadata" type indicator since you are not using 
other users' tag viewing history, only the 
+but rather a "content" or "metadata" type indicator since you are not using 
other users' history, only the 
 individual that you are making recs for. This means that this method will make 
recommendations for items that have 
 no collaborative filtering data, as happens with new items in a catalog. New 
items may have tags assigned but no one
- has purchased or viewed them yet. 
-
-We could have treated viewing tags as a collaborative filtering 
cross-indicator by recording other users tag viewing history and that would 
probably give better results but here we are trying to illustrate recommending 
without CF data and using content-indicators. In the final query we will mix 
all 3 indicators.
+ has purchased or viewed them yet. In the final query we will mix all 3 
indicators.
 
 ##Content Indicator
 
-To create a content-indicator we'll make use of the fact that the user has 
purchased items with certain tags. We want to find items with the most similar 
tags. Notice that other users' behavior is not considered--only other item's 
tags. This defines a content or metadata indicator. They are used when you want 
to find items that are similar to other items by using their content or 
metadata, not by which users interacted with them.
+To create a content-indicator we'll make use of the fact that the user has 
purchased items with certain tags. We want to find 
+items with the most similar tags. Notice that other users' behavior is not 
considered--only other item's tags. This defines a 
+content or metadata indicator. They are used when you want to find items that 
are similar to other items by using their 
+content or metadata, not by which users interacted with them.
 
 For this we need input of the form:
 
@@ -361,7 +369,10 @@ The full collection will look like the t
     9446577d<tab>women tops chambray clothing casual
     ...
 
-We'll use *spark-rowimilairity* because we are looking for similar rows, which 
encode items in this case. As with the indicator and cross-indicator we use the 
--omitStrength option. The strengths created are probabilistic log-likelihood 
ratios and so are used to filter unimportant similarities. Once the filtering 
or downsampling are finished we no longer need the strengths. We will get an 
indicator matrix of the form:
+We'll use *spark-rowimilairity* because we are looking for similar rows, which 
encode items in this case. As with the 
+collaborative filtering indicator and cross-indicator we use the 
--omitStrength option. The strengths created are 
+probabilistic log-likelihood ratios and so are used to filter unimportant 
similarities. Once the filtering or downsampling 
+is finished we no longer need the strengths. We will get an indicator matrix 
of the form:
 
     itemID<tab>list-of-item IDs
     ...
@@ -372,23 +383,23 @@ This is a content indicator since it has
     9446577d<tab>9446577d 9496577d 0943577d 8346577d 9442277d 9446577e
     ...  
     
-We now have three indicators, two collaborative filtering type and one content 
type. Notice that purchase, view, and tags can all be recorded for users and so 
can be used in a recommendations query.
+We now have three indicators, two collaborative filtering type and one content 
type.
 
 ##Unified Recommender Query
 
 The actual form of the query for recommendations will vary depending on your 
search engine but the intent is the same. 
 For a given user, map their history of an action or content to the correct 
indicator field and perform an OR'd query. 
-This will allow matches from any indicator where AND queries require that an 
item have some similarity to all indicator 
-fields.
 
-We have 3 indicators, these are indexed by the search engine into 3 fields, 
we'll call them "purchase", "view", and "tags". We take the user's history that 
corresponds to each indicator and create a query of the form:
+We have 3 indicators, these are indexed by the search engine into 3 fields, 
we'll call them "purchase", "view", and "tags". 
+We take the user's history that corresponds to each indicator and create a 
query of the form:
 
     Query:
       field: purchase; q:user's-purchase-history
       field: view; q:user's view-history
       field: tags; q:user's-tags-associated-with-purchases
       
-The query will result in an ordered list of items recommended for purchase but 
skewed towards items with similar tags to the ones the user has already 
purchased. 
+The query will result in an ordered list of items recommended for purchase but 
skewed towards items with similar tags to 
+the ones the user has already purchased. 
 
 This is only an example and not necessarily the optimal way to create recs. It 
illustrates how business decisions can be 
 translated into recommendations. This technique can be used to skew 
recommendations towards intrinsic indicators also.
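
Note (not part of the commit): the unified query described in the last hunk can be sketched in a search-engine-neutral way. This is a minimal illustration only. The function name `build_recs_query`, the dictionary layout, and the `operator`/`clauses` keys are assumptions made for this sketch, not any search engine's or Mahout's API. The field names "purchase", "view", and "tags" and the sample IDs and tags come from the page text.

```python
from typing import Dict, List

# The three indicator fields named in the page text.
INDICATOR_FIELDS = ("purchase", "view", "tags")


def build_recs_query(history: Dict[str, List[str]]) -> dict:
    """Map a user's history onto indicator fields as one OR'd query.

    `history` maps a field name to the user's item IDs (or tags) for that
    field. The returned dict is a hypothetical, engine-neutral structure
    that would still need translating into a real search engine's syntax.
    """
    clauses = []
    for field in INDICATOR_FIELDS:
        terms = history.get(field, [])
        if terms:
            # Fields with no history are skipped, so a user with little
            # history still gets a query from whatever is available.
            clauses.append({"field": field, "q": " ".join(terms)})
    return {"operator": "OR", "clauses": clauses}


if __name__ == "__main__":
    user_history = {
        "purchase": ["9446577d", "9496577d"],
        "view": ["0943577d"],
        "tags": ["women", "tops", "chambray"],
    }
    print(build_recs_query(user_history))
```

Skipping empty fields means a user with no purchases or views, but with tags available from context, still produces a query against the "tags" field alone.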

