Author: pat
Date: Thu Oct 2 21:23:39 2014
New Revision: 1629072
URL: http://svn.apache.org/r1629072
Log:
CMS commit to mahout by pat
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL:
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1629072&r1=1629071&r2=1629072&view=diff
==============================================================================
---
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
(original)
+++
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Thu Oct 2 21:23:39 2014
@@ -321,34 +321,42 @@ Indicators come in 3 types
The query for recommendations will be a mix of values meant to match one of
your indicators. The query can be constructed
from user history and values derived from context (category being viewed for
instance) or special precalculated data
(popularity rank for instance). This blending of indicators allows for
creating many flavors of recommendations to fit
-a very wide variety of circumstances. It allows recommendations to be made for
items with no usage data and even allows
-for gracefully degrading recommendations based on how much user history is
available.
+a very wide variety of circumstances.
With the right mix of indicators developers can construct a single query that
works for completely new items and new users
-while working well for items with lots of interactions and users with many
recorded actions. In other words adding in content and intrinsic
-indicators allows developers to create a solution for the "cold-start" problem
that gracefully improves with more user history
+while working well for items with lots of interactions and users with many
recorded actions. In other words by adding in content and intrinsic
+indicators developers can create a solution for the "cold-start" problem that
gracefully improves with more user history
and as items have more interactions. It is also possible to create a
completely content-based recommender that personalizes
recommendations.
##Example with 3 Indicators
-You will need to decide how you store user action data so they can be
processed by the item and row similarity jobs and this is most easily done by
using text files as described above. The data that is processed by these jobs
is considered the **training data**. You will need some amount of user history
in your recs query. It is typical to use the most recent user history but need
not be exactly what is in the training set, which may include more historical
data. Keeping the user history for query purposes could be done with a database
by referencing some history from a users table. In the example above the two
collaborative filtering actions are "purchase" and "view", but let's also add
tags (taken from catalog categories or other descriptive metadata).
+You will need to decide how you store user action data so they can be
processed by the item and row similarity jobs and
+this is most easily done by using text files as described above. The data that
is processed by these jobs is considered the
+training data. You will need some amount of user history in your recs query.
It is typical to use the most recent user history
+but it need not be exactly what is in the training set, which may include a
greater volume of historical data. Keeping the user
+history for query purposes could be done with a database by storing it in a
users table. In the example above the two
+collaborative filtering actions are "purchase" and "view", but let's also add
tags (taken from catalog categories or other
+descriptive metadata).
+
+We will need to create 1 cooccurrence indicator from the primary action
(purchase), 1 cross-action cooccurrence indicator
+from the secondary action (view),
+and 1 content indicator (tags). We'll have to run *spark-itemsimilarity* once
and *spark-rowsimilarity* once.
-We will need to create 1 indicator from the primary action (purchase) 1
cross-indicator from the secondary action (view) and 1 content-indicator for
(tags). We'll have to run *spark-itemsimilarity* once and *spark-rowsimilarity*
once.
-
-We have described how to create the indicator and cross-indicator for purchase
and view (the [How to use Multiple User
+We have described how to create the collaborative filtering indicator and
cross-indicator for purchase and view (the [How to use Multiple User
Actions](#multiple-actions) section) but tags will be a slightly different
process. We want to use the fact that
certain items have tags similar to the ones associated with a user's
purchases. This is not a collaborative filtering indicator
-but rather a "content" or "metadata" type indicator since you are not using
other users' tag viewing history, only the
+but rather a "content" or "metadata" type indicator since you are not using
other users' history, only the
individual that you are making recs for. This means that this method will make
recommendations for items that have
no collaborative filtering data, as happens with new items in a catalog. New
items may have tags assigned but no one
- has purchased or viewed them yet.
-
-We could have treated viewing tags as a collaborative filtering
cross-indicator by recording other users tag viewing history and that would
probably give better results but here we are trying to illustrate recommending
without CF data and using content-indicators. In the final query we will mix
all 3 indicators.
+ has purchased or viewed them yet. In the final query we will mix all 3
indicators.
##Content Indicator
-To create a content-indicator we'll make use of the fact that the user has
purchased items with certain tags. We want to find items with the most similar
tags. Notice that other users' behavior is not considered--only other item's
tags. This defines a content or metadata indicator. They are used when you want
to find items that are similar to other items by using their content or
metadata, not by which users interacted with them.
+To create a content-indicator we'll make use of the fact that the user has
purchased items with certain tags. We want to find
+items with the most similar tags. Notice that other users' behavior is not
considered--only other items' tags. This defines a
+content or metadata indicator. Such indicators are used when you want to find items that
are similar to other items by using their
+content or metadata, not by which users interacted with them.
For this we need input of the form:
@@ -361,7 +369,10 @@ The full collection will look like the t
9446577d<tab>women tops chambray clothing casual
...
-We'll use *spark-rowimilairity* because we are looking for similar rows, which
encode items in this case. As with the indicator and cross-indicator we use the
--omitStrength option. The strengths created are probabilistic log-likelihood
ratios and so are used to filter unimportant similarities. Once the filtering
or downsampling are finished we no longer need the strengths. We will get an
indicator matrix of the form:
+We'll use *spark-rowsimilarity* because we are looking for similar rows, which
encode items in this case. As with the
+collaborative filtering indicator and cross-indicator we use the
--omitStrength option. The strengths created are
+probabilistic log-likelihood ratios and so are used to filter unimportant
similarities. Once the filtering or downsampling
+is finished we no longer need the strengths. We will get an indicator matrix
of the form:
itemID<tab>list-of-item IDs
...
@@ -372,23 +383,23 @@ This is a content indicator since it has
9446577d<tab>9446577d 9496577d 0943577d 8346577d 9442277d 9446577e
...
-We now have three indicators, two collaborative filtering type and one content
type. Notice that purchase, view, and tags can all be recorded for users and so
can be used in a recommendations query.
+We now have three indicators, two collaborative filtering type and one content
type.
##Unified Recommender Query
The actual form of the query for recommendations will vary depending on your
search engine but the intent is the same.
For a given user, map their history of an action or content to the correct
indicator field and perform an OR'd query.
-This will allow matches from any indicator where AND queries require that an
item have some similarity to all indicator
-fields.
-We have 3 indicators, these are indexed by the search engine into 3 fields,
we'll call them "purchase", "view", and "tags". We take the user's history that
corresponds to each indicator and create a query of the form:
+We have 3 indicators; these are indexed by the search engine into 3 fields,
which we'll call "purchase", "view", and "tags".
+We take the user's history that corresponds to each indicator and create a
query of the form:
Query:
field: purchase; q:user's-purchase-history
field: view; q:user's-view-history
field: tags; q:user's-tags-associated-with-purchases
-The query will result in an ordered list of items recommended for purchase but
skewed towards items with similar tags to the ones the user has already
purchased.
+The query will result in an ordered list of items recommended for purchase but
skewed towards items with similar tags to
+the ones the user has already purchased.
This is only an example and not necessarily the optimal way to create recs. It
illustrates how business decisions can be
translated into recommendations. This technique can be used to skew
recommendations towards intrinsic indicators also.
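
As a rough sketch of the OR'd query described under "Unified Recommender Query", the following Python snippet maps each indicator field to the matching part of a user's history and joins the clauses with OR. The function name `build_recs_query`, the Lucene-style `field:(...)` string syntax, and the sample history are assumptions made for illustration; the actual query form will depend on your search engine.

```python
def build_recs_query(history):
    """Build an OR'd query string from a user's history.

    `history` maps an indicator field name (e.g. "purchase", "view",
    "tags") to the list of item IDs or tags recorded for the user.
    The Lucene-style output syntax is an assumption, not a Mahout API.
    """
    clauses = []
    for field, values in history.items():
        if not values:
            # No history for this indicator (e.g. a new user): skip the field
            # so the remaining indicators can still produce matches.
            continue
        terms = " OR ".join('"{}"'.format(v) for v in values)
        clauses.append("{}:({})".format(field, terms))
    # OR the per-field clauses so a match on any indicator can contribute.
    return " OR ".join(clauses)


# Hypothetical user history, reusing item IDs and tags from the examples.
user_history = {
    "purchase": ["9446577d", "9496577d"],
    "view": ["0943577d"],
    "tags": ["women", "tops", "casual"],
}

query = build_recs_query(user_history)
print(query)
```

Fields with no history are simply left out, so the same function serves a brand-new user (tags or intrinsic values only) and a user with a long purchase history.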