Re: Lambda and Kappa CCO

2017-03-27 Thread Pat Ferrel
Agreed. Downsampling was ignored in several places, and with it in place a great deal of 
the input becomes a no-op. Without downsampling too many things would need to change. 

Also, everything depends on this rather vague sentence: “- determine if the 
new interaction element cross-occurs with A and if so calculate the llr score”, 
which needs a lot more explanation. Whether to use Mahout in-memory objects or 
to reimplement some of them in high-speed data structures is a big question.

The good thing I noticed in writing this is that the model update and the real-time side 
can be arbitrarily far apart; the system degrades gracefully. So during high 
load the model may fall behind, but as long as user behavior is up to date and 
persisted (it will be) we are still in pretty good shape.
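
To make that vague step concrete, here is a minimal sketch of one possible reading, using plain Scala maps; all names are illustrative, it ignores downsampling and persistence, and the `llr` callback hides the marginal counts it would really need, which is exactly the data-structure question above:

```scala
import scala.collection.mutable

// Hypothetical in-memory state; ignores downsampling, persistence, and A-side updates.
object IncrementalCoCount {
  val userHistoryA = mutable.Map.empty[String, Set[String]]     // user -> A-items converted on
  val coCount      = mutable.Map.empty[(String, String), Long]  // (A-item, B-item) -> count

  // A new B interaction only touches pairs formed with this user's A-history.
  // The `llr` callback stands in for the 2x2 log-likelihood scoring, which also
  // needs marginal counts; keeping those fast is the open data-structure question.
  def onNewBInteraction(user: String, itemB: String,
                        llr: (String, String) => Double): Map[(String, String), Double] =
    userHistoryA.getOrElse(user, Set.empty[String]).map { itemA =>
      val key = (itemA, itemB)
      coCount(key) = coCount.getOrElse(key, 0L) + 1L
      key -> llr(itemA, itemB)
    }.toMap
}
```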


On Mar 26, 2017, at 6:26 PM, Ted Dunning  wrote:


I think that this analysis omits the fact that one user interaction causes many 
cooccurrences to change.

This becomes feasible if you include the effect of down-sampling, but that has 
to be in the algorithm.
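
For reference, the interaction cut Ted refers to can be as simple as a frequency cap per user and per item: once either side is saturated, the event is a no-op and no cooccurrence work happens at all. A crude keep-first-N sketch (thresholds and names are illustrative; a production version would sample rather than simply keep the earliest events):

```scala
import scala.collection.mutable

// Crude keep-first-N interaction cut. Once a user or item hits its cap, new
// events for it change nothing and can be dropped before any cooccurrence work.
class DownSampler(maxPerUser: Int = 500, maxPerItem: Int = 500) {
  private val userCounts = mutable.Map.empty[String, Int]
  private val itemCounts = mutable.Map.empty[String, Int]

  /** True if the event should be processed; false means it is a no-op and can be dropped. */
  def admit(user: String, item: String): Boolean = {
    val u = userCounts.getOrElse(user, 0)
    val i = itemCounts.getOrElse(item, 0)
    if (u >= maxPerUser || i >= maxPerItem) false
    else {
      userCounts(user) = u + 1
      itemCounts(item) = i + 1
      true
    }
  }
}
```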


From: Pat Ferrel 
Sent: Saturday, March 25, 2017 12:01:00 PM
To: Trevor Grant; user@mahout.apache.org
Cc: Ted Dunning; s...@apache.org
Subject: Lambda and Kappa CCO
 
This is an overview and proposal for turning the multi-modal Correlated 
Cross-Occurrence (CCO) recommender from Lambda-style into an online, streaming, 
incrementally updated Kappa-style learner.

# The CCO Recommender: Lambda-style

We have largely solved the problems of calculating the multi-modal Correlated 
Cross-Occurrence models and serving recommendations in real time from real-time 
user behavior. The model sits in Lucene (Elasticsearch or Solr) in a scalable 
way, and the typical query to produce personalized recommendations, built from 
real-time user behavior, completes with about 25 ms latency.

# CCO Algorithm

A = rows are users, columns are items they have “converted” on (purchase, read, 
watch). A represents the conversion event: the interaction that you want to 
recommend.
B = rows are users, columns are items the user has shown some preference 
for, but not necessarily the same items as in A. B represents a different 
interaction than A. B might be a preference for some category, brand, or genre, or 
just a detailed item page view, or all of these in B, C, D, etc.
h_a = a particular user’s history of A-type interactions, a vector of the items 
that our user converted on.
h_b = a particular user’s history of B-type interactions, a vector of the items 
that our user had B-type interactions with.

CCO says:

[A’A]h_a + [A’B]h_b + [A’C]h_c = r, where r is a vector of weighted items from A that 
represents personalized recommendations for our particular user.
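
A toy example of that sum, with tiny dense matrices standing in for [A’A] and [A’B]; everything here is illustrative, and real models are large, sparse, and llr-filtered:

```scala
// Toy illustration of r = [A'A]h_a + [A'B]h_b with plain Scala arrays.
// Matrices are indexed (A-item row, item column); the values stand in for the
// (later llr-filtered) cooccurrence strengths.
object CcoToy extends App {
  val ata = Array(                 // [A'A]: 3 A-items x 3 A-items
    Array(0.0, 2.0, 1.0),
    Array(2.0, 0.0, 3.0),
    Array(1.0, 3.0, 0.0))
  val atb = Array(                 // [A'B]: 3 A-items x 2 B-items
    Array(1.0, 0.0),
    Array(0.0, 2.0),
    Array(2.0, 1.0))

  val hA = Array(1.0, 0.0, 0.0)    // the user converted on A-item 0
  val hB = Array(0.0, 1.0)         // the user preferred B-item 1

  def times(m: Array[Array[Double]], v: Array[Double]): Array[Double] =
    m.map(row => row.zip(v).map { case (w, x) => w * x }.sum)

  // r = [A'A]h_a + [A'B]h_b, then rank A-items by score
  val r = times(ata, hA).zip(times(atb, hB)).map { case (x, y) => x + y }
  val ranked = r.zipWithIndex.sortBy { case (score, _) => -score }
  println(ranked.mkString(", "))   // (4.0,1), (2.0,2), (0.0,0): recommend A-item 1 first
}
```

In practice you would also filter out items the user has already converted on (here A-item 0 conveniently scores lowest anyway).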

The innovation here is that A, B, C, … represent multi-modal data: interactions 
of all types, on item sets of arbitrary types. In other words, we can look at 
virtually any action or possible indicator of user preference or taste. We 
strengthen the above raw cross-occurrence and cooccurrence formula by 
performing:

[llr(A’A)]h_a + [llr(A’B)]h_b + … = r, adding llr (log-likelihood ratio) 
correlation scoring to filter out coincidental cross-occurrences.
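
For completeness, the llr score for a single (row item, column item) pair comes from the 2x2 contingency table of user counts. A self-contained sketch of the standard G² computation:

```scala
// Self-contained llr sketch for one (A-item, other-item) pair, from the 2x2
// contingency table of user counts:
//   k11 = users with both items, k12 = A-item only,
//   k21 = other item only,       k22 = neither.
object Llr {
  private def xLogX(x: Long): Double = if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // unnormalized entropy of a set of counts: xLogX(sum) - sum of xLogX(count)
  private def entropy(counts: Long*): Double = xLogX(counts.sum) - counts.map(xLogX).sum

  // Dunning's G^2: 2 * (rowEntropy + columnEntropy - matrixEntropy)
  def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy    = entropy(k11 + k12, k21 + k22)
    val columnEntropy = entropy(k11 + k21, k12 + k22)
    val matrixEntropy = entropy(k11, k12, k21, k22)
    math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy)) // clamp round-off
  }
}
```

If memory serves, mahout-math already ships an equivalent LogLikelihood utility; the sketch is only to show what the score is made of.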

The model becomes [llr(A’A)], [llr(A’B)], …, each of which has items from A in its rows and 
items from A, B, … in its columns. This sits in Lucene as one document per item in 
A, with a field for each of the A, B, C item sets containing the items whose user 
interactions most strongly correlate with the conversion event on the row item. Put 
another way, the model is, for each item in A, the items from A, B, C, … that have 
the most similar user interactions.
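
A sketch of what one such document might look like; the field names and item ids are hypothetical, not the fields any existing implementation uses:

```scala
// Illustrative shape of one model document. The doc for A-item "ipad" holds,
// per interaction type, the item ids whose interactions most strongly correlate
// (by llr) with converting on "ipad".
case class CcoModelDoc(
  id: String,                      // the A-item this document describes
  purchaseIndicators: Seq[String], // from [llr(A'A)]: items of A
  viewIndicators: Seq[String],     // from [llr(A'B)]: items of B
  categoryIndicators: Seq[String]) // from [llr(A'C)]: items of C

object CcoModelDocExample extends App {
  val doc = CcoModelDoc(
    id = "ipad",
    purchaseIndicators = Seq("iphone", "ipad-case"),
    viewIndicators = Seq("macbook", "surface-pro"),
    categoryIndicators = Seq("tablets"))
  println(doc)
}
```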

To calculate r we need to find the items in the model most similar to the 
history, or behavior, of our example user. Since Lucene is basically a K-Nearest 
Neighbors engine that is particularly well tuned to work with sparse data (our 
model is typically quite sparse), all we need to do is segment the user history 
into h_a, h_b, … and use it as a multi-field query on the model. This performs 
the equivalent of:

[llr(A’A)]h_a + [llr(A’B)]h_b + … = r, where we substitute the cosine similarity of 
h_a to every row of [llr(A’A)] for the tensor math. Further, Lucene sorts by 
score and returns only the top-ranking items. Further still, since we have 
performed a multi-field query, it does the entire multi-field similarity 
calculation and vector-segment addition before doing the sort. Lucene does this 
in a very performant manner, so the entire query, including fetching user 
history, forming the Lucene query, and executing it, takes something like 25 
ms and is indefinitely scalable to any number of simultaneous queries.
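
A hedged sketch of what that multi-field query could look like against Elasticsearch, reusing the hypothetical field names from the document sketch above; the real query would also filter out items the user has already converted on:

```scala
// Build the multi-field "similar items" query as a raw Elasticsearch bool/should
// request body. Each field is queried with the user's history for that interaction
// type; Lucene's scoring plus the top-k sort stands in for the tensor math.
object SimilarItemsQuery {
  def build(hA: Seq[String], hB: Seq[String], k: Int = 10): String = {
    // one match clause per item in the user's history for that field
    def clauses(field: String, items: Seq[String]): String =
      items.map(item => s"""{"match":{"$field":"$item"}}""").mkString(",")

    s"""{
       |  "size": $k,
       |  "query": {
       |    "bool": {
       |      "should": [ ${clauses("purchaseIndicators", hA)},
       |                  ${clauses("viewIndicators", hB)} ]
       |    }
       |  }
       |}""".stripMargin
  }
}
// e.g. SimilarItemsQuery.build(Seq("iphone"), Seq("macbook")) is the body of a
// _search request against the model index (assumes both histories are non-empty).
```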

Problem solved?

Well, yes and no. The above method is what I’ve labeled a Lambda-style recommender. It 
uses real-time user history and makes recommendations in real time, but it can 
only recommend items in A. So if A is changing rapidly, as when items have 
short lifetimes, like news stories or social-media items such as tweets, then A can 
get out of date in hours or minutes. The other downside of Lambda CCO is that 
the entirety of the dat

Re: Lambda and Kappa CCO

2017-04-09 Thread Andrew Palumbo
Pat-

What can we do from the Mahout side? Would we need any new data structures? 
Trevor and I were just discussing some of the troubles of near-real-time 
matrix streaming.


Re: Lambda and Kappa CCO

2017-04-09 Thread Trevor Grant
Specifically, I hacked together a Lambda Streaming CCO with Spark and Flink
for a demo for my upcoming FlinkForward talk. Will post code once I finish
it / strip all my creds out. In short, the lack of serialization in Mahout
in-core vectors/matrices makes handing them off / dealing with them somewhat
tedious.
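
One workaround sketch for the handoff pain: flatten the in-core vector to plain (index, value) pairs, which any serializer (Kryo, Flink's own, etc.) can handle, and rebuild on the other side. The mahout-math calls here are as I recall them; verify them against the Mahout version you build with:

```scala
// Hedged sketch: ship a Mahout in-core sparse vector between streaming operators
// by flattening it to serializable (index, value) pairs and rebuilding it later.
import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
import scala.collection.JavaConverters._

case class SerializableVector(cardinality: Int, elems: Array[(Int, Double)])

object VectorTransport {
  // keep only the non-zero elements of the sparse vector
  def pack(v: Vector): SerializableVector =
    SerializableVector(v.size(), v.nonZeroes().asScala.map(e => (e.index(), e.get())).toArray)

  // rebuild an equivalent sparse vector on the receiving side
  def unpack(sv: SerializableVector): Vector = {
    val v = new RandomAccessSparseVector(sv.cardinality)
    sv.elems.foreach { case (i, x) => v.setQuick(i, x) }
    v
  }
}
```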


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Sun, Apr 9, 2017 at 5:39 PM, Andrew Palumbo  wrote:

> Pat-
>
> What can we do from the mahout side?  Would we need any new data
> structures?  Trevor and I were just discussing some of  the troubles of
> near real time matrix streaming.
> --
> *From:* Pat Ferrel 
> *Sent:* Monday, March 27, 2017 2:42:55 PM
> *To:* Ted Dunning; user@mahout.apache.org
> *Cc:* Trevor Grant; Ted Dunning; s...@apache.org
> *Subject:* Re: Lambda and Kappa CCO
>
> Agreed. Downsampling was ignored in several places and with it a great
> deal of input is a noop. Without downsampling too many things need to
> change.
>
> Also everything is dependent on this rather vague sentence. “- determine
> if the new interaction element cross-occurs with A and if so calculate the
> llr score”, which needs a lot more explanation. Whether to use Mahout
> in-memory objects or reimplement some in high speed data structures is a
> big question.
>
> The good thing I noticed in writing this is that model update and real
> time can be arbitrarily far apart, that the system degrades gracefully. So
> during high load it may fall behind but as long as user behavior is
> up-to-date and persisted (it will be) we are still in pretty good shape.
>
>
> On Mar 26, 2017, at 6:26 PM, Ted Dunning  wrote:
>
>
> I think that this analysis omits the fact that one user interaction causes
> many cooccurrences to change.
>
> This becomes feasible if you include the effect of down-sampling, but that
> has to be in the algorithm.
>
>
> From: Pat Ferrel 
> Sent: Saturday, March 25, 2017 12:01:00 PM
> To: Trevor Grant; user@mahout.apache.org
> Cc: Ted Dunning; s...@apache.org
> Subject: Lambda and Kappa CCO
>
> This is an overview and proposal for turning the multi-modal Correlated
> Cross-Occurrence (CCO) recommender from Lambda-style into an online
> streaming incrementally updated Kappa-style learner.
>
> # The CCO Recommender: Lambda-style
>
> We have largely solved the problems of calculating the multi-modal
> Correlated Cross-Occurrence models and serving recommendations in real time
> from real time user behavior. The model sits in Lucene (Elasticsearch or
> Solr) in a scalable way and the typical query to produce personalized
> recommendations comes from real time user behavior completes with 25ms
> latency.
>
> # CCO Algorithm
>
> A = rows are users, columns are items they have “converted” on (purchase,
> read, watch). A represents the conversion event—the interaction that you
> want to recommend.
> B = rows are users columns are items that the user has shown some
> preference for but not necessarily the same items as A. B represent a
> different interaction than A. B might be a preference for some category,
> brand, genre, or just a detailed item page view—or all of these in B, C, D,
> etc
> h_a = a particular user’s history of A type interactions, a vector of
> items that our user converted on.
> h_b = a particular user’s history of B type interactions, a vector of
> items that our user had B type interactions with.
>
> CCO says:
>
> [A’A]h_a + [A’B]h_b + [A’C]h_c = r; where r is the weighted items from A
> that represent personalized recommendations for our particular user.
>
> The innovation here is that A, B, C, … represent multi-modal data.
> Interactions of all types and on item-sets of arbitrary types. In other
> words we can look at virtually any action or possible indicator of user
> preference or taste. We strengthen the above raw cross-occurrence and
> cooccurrence formula by performing:
>
> [llr(A’A)]h_a + [llr(A’B)]h_b + … = r adding llr (log-likelihood ratio)
> correlation scoring to filter out coincidental cross-occurrences.
>
> The model becomes [llr(A’A)], [llr(A’B)], … each has items from A in rows
> and items from A, B, … in columns. This sits in Lucene as one document per
> items in A with a field for each of A, B, C items whose user interactions
> most strongly correlate to the conversion event on the row item. Put
> another way, the model is items from A. B, C… what have the most similar
> user interaction from users.
>
> To calculate r we need to find the most simllar items in the model to the
> history or beh

Re: Lambda and Kappa CCO

2017-04-17 Thread Pat Ferrel
Ted thinks this can be done with DBs alone. What I proposed keeps the model in DBs like 
Solr/Elasticsearch, with a persistent event cache (HBase, Cassandra, etc.), but 
uses in-memory models for faster indicator calculations, leading to mutable model 
updates in ES/Solr. One primary reason for Kappa over Lambda is items with 
short life spans or rapidly changing catalogs, things like news. 

The other point for online learning is the volume of data that must be stored 
and re-processed. Kappa only deals with small incremental changes, so the resource 
cost of Kappa will be much smaller than Lambda’s, especially for slowly changing 
models, where most updates will be no-ops.

In any case, in Kappa there would be no explicit matrix or vector multiply. If 
we do use in-memory data structures, I doubt they would be Mahout ones.
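
To summarize the moving parts being discussed, here is a hypothetical outline of the Kappa loop; every type and method name is illustrative, not an existing API:

```scala
// Hypothetical outline of the Kappa-style loop discussed in this thread.
trait EventStore    { def persist(user: String, item: String, action: String): Unit }
trait DownSampler   { def admit(user: String, item: String): Boolean }               // as sketched earlier
trait InMemoryModel { def update(user: String, item: String, action: String): Seq[String] } // A-items whose indicators changed
trait SearchIndex   { def updateIndicators(itemA: String): Unit }                    // partial doc update in ES/Solr

class KappaCco(events: EventStore, sampler: DownSampler,
               model: InMemoryModel, index: SearchIndex) {
  // One streamed interaction: persist it (so user behavior stays current even if
  // the model lags), drop saturated users/items, update only the touched indicator
  // rows, and push just those documents to the search index.
  def onEvent(user: String, item: String, action: String): Unit = {
    events.persist(user, item, action)
    if (sampler.admit(user, item)) {
      val touched = model.update(user, item, action)
      touched.foreach(index.updateIndicators)
    }
  }
}
```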



