Re: Clustering product views and sales

2013-05-07 Thread Pat Ferrel
You will always have a cold start problem for a subset of users: the ones new 
to a site. Popularity doesn't always work either; sometimes the purchase 
frequency distribution is flat, as I've seen. In these cases a metadata or 
content-based recommender is a nice fallback. If you have no metadata, you 
still have item similarities (based on older users' purchases and views).

I think one important thing to think about is that you don't always need to 
have recommendations based on the user's history. You may find that you get 
better results by using item similarity based recommendations. So on an item 
page you can show recommendations with the above techniques in a wide variety 
of situations.

On another subject: if you compare the predictive power of views (for 
purchases) with that of purchases (for purchases), you will likely find views 
a weak predictor. I think what Ted is talking about below is a technique for 
using a co-occurrence matrix to find views that lead to purchases. To use this 
you would build two models, one from purchases and one from the co-occurrence 
of views with purchases. Then you will need to combine the weights of 
recommendations from both models for a given user history OR similarities for 
a given item.
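
As a rough illustration of combining the two models, the sketch below blends 
per-item scores from a purchase model and a view-to-purchase model with a 
weighted sum. The function name blend_scores, the weights 0.7 and 0.3, and the 
sample scores are all assumptions made for this example; the thread does not 
say how the weights should be chosen.

```python
from collections import defaultdict


def blend_scores(purchase_scores, cross_scores,
                 purchase_weight=0.7, cross_weight=0.3, top_n=5):
    """Weighted sum of two score dicts (item -> score); weights are illustrative."""
    combined = defaultdict(float)
    for item, score in purchase_scores.items():
        combined[item] += purchase_weight * score
    for item, score in cross_scores.items():
        combined[item] += cross_weight * score
    # Highest blended score first; ties broken by item name for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical scores for one user history (or one item page).
purchase_model = {"tent": 0.9, "stove": 0.4}   # from purchase co-occurrence
cross_model = {"stove": 0.8, "lantern": 0.5}   # from views that led to purchases

ranked = blend_scores(purchase_model, cross_model)
print([item for item, _ in ranked])
```

In practice the weights would have to be tuned against held-out purchases 
rather than fixed by hand.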

The conversation Johannes cites below has some details:
http://markmail.org/message/5cfewal3oyt6vw2k

I have a working cross-recommender made for using views and purchases. The next 
question is how to measure its performance. There are ways to simulate the 
view-purchase data, and there are other uses for the cross-recommender 
technique. But having a real view and purchase dataset would be incredibly 
useful! I keep begging people on this list...

Can you share your data? If so I'd be happy to share the code (actually I'll 
put it on github eventually).


On May 6, 2013, at 9:40 PM, Johannes Schulte johannes.schu...@gmail.com wrote:

Hi!
As a starting point I remember this conversation containing both elements
(although the reconstruction part is rather small, hint!)

http://markmail.org/message/5cfewal3oyt6vw2k


On Tue, May 7, 2013 at 1:00 AM, Dominik Hübner cont...@dhuebner.com wrote:

 One more thing for now @Ted:
 What do you refer to with sparsification and reconstruction?
 
 On May 7, 2013, at 12:19 AM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 Truly cold start is best handled by recommending the most popular items.
 
 If you know *anything* at all such as geo or browser or OS, then you can
 use that to recommend using conventional techniques (that is, you can
 recommend for the characteristics rather than for the person).
 
 Within a very few interactions, however, real recommendations will kick in.
 
 My lately preferred approach is to derive indicators using sparsification
 or ALS+reconstruction.  These indicators can be historical items or static
 items such as geo information.  These indicators can be combined in a
 single step using a search engine.
 

Re: Clustering product views and sales

2013-05-06 Thread Dominik Hübner
And running the clustering on the cooccurrence matrix or doing PCA by removing 
eigenvalues/vectors?

On May 6, 2013, at 8:52 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 On Mon, May 6, 2013 at 11:29 AM, Dominik Hübner cont...@dhuebner.com wrote:
 
 Oh, and I forgot how the views and sales are used to build product
 vectors. As of now, I implemented binary vectors, vectors counting the
 number of views and sales (e.g., 1 view = 1 count, 1 sale = 10 counts) and
 ordinary vectors (view = 1, sale = 5).
 
 
 I would recommend just putting the view and sale in different columns and
 doing cooccurrence analysis on this.
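
One way to read the suggestion above, sketched under assumptions: instead of 
folding views and sales into one weighted count, each (action, item) pair 
becomes its own column, and co-occurrence is counted between columns per user. 
The event tuple layout and the function name cooccurrence are hypothetical and 
chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(events):
    """events: iterable of (user, action, item).

    Returns a Counter mapping unordered pairs of (action, item) columns to the
    number of users in whose history both columns appear.
    """
    columns_by_user = {}
    for user, action, item in events:
        columns_by_user.setdefault(user, set()).add((action, item))

    counts = Counter()
    for columns in columns_by_user.values():
        # Sorting gives every pair a canonical order, so (a, b) == (b, a).
        for a, b in combinations(sorted(columns), 2):
            counts[(a, b)] += 1
    return counts


events = [
    ("u1", "view", "tent"), ("u1", "sale", "stove"),
    ("u2", "view", "tent"), ("u2", "sale", "stove"), ("u2", "view", "lantern"),
    ("u3", "view", "lantern"),
]
counts = cooccurrence(events)
print(counts[(("sale", "stove"), ("view", "tent"))])
```

Keeping the columns separate means no view-to-sale weight (1:10 or 1:5) has to 
be picked in advance; the counts show which views actually co-occur with which 
sales.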



Re: Clustering product views and sales

2013-05-06 Thread Ted Dunning
I don't even think that clustering is all that necessary.

The reduced cooccurrence matrix will give you items related to each item.

You can use something like PCA, but SVD is just as good here due to near
zero mean.  You could use SSVD or ALS from Mahout to do this analysis and
then use k-means on the right singular vectors (aka item representation).

What is the high level goal that you are trying to solve with this
clustering?




Re: Clustering product views and sales

2013-05-06 Thread Dominik Hübner
Well, as you might already have guessed, I am building a product recommender 
system for my thesis. 

I am planning to evaluate ALS (both implicit and explicit) as well as 
item-similarity recommendation for users with at least a few known products. 
However, the majority of users have seen only a single product (or 2-3). I 
want to recommend them the most popular items from the cluster their only 
product belongs to (as a workaround for the cold-start problem). Furthermore, 
I expect to be able to see which kinds of products users like. This might tell 
me how well ALS and similarity recommenders fit the user's area of interest 
(an early evaluation), or at least let me estimate whether the chosen approach 
will work at all.

Re: Clustering product views and sales

2013-05-06 Thread Koobas
Since Dominik mentioned item-based and ALS, let me throw in a question here.
I believe that one of the Netflix Prize solutions combined KNN and ALS.

1) What is the best way to combine the results of both?
2) Is there really merit to this approach?
3) Are there other combinations that make sense?
(user-based + item-based)?


Re: Clustering product views and sales

2013-05-06 Thread Ted Dunning
On Mon, May 6, 2013 at 12:50 PM, Koobas koo...@gmail.com wrote:

 Since Dominik mentioned item-based and ALS, let me throw in a question
 here.
 I believe that one of the Netflix Prize solutions combined KNN and ALS.

 1) What is the best way to combine the results of both?


I think that combinations are important, but I think that the combination
of very similar kinds of algorithms working on essentially the same data
has almost no practical impact.

2) Is there really merit to this approach?


Yes, but.


 3) Are there other combinations that make sense?
 (user-based + item-based)?


Absolutely.

But I really think that the real mileage for improvement comes from the
following:

a) combining different kinds of behavior into a single recommendation
framework

b) judicious use of dithering to improve exploration

c) substantial UI improvements to gather additional exploratory data for
the recommendation engine

d) principled testing framework for design and algorithmic alternatives

Minor algorithmic changes are almost not visible on the priority list.
 Once you hit pretty-good there are far more important things to take on
in a real rec engine project.
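
The message above does not say how dithering is done. A minimal sketch, under 
the assumption that dithering means re-ranking with noise so that items lower 
in the list occasionally surface: add Gaussian noise to the log of each item's 
rank and re-sort. The function name dither, the epsilon parameter, and the 
noise scale sqrt(log(epsilon)) are assumptions for illustration, not something 
stated in the thread.

```python
import math
import random


def dither(ranked_items, epsilon=1.5, rng=None):
    """Re-rank a best-first list by log(rank) plus Gaussian noise.

    epsilon == 1.0 gives zero noise and leaves the order unchanged; larger
    values shuffle more, mostly among items far from the top of the list.
    """
    rng = rng or random.Random()
    sd = math.sqrt(math.log(epsilon)) if epsilon > 1.0 else 0.0
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sd), rank, item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    noisy.sort()  # rank is unique, so items themselves are never compared
    return [item for _, _, item in noisy]


items = ["tent", "stove", "lantern", "sleeping bag", "headlamp"]
print(dither(items, epsilon=1.0))                         # unchanged order
print(dither(items, epsilon=2.0, rng=random.Random(42)))  # a reshuffled list
```

Because the noise is added on a log scale, the top few positions stay fairly 
stable while the tail gets mixed, which is where exploration is cheapest.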


Re: Clustering product views and sales

2013-05-06 Thread Koobas
I think I see the picture now.
Thanks!


Re: Clustering product views and sales

2013-05-06 Thread Dominik Hübner
The clustering was mostly intended for tackling the cold-start problem for new 
users. I want to build a recommender based on existing components or, to be 
precise, a combination of them.

Unfortunately, the only product meta-data I currently have is the product 
price. Furthermore, I am working on this project alone, so the approaches I 
can examine in the given time are limited.

Would using ALS and ranking its output with, e.g., frequent item set algorithms 
be something worth looking into? Or did you mean something different? 

My personal goal is to build a recommender providing acceptable results using 
the data I currently have available. Of course, this will only serve as a 
basis for further improvements where necessary or if further information can 
be obtained. 


On May 6, 2013, at 11:21 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Are you looking to build a product recommender based on your own design?
 Or do you want to build one based on existing methods?
 
 If you want to use existing methods, clustering has essentially no role.
 
 I think that composite approaches that use item meta-data and different
 kinds of behavioral cues are important to best performance.
 
 

Re: Clustering product views and sales

2013-05-06 Thread Sean Owen
It sounds like you don't quite have a cold start problem. You have a
few behaviors, a few views or clicks, not zero. So you really just
need to find an approach that's quite comfortable with sparse input. A
low-rank factorization model like ALS works fine in this case, for
example.

There's a circularity problem in thinking about solving this with
clustering: if you don't have enough data to recommend to users at the
start, on what data are you clustering them before that?

I don't think you need clustering either. (Of course, you can cluster
easily from the representation you get out of something like a
low-rank factorization. It can easily be an output rather than an
'input'.)

As to evaluation, it depends a little on what you mean by frequent
item sets and evaluation. Do you mean a result is good if it occurs
frequently overall with other items the user viewed? It makes some
sense, although it sounds like you're just testing whether the recommender
does exactly what an item-similarity-based recommender would do when
based on co-occurrence between items. That is, if that's defined as
the right answer, then why not save yourself the trouble and build the
recommender to give exactly that answer?

Usually you see if the model recommends back things the user actually
viewed, that were held out of the training data. This has its own
problems but presupposing a correct algorithm isn't one of them.
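
The hold-out idea above can be sketched as follows. This is a minimal example 
under assumptions: the metric (precision at k), the function names, and the toy 
popularity recommender used as the model under test are all illustrative and 
not taken from the thread.

```python
from collections import Counter


def precision_at_k(recommend, train, held_out, k=10):
    """Fraction of top-k recommendation slots that hit a held-out item.

    recommend: function(set_of_seen_items) -> best-first list of items
    train:     dict user -> set of items used for training
    held_out:  dict user -> set of items hidden from training
    """
    hits = 0
    slots = 0
    for user, hidden in held_out.items():
        if not hidden:
            continue  # nothing to predict for this user
        recs = recommend(train.get(user, set()))[:k]
        hits += sum(1 for item in recs if item in hidden)
        slots += k
    return hits / slots if slots else 0.0


train = {"u1": {"tent", "stove"}, "u2": {"tent"}, "u3": {"lantern", "tent"}}
held_out = {"u1": {"lantern"}, "u2": {"stove"}, "u3": set()}

# Toy model: most popular training items the user has not seen yet.
popularity = Counter(item for items in train.values() for item in items)


def popular_unseen(seen):
    ranked = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked if item not in seen]


print(precision_at_k(popular_unseen, train, held_out, k=1))
```

Any recommender with the same call shape can be swapped in for popular_unseen, 
so the same held-out split can compare ALS, item similarity, and a popularity 
baseline.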


Re: Clustering product views and sales

2013-05-06 Thread Ted Dunning
Truly cold start is best handled by recommending the most popular items.

If you know *anything* at all such as geo or browser or OS, then you can
use that to recommend using conventional techniques (that is, you can
recommend for the characteristics rather than for the person).

Within a very few interactions, however, real recommendations will kick in.

My lately preferred approach is to derive indicators using sparsification
or ALS+reconstruction.  These indicators can be historical items or static
items such as geo information.  These indicators can be combined in a
single step using a search engine.
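
The first two points above (most popular items for a truly cold start, and 
recommending for a characteristic such as geo when one is known) can be 
sketched as a simple fallback. The segment labels, function names, and sample 
events are hypothetical; the indicator and search engine approach in the last 
paragraph is not shown here.

```python
from collections import Counter


def build_popularity(events):
    """events: iterable of (segment, item), e.g. segment = country code.

    Returns (global_counts, counts_by_segment).
    """
    global_counts = Counter()
    by_segment = {}
    for segment, item in events:
        global_counts[item] += 1
        by_segment.setdefault(segment, Counter())[item] += 1
    return global_counts, by_segment


def cold_start_recs(segment, global_counts, by_segment, n=3):
    """Most popular items for the segment, or globally if the segment is unknown."""
    counts = by_segment.get(segment) or global_counts
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


events = [
    ("DE", "tent"), ("DE", "tent"), ("DE", "stove"),
    ("US", "lantern"), ("US", "lantern"), ("US", "tent"),
]
global_counts, by_segment = build_popularity(events)

print(cold_start_recs("DE", global_counts, by_segment))   # known characteristic
print(cold_start_recs(None, global_counts, by_segment))   # nothing known at all
```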






Re: Clustering product views and sales

2013-05-06 Thread Dominik Hübner
One more thing for now @Ted:
What do you refer to with sparsification and reconstruction?

Re: Clustering product views and sales

2013-05-06 Thread Johannes Schulte
Hi!
As a starting point I remember this conversation containing both elements
(although the reconstruction part is rather small, hint!)

http://markmail.org/message/5cfewal3oyt6vw2k

