Re: Negative Preferences in a Recommender

2013-06-18 Thread Sean Owen
Yes, the model has no room for literally negative input. I think that
conceptually people do want negative input, and in this model,
negative numbers really are the natural way to express that.

You could give negative input a small positive weight. Or extend the
definition of c so that it is merely small, not negative, when r is
negative. But this was generally unsatisfactory. It has a logic (that
even negative input is really a slightly positive association in the
scheme of things), but the results were viewed as unintuitive.

I ended up extending it to handle negative input more directly, such
that negative input is read as evidence that p=0, instead of evidence
that p=1. This works fine, and is tidier than an ensemble (although
that's a sound idea too). The change is quite small.

I agree with the second point that learning weights is manual and
difficult; that's unavoidable, I think, when you want to start adding
different data types anyway.

I also don't use M/R for searching the parameter space, since you may
try a thousand combinations and each is a model built from scratch. I
use a sample of the data and run in-core.
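For concreteness, here is a minimal sketch of that extension, assuming
the usual c = 1 + alpha*|r| confidence form from the implicit-feedback
paper (the class and method names are illustrative, not Mahout API):

    // Hedged sketch: map a raw input value r to a (preference, confidence)
    // pair. Positive r is evidence that p=1; negative r is read as equally
    // confident evidence that p=0, rather than as a negative weight.
    public final class ImplicitEncoding {
      final double p;  // preference: 1 = evidence of like, 0 = evidence of dislike
      final double c;  // confidence weight attached to that evidence

      private ImplicitEncoding(double p, double c) { this.p = p; this.c = c; }

      static ImplicitEncoding fromInput(double r, double alpha) {
        double p = r >= 0.0 ? 1.0 : 0.0;
        double c = 1.0 + alpha * Math.abs(r);  // alpha is a tuning parameter
        return new ImplicitEncoding(p, c);
      }
    }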

On Tue, Jun 18, 2013 at 2:30 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
 (Kinda doing something very close.)

 The Koren-Volinsky paper on implicit feedback can be generalized to
 decompose all input into a preference matrix (0 or 1) and a confidence
 matrix (which is essentially an observation weight matrix).

 If you did not get any observations, you encode it as (p=0, c=1), but if
 you know that the user did not like the item, you can encode that
 observation with a much higher confidence weight, something like
 (p=0, c=30) -- actually with as high a confidence as a conversion, in
 your case, it seems.

 The problem with this is that you end up with quite a bunch of additional
 parameters in your model to figure out, i.e. confidence weights for each
 type of action in the system. You can establish those through extensive
 cross-validation search, which is initially quite expensive (even with
 distributed cluster tech), but you can bail out incrementally much sooner
 once a previous good guess is already known.

 MR doesn't work well for this, though, since it requires A LOT of
 iterations.



 On Mon, Jun 17, 2013 at 5:51 PM, Pat Ferrel pat.fer...@gmail.com wrote:

 In the case where you know a user did not like an item, how should the
 information be treated in a recommender? Normally for retail
 recommendations you have an implicit 1 for a purchase and no value
 otherwise. But what if you knew the user did not like an item? Maybe you
 have records of "I want my money back for this junk" reactions.

 You could make a scale of 0 and 1, where 0 means a bad rating, 1 a good
 one, and no value as usual means no preference? Some of the math here
 won't work, though, since usually no value implicitly = 0, so maybe
 -1 = bad, 1 = good, no preference implicitly = 0?

 Would it be better to treat the bad rating as a 1 and a good one as a 2?
 This would be more like the old star-rating method, only we would know
 where the cutoff should be between a good review and a bad one (1.5).

 I suppose this could also be treated as another recommender in an
 ensemble where r = r_p - r_h, where r_h = predictions from "I hate this
 product" preferences?

 Has anyone found a good method?


K-Means Clustering on Two Columns

2013-06-18 Thread syed kather
Hi Team,
   How do I do k-means clustering on 2 selected columns?

Line No,age,income,sex,city
1,22,1500,1,xxx
2,54,13450,2,yyy
-
-
-

The input looks like this, but I need to cluster on columns 2 and 3
(age and income).

How do I do that?

I tried the synthetic control k-means example, but I am not able to
extract the cluster ID corresponding to each line number.

Please help me.


Thanks and regards
Syed Abdul Kather





RE: K-Means Clustering on Two Columns

2013-06-18 Thread Chandra Mohan, Ananda Vel Murugan
Hi, 

I implemented something similar in the following way.

I created a class which implements
org.apache.commons.math3.ml.clustering.Clusterable, with just two member
variables (double[] point and long id) and getter/setter functions.

I iterated through the data, created instances of this class, and added
them to a list.

Then I instantiated KMeansPlusPlusClusterer as below:

KMeansPlusPlusClusterer<CustomPoint> clusterer =
    new KMeansPlusPlusClusterer<CustomPoint>(4, 100,
        new org.apache.commons.math3.ml.distance.CanberraDistance());

Then I called KMeansPlusPlusClusterer.cluster() as follows:

List<CentroidCluster<CustomPoint>> clusterList = clusterer.cluster(points);

I was able to get the clusters in this way. Don't know whether this is the
right approach, but it worked for me.
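A minimal end-to-end sketch of this approach (assuming commons-math3 3.2+;
the class name CustomPoint, the field names, and the hard-coded rows are
illustrative, and a real run needs at least as many rows as clusters):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.math3.ml.clustering.CentroidCluster;
    import org.apache.commons.math3.ml.clustering.Clusterable;
    import org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer;
    import org.apache.commons.math3.ml.distance.CanberraDistance;

    // One CSV row, keeping only the two clustering columns (age, income)
    // plus the line number so cluster membership can be traced back.
    class CustomPoint implements Clusterable {
      private final long id;          // the "Line No" column
      private final double[] point;   // {age, income}

      CustomPoint(long id, double age, double income) {
        this.id = id;
        this.point = new double[] { age, income };
      }

      public long getId() { return id; }

      @Override
      public double[] getPoint() { return point; }
    }

    public class TwoColumnKMeans {
      public static void main(String[] args) {
        // In practice these rows come from parsing the CSV.
        List<CustomPoint> points = new ArrayList<CustomPoint>();
        points.add(new CustomPoint(1, 22, 1500));
        points.add(new CustomPoint(2, 54, 13450));
        // ... add the remaining rows; k-means needs at least 4 points here.

        // 4 clusters, at most 100 iterations, Canberra distance (as above).
        KMeansPlusPlusClusterer<CustomPoint> clusterer =
            new KMeansPlusPlusClusterer<CustomPoint>(4, 100, new CanberraDistance());

        List<CentroidCluster<CustomPoint>> clusters = clusterer.cluster(points);

        // Walk the clusters and print "line number -> cluster index".
        for (int i = 0; i < clusters.size(); i++) {
          for (CustomPoint p : clusters.get(i).getPoints()) {
            System.out.println("line " + p.getId() + " -> cluster " + i);
          }
        }
      }
    }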

Regards,
Anand.C

-Original Message-
From: syed kather [mailto:in.ab...@gmail.com] 
Sent: Tuesday, June 18, 2013 3:23 PM
To: user@mahout.apache.org
Subject: K-Means Clustering on Two Columns

Hi Team,
   How do I do k-means clustering on 2 selected columns?

Line No,age,income,sex,city
1,22,1500,1,xxx
2,54,13450,2,yyy
-
-
-

The input looks like this, but I need to cluster on columns 2 and 3
(age and income).

How do I do that?

I tried the synthetic control k-means example, but I am not able to
extract the cluster ID corresponding to each line number.

Please help me.


Thanks and regards
Syed Abdul Kather





Re: Negative Preferences in a Recommender

2013-06-18 Thread Ted Dunning
I have found that in practice, don't-like is very close to like.  That is,
things that somebody doesn't like are very closely related to the things
that they do like.  Things that are quite distant wind up as don't-care,
not don't-like.

This makes most simple approaches to modeling polar preferences very
dangerous.  What I have usually done under the pressure of time is to
consider like and don't-like to be equivalent synonyms and then maintain a
kill list of items to not show.  Works well pragmatically, but gives people
hives when they hear of the details, especially if they actually believe
humans act according to consistent philosophy.
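That kill-list approach maps directly onto Mahout's IDRescorer hook; a
minimal sketch (the recommender setup and the source of the kill set are
assumed, not shown):

    import java.util.Set;

    import org.apache.mahout.cf.taste.recommender.IDRescorer;

    // Filters "killed" items (e.g. explicit thumbs-downs) out of results at
    // recommendation time, while their signal still feeds the model as a like.
    class KillListRescorer implements IDRescorer {
      private final Set<Long> killedItemIDs;

      KillListRescorer(Set<Long> killedItemIDs) {
        this.killedItemIDs = killedItemIDs;
      }

      @Override
      public double rescore(long itemID, double originalScore) {
        return originalScore;  // scores untouched; this rescorer only filters
      }

      @Override
      public boolean isFiltered(long itemID) {
        return killedItemIDs.contains(itemID);
      }
    }

    // Usage: recommender.recommend(userID, 10, new KillListRescorer(killSet));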


On Tue, Jun 18, 2013 at 9:13 AM, Sean Owen sro...@gmail.com wrote:

 Yes, the model has no room for literally negative input. I think that
 conceptually people do want negative input, and in this model,
 negative numbers really are the natural way to express that.

 You could give negative input a small positive weight. Or extend the
 definition of c so that it is merely small, not negative, when r is
 negative. But this was generally unsatisfactory. It has a logic (that
 even negative input is really a slightly positive association in the
 scheme of things), but the results were viewed as unintuitive.

 I ended up extending it to handle negative input more directly, such
 that negative input is read as evidence that p=0, instead of evidence
 that p=1. This works fine, and is tidier than an ensemble (although
 that's a sound idea too). The change is quite small.

 I agree with the second point that learning weights is manual and
 difficult; that's unavoidable, I think, when you want to start adding
 different data types anyway.

 I also don't use M/R for searching the parameter space, since you may
 try a thousand combinations and each is a model built from scratch. I
 use a sample of the data and run in-core.

 On Tue, Jun 18, 2013 at 2:30 AM, Dmitriy Lyubimov dlie...@gmail.com
 wrote:
  (Kinda doing something very close.)

  The Koren-Volinsky paper on implicit feedback can be generalized to
  decompose all input into a preference matrix (0 or 1) and a confidence
  matrix (which is essentially an observation weight matrix).

  If you did not get any observations, you encode it as (p=0, c=1), but if
  you know that the user did not like the item, you can encode that
  observation with a much higher confidence weight, something like
  (p=0, c=30) -- actually with as high a confidence as a conversion, in
  your case, it seems.

  The problem with this is that you end up with quite a bunch of additional
  parameters in your model to figure out, i.e. confidence weights for each
  type of action in the system. You can establish those through extensive
  cross-validation search, which is initially quite expensive (even with
  distributed cluster tech), but you can bail out incrementally much sooner
  once a previous good guess is already known.

  MR doesn't work well for this, though, since it requires A LOT of
  iterations.
 
 
 
  On Mon, Jun 17, 2013 at 5:51 PM, Pat Ferrel pat.fer...@gmail.com
 wrote:
 
  In the case where you know a user did not like an item, how should the
  information be treated in a recommender? Normally for retail
  recommendations you have an implicit 1 for a purchase and no value
  otherwise. But what if you knew the user did not like an item? Maybe you
  have records of "I want my money back for this junk" reactions.

  You could make a scale of 0 and 1, where 0 means a bad rating, 1 a good
  one, and no value as usual means no preference? Some of the math here
  won't work, though, since usually no value implicitly = 0, so maybe
  -1 = bad, 1 = good, no preference implicitly = 0?

  Would it be better to treat the bad rating as a 1 and a good one as a 2?
  This would be more like the old star-rating method, only we would know
  where the cutoff should be between a good review and a bad one (1.5).

  I suppose this could also be treated as another recommender in an
  ensemble where r = r_p - r_h, where r_h = predictions from "I hate this
  product" preferences?

  Has anyone found a good method?



RE: Feature vector generation from Bag-of-Words

2013-06-18 Thread Chandra Mohan, Ananda Vel Murugan
Hi, 

Thanks. It did help. 

Regards,
Anand.C

-Original Message-
From: Suneel Marthi [mailto:suneel_mar...@yahoo.com] 
Sent: Tuesday, June 18, 2013 10:55 AM
To: Chandra Mohan, Ananda Vel Murugan; user@mahout.apache.org
Subject: Re: Feature vector generation from Bag-of-Words


Check this link -  
http://stackoverflow.com/questions/13663567/mahout-csv-to-vector-and-running-the-program







 From: Chandra Mohan, Ananda Vel Murugan ananda.muru...@honeywell.com
To: user@mahout.apache.org user@mahout.apache.org; Suneel Marthi 
suneel_mar...@yahoo.com 
Sent: Tuesday, June 18, 2013 12:59 AM
Subject: RE: Feature vector generation from Bag-of-Words
 

Hi, 

I am implementing a slightly different variation of this solution and need
some guidance.

I have a CSV file with two columns, REMARKS and CATEGORY. Based on the
remarks, I train a naïve Bayes model which automatically assigns categories
to REMARKS. I followed this link:
http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
It works fine.

Now I have a slightly different requirement. I want the text in the REMARKS
column to be tokenized in a different fashion. I have some keywords, and
when those keywords occur in the REMARKS text, I want them to stay intact
and not be split further. For example, if the REMARKS text is "Sump pressure
is low", the default analyzer would split it into four tokens: "Sump",
"pressure", "is", "low". But I want it to be tokenized as "Sump pressure",
"is", "low". I have implemented a custom tokenizer which does this.

Now I want to vectorize this. I tried the pseudocode suggested below, but I
don't know how to serialize these vectors into sequence files. When I run
seq2sparse, apart from vectors, it creates some other labelindex and
dictionary files. I could not see the code to create those files here. Am I
missing something? I have started looking into the
org.apache.mahout.vectorizer package. Any pointers would be of great help.
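For the serialization step, a minimal sketch of writing vectors to a
SequenceFile in the Text/VectorWritable layout that seq2sparse produces
(the path and key scheme are illustrative; generating the labelindex and
dictionary files is a separate step not shown here):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    public class VectorSerializer {
      // Writes each (docId, vector) pair as a Text key / VectorWritable value.
      public static void writeVectors(Iterable<Vector> vectors, String output)
          throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = new SequenceFile.Writer(
            fs, conf, new Path(output), Text.class, VectorWritable.class);
        try {
          int docId = 0;
          VectorWritable vw = new VectorWritable();
          for (Vector v : vectors) {
            vw.set(v);
            writer.append(new Text("/doc-" + docId++), vw);
          }
        } finally {
          writer.close();
        }
      }
    }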


Regards,
Anand.C

-Original Message-
From: Suneel Marthi [mailto:suneel_mar...@yahoo.com] 
Sent: Tuesday, May 21, 2013 10:21 PM
To: user@mahout.apache.org
Subject: Re: Feature vector generation from Bag-of-Words

It should be easy to convert the pseudocode below to MapReduce to scale to a
large collection of documents.




From: Suneel Marthi suneel_mar...@yahoo.com
To: user@mahout.apache.org user@mahout.apache.org 
Sent: Tuesday, May 21, 2013 12:20 PM
Subject: Re: Feature vector generation from Bag-of-Words


Stuti,

Here's how I would do it.

1.  Create a collection of the 100 keywords that are of interest.

 Collection<String> keywords = new ArrayList<String>();
 keywords.addAll(/* your 100 keywords */);

2.  For each text document, create a Multiset (which is a bag of words),
retain only those terms of interest from (1), and use Mahout's
StaticWordValueEncoder to encode their counts into a vector:

 // Iterate through all the documents
 for document in documents {

  // create a bag of words for each document
  Multiset<String> multiset = HashMultiset.create();

  // create a RandomAccessSparseVector
  Vector v = new RandomAccessSparseVector(100); // 100 features for the 100 keywords

  for term in document.terms {
   multiset.add(term);
  }

  // retain only those keywords that are of interest (from step 1)
  multiset.retainAll(keywords);

  // You now have a bag of words containing only the keywords, with their
  // term frequencies.

  // Use one of the feature encoders; refer to Section 14.3 of Mahout in
  // Action for a more detailed description of this process.
  FeatureVectorEncoder encoder = new StaticWordValueEncoder("body");

  for (Multiset.Entry<String> entry : multiset.entrySet()) {
   encoder.addToVector(entry.getElement(), entry.getCount(), v);
  }
 }


From: Stuti Awasthi stutiawas...@hcl.com
To: user@mahout.apache.org user@mahout.apache.org 
Sent: Tuesday, May 21, 2013 7:17 AM
Subject: Feature vector generation from Bag-of-Words


Hi all,

I have a query regarding feature vector generation for text documents.
I have read Mahout in Action and understood how to turn a text document
into a feature vector weighted by TF or TF-IDF schemes. My use case is a
slightly tweaked version of that.

I have a few keywords, say 100, and I want to create the feature vectors of
the text documents using only these 100 keywords. So I would like to
calculate the frequency of each keyword in each document and generate the
feature vector with those frequencies as weights.

Is there an existing way to do this, or will I need to write custom code?

Thanks
Stuti Awasthi



Re: Negative Preferences in a Recommender

2013-06-18 Thread Pat Ferrel
To your point, Ted: I was surprised to find that remove-from-cart actions
predicted sales almost as well as purchases did, but it also meant filtering
them from recs. We got the best scores treating them as purchases and not
recommending them again. No one pried enough to get bothered.

In this particular case I'm ingesting movie reviews, thumbs up or down. I'm
trying to prime the pump for the cold-start case of a media guide app with
expert reviews but no users yet. Expert reviewers review everything, so I
don't think there will be much goodness in treating a thumbs-down like a
thumbs-up in this particular case. Sean, are you suggesting that negative
reviews might be modeled as a 0 rather than no value? Using the Mahout
recommender this will only show up in filtering the negatives out of recs,
as Ted suggests, right? Since a 0 preference would mean "don't recommend",
just as a preference of 1 would. This seems like a good approach, but I may
have missed something in your suggestion.

In this case I'm not concerned with recommending to experts; I'm trying to
make good recs to new users with a few thumbs up or down by comparing them
to experts with lots of thumbs up and down. The similarity metric will have
new users with only a few preferences and will compare them to reviewers
with many, many more. I wonder if this implies a similarity metric that uses
only common values (cooccurrence) rather than the usual log-likelihood? I
guess it's easy to try both.

Papers I've read on this subject. The first has an interesting discussion of 
using experts in CF.
http://www.slideshare.net/xamat/the-science-and-the-magic-of-user-feedback-for-recommender-systems
http://www.sis.pitt.edu/~hlee/paper/umap2009_LeeBrusilovsky.pdf
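For trying both (a cooccurrence-style metric vs log-likelihood), a minimal
sketch with Mahout's boolean-preference similarity classes, with
TanimotoCoefficientSimilarity standing in for the overlap-based option; the
data file and user IDs here are hypothetical:

    import java.io.File;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class CompareSimilarities {
      public static void main(String[] args) throws Exception {
        // Boolean-preference data: "userID,itemID" lines.
        DataModel model = new FileDataModel(new File("reviews.csv"));

        UserSimilarity tanimoto = new TanimotoCoefficientSimilarity(model);
        UserSimilarity loglike = new LogLikelihoodSimilarity(model);

        long newUserID = 1L;    // a sparse new user
        long expertID = 1000L;  // a prolific expert reviewer

        System.out.println("tanimoto: " + tanimoto.userSimilarity(newUserID, expertID));
        System.out.println("loglike:  " + loglike.userSimilarity(newUserID, expertID));
      }
    }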


On Jun 18, 2013, at 3:48 AM, Ted Dunning ted.dunn...@gmail.com wrote:

I have found that in practice, don't-like is very close to like.  That is,
things that somebody doesn't like are very closely related to the things
that they do like.  Things that are quite distant wind up as don't-care,
not don't-like.

This makes most simple approaches to modeling polar preferences very
dangerous.  What I have usually done under the pressure of time is to
consider like and don't-like to be equivalent synonyms and then maintain a
kill list of items to not show.  Works well pragmatically, but gives people
hives when they hear of the details, especially if they actually believe
humans act according to consistent philosophy.


On Tue, Jun 18, 2013 at 9:13 AM, Sean Owen sro...@gmail.com wrote:

 Yes, the model has no room for literally negative input. I think that
 conceptually people do want negative input, and in this model,
 negative numbers really are the natural way to express that.

 You could give negative input a small positive weight. Or extend the
 definition of c so that it is merely small, not negative, when r is
 negative. But this was generally unsatisfactory. It has a logic (that
 even negative input is really a slightly positive association in the
 scheme of things), but the results were viewed as unintuitive.

 I ended up extending it to handle negative input more directly, such
 that negative input is read as evidence that p=0, instead of evidence
 that p=1. This works fine, and is tidier than an ensemble (although
 that's a sound idea too). The change is quite small.

 I agree with the second point that learning weights is manual and
 difficult; that's unavoidable, I think, when you want to start adding
 different data types anyway.

 I also don't use M/R for searching the parameter space, since you may
 try a thousand combinations and each is a model built from scratch. I
 use a sample of the data and run in-core.
 
 On Tue, Jun 18, 2013 at 2:30 AM, Dmitriy Lyubimov dlie...@gmail.com
 wrote:
 (Kinda doing something very close.)

 The Koren-Volinsky paper on implicit feedback can be generalized to
 decompose all input into a preference matrix (0 or 1) and a confidence
 matrix (which is essentially an observation weight matrix).

 If you did not get any observations, you encode it as (p=0, c=1), but if
 you know that the user did not like the item, you can encode that
 observation with a much higher confidence weight, something like
 (p=0, c=30) -- actually with as high a confidence as a conversion, in
 your case, it seems.

 The problem with this is that you end up with quite a bunch of additional
 parameters in your model to figure out, i.e. confidence weights for each
 type of action in the system. You can establish those through extensive
 cross-validation search, which is initially quite expensive (even with
 distributed cluster tech), but you can bail out incrementally much sooner
 once a previous good guess is already known.

 MR doesn't work well for this, though, since it requires A LOT of
 iterations.
 
 
 
 On Mon, Jun 17, 2013 at 5:51 PM, Pat Ferrel pat.fer...@gmail.com
 wrote:
 
 In the case where you know a user did not like an item, how should the
 information be treated in a recommender? Normally for retail
 

Re: Negative Preferences in a Recommender

2013-06-18 Thread Pat Ferrel
They are on a lot of papers; which one are you looking at?

On Jun 17, 2013, at 6:30 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:

(Kinda doing something very close.)

The Koren-Volinsky paper on implicit feedback can be generalized to
decompose all input into a preference matrix (0 or 1) and a confidence
matrix (which is essentially an observation weight matrix).

If you did not get any observations, you encode it as (p=0, c=1), but if
you know that the user did not like the item, you can encode that
observation with a much higher confidence weight, something like
(p=0, c=30) -- actually with as high a confidence as a conversion, in
your case, it seems.

The problem with this is that you end up with quite a bunch of additional
parameters in your model to figure out, i.e. confidence weights for each
type of action in the system. You can establish those through extensive
cross-validation search, which is initially quite expensive (even with
distributed cluster tech), but you can bail out incrementally much sooner
once a previous good guess is already known.

MR doesn't work well for this, though, since it requires A LOT of
iterations.



On Mon, Jun 17, 2013 at 5:51 PM, Pat Ferrel pat.fer...@gmail.com wrote:

 In the case where you know a user did not like an item, how should the
 information be treated in a recommender? Normally for retail
 recommendations you have an implicit 1 for a purchase and no value
 otherwise. But what if you knew the user did not like an item? Maybe you
 have records of "I want my money back for this junk" reactions.

 You could make a scale of 0 and 1, where 0 means a bad rating, 1 a good
 one, and no value as usual means no preference? Some of the math here
 won't work, though, since usually no value implicitly = 0, so maybe
 -1 = bad, 1 = good, no preference implicitly = 0?

 Would it be better to treat the bad rating as a 1 and a good one as a 2?
 This would be more like the old star-rating method, only we would know
 where the cutoff should be between a good review and a bad one (1.5).

 I suppose this could also be treated as another recommender in an
 ensemble where r = r_p - r_h, where r_h = predictions from "I hate this
 product" preferences?

 Has anyone found a good method?



Re: Negative Preferences in a Recommender

2013-06-18 Thread Sean Owen
I'm suggesting using numbers like -1 for thumbs-down ratings, and then
using these as a positive weight towards 0, just like positive values
are used as a positive weighting towards 1.

Most people don't make many negative ratings. For them, what you do
with these doesn't make a lot of difference. It might for the few
expert users, and they might be the ones that care. For me it was
exactly this: user acceptance testers were pointing out that
thumbs-down ratings didn't seem to have the desired effect, because
they saw the result straight away.

Here's an alternative structure that doesn't involve thumbs-down:
choose 4 items, sampled in a way that prefers items distant from each
other in feature space, and ask the user to pick the 1 that is most
interesting. Repeat a few times.
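A sketch of that elicitation step, assuming item feature vectors are
available and using greedy farthest-point sampling (all names here are
illustrative, not an existing API):

    import java.util.ArrayList;
    import java.util.List;

    public class DistantItemSampler {
      // Greedily picks k items that are far apart in feature space:
      // start from a seed, then repeatedly add the item whose minimum
      // distance to the already-picked set is largest.
      // Assumes features.length >= k.
      static List<Integer> pickDistant(double[][] features, int k) {
        List<Integer> picked = new ArrayList<Integer>();
        picked.add(0);  // seed; could be chosen at random instead
        while (picked.size() < k) {
          int best = -1;
          double bestMinDist = -1.0;
          for (int i = 0; i < features.length; i++) {
            if (picked.contains(i)) continue;
            double minDist = Double.MAX_VALUE;
            for (int j : picked) {
              minDist = Math.min(minDist, euclidean(features[i], features[j]));
            }
            if (minDist > bestMinDist) {
              bestMinDist = minDist;
              best = i;
            }
          }
          picked.add(best);
        }
        return picked;  // show these k items; the user picks 1
      }

      static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
          double d = a[i] - b[i];
          sum += d * d;
        }
        return Math.sqrt(sum);
      }
    }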

On Tue, Jun 18, 2013 at 3:55 PM, Pat Ferrel p...@occamsmachete.com wrote:
 To your point, Ted: I was surprised to find that remove-from-cart actions
 predicted sales almost as well as purchases did, but it also meant
 filtering them from recs. We got the best scores treating them as purchases
 and not recommending them again. No one pried enough to get bothered.

 In this particular case I'm ingesting movie reviews, thumbs up or down. I'm
 trying to prime the pump for the cold-start case of a media guide app with
 expert reviews but no users yet. Expert reviewers review everything, so I
 don't think there will be much goodness in treating a thumbs-down like a
 thumbs-up in this particular case. Sean, are you suggesting that negative
 reviews might be modeled as a 0 rather than no value? Using the Mahout
 recommender this will only show up in filtering the negatives out of recs,
 as Ted suggests, right? Since a 0 preference would mean "don't recommend",
 just as a preference of 1 would. This seems like a good approach, but I may
 have missed something in your suggestion.

 In this case I'm not concerned with recommending to experts; I'm trying to
 make good recs to new users with a few thumbs up or down by comparing them
 to experts with lots of thumbs up and down. The similarity metric will have
 new users with only a few preferences and will compare them to reviewers
 with many, many more. I wonder if this implies a similarity metric that
 uses only common values (cooccurrence) rather than the usual
 log-likelihood? I guess it's easy to try both.

 Papers I've read on this subject. The first has an interesting discussion of 
 using experts in CF.
 http://www.slideshare.net/xamat/the-science-and-the-magic-of-user-feedback-for-recommender-systems
 http://www.sis.pitt.edu/~hlee/paper/umap2009_LeeBrusilovsky.pdf


Re: Negative Preferences in a Recommender

2013-06-18 Thread Dmitriy Lyubimov
Hu, Koren, Volinsky: "Collaborative Filtering for Implicit Feedback Datasets"


On Tue, Jun 18, 2013 at 8:07 AM, Pat Ferrel p...@occamsmachete.com wrote:

 They are on a lot of papers; which one are you looking at?

 On Jun 17, 2013, at 6:30 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:

 (Kinda doing something very close.)

 The Koren-Volinsky paper on implicit feedback can be generalized to
 decompose all input into a preference matrix (0 or 1) and a confidence
 matrix (which is essentially an observation weight matrix).

 If you did not get any observations, you encode it as (p=0, c=1), but if
 you know that the user did not like the item, you can encode that
 observation with a much higher confidence weight, something like
 (p=0, c=30) -- actually with as high a confidence as a conversion, in
 your case, it seems.

 The problem with this is that you end up with quite a bunch of additional
 parameters in your model to figure out, i.e. confidence weights for each
 type of action in the system. You can establish those through extensive
 cross-validation search, which is initially quite expensive (even with
 distributed cluster tech), but you can bail out incrementally much sooner
 once a previous good guess is already known.

 MR doesn't work well for this, though, since it requires A LOT of
 iterations.



 On Mon, Jun 17, 2013 at 5:51 PM, Pat Ferrel pat.fer...@gmail.com wrote:

  In the case where you know a user did not like an item, how should the
  information be treated in a recommender? Normally for retail
  recommendations you have an implicit 1 for a purchase and no value
  otherwise. But what if you knew the user did not like an item? Maybe you
  have records of "I want my money back for this junk" reactions.

  You could make a scale of 0 and 1, where 0 means a bad rating, 1 a good
  one, and no value as usual means no preference? Some of the math here
  won't work, though, since usually no value implicitly = 0, so maybe
  -1 = bad, 1 = good, no preference implicitly = 0?

  Would it be better to treat the bad rating as a 1 and a good one as a 2?
  This would be more like the old star-rating method, only we would know
  where the cutoff should be between a good review and a bad one (1.5).

  I suppose this could also be treated as another recommender in an
  ensemble where r = r_p - r_h, where r_h = predictions from "I hate this
  product" preferences?

  Has anyone found a good method?




CFP: ACM RecSys 2013 Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys 2013)

2013-06-18 Thread Alejandro Bellogin Kouki

Dear colleagues,

we are pleased to announce RepSys, a workshop on Reproducibility and
Replication that will be held at ACM RecSys 2013. This workshop aims to
provide an opportunity to discuss the limitations and challenges of
experimental reproducibility and replication.


Hope you find it interesting.

Regards,
Alejandro

[Apologies if you receive this more than once]


===

ACM RecSys Workshop on

Reproducibility and Replication in Recommender Systems Evaluation - RepSys2013

 7th ACM Recommender Systems Conference (RecSys 2013)

Hong Kong, China, 12 or 16 October 2013

  http://repsys.project.cwi.nl

===


*  Submission deadline: 22 July 2013 *


== Scope ==


Experiment replication and reproduction are key requirements for empirical
research methodology, and an important open issue in the field of Recommender
Systems. When an experiment is repeated by a different researcher and exactly
the same result is obtained, we can say the experiment has been replicated.
When the results are not exactly the same but the conclusions are compatible
with the prior ones, we have a reproduction of the experiment. Reproducibility
and replication involve recommendation algorithm implementations, experimental
protocols, and evaluation metrics. While the problem of reproducibility and
replication has been recognized in the Recommender Systems community, the need
for a clear solution remains largely unmet, which motivates the present
workshop.


== Topics ==


We invite the submission of papers reporting original research, studies,
advances, experiences, or work in progress in the scope of reproducibility
and replication in Recommender Systems evaluation. Papers explicitly dealing
with replication of previously published experimental
conditions/algorithms/metrics and the resulting analysis are encouraged. In
particular, we seek discussions on the difficulties the authors may find in
this process, along with their limitations or successes in reproducing the
original results.

The topics the workshop seeks to address include (though need not be limited
to) the following:

* Limitations and challenges of experimental reproducibility and replication
* Reproducible experimental design
* Replicability of algorithms
* Standardization of metrics: definition and computation protocols
* Evaluation software: frameworks, utilities, services
* Reproducibility in user-centric studies
* Datasets and benchmarks
* Recommender software reuse
* Replication of already published work
* Reproducibility within and across domains and organizations
* Reproducibility and replication guidelines


== Submission ==

Two submission types are accepted: long papers of up to 8 pages, and short
papers of up to 4 pages. The papers will be evaluated for their originality,
contribution significance, soundness, clarity, and overall quality. The
interest of contributions will be assessed in terms of technical and
scientific findings, contribution to the knowledge and understanding of the
problem, methodological advancements, or applicative value. Specific
contributions focusing on repeatability and reproducibility in terms of
algorithm implementations, evaluation frameworks and/or practice will also be
welcome and valued.

All submissions shall adhere to the standard ACM SIG proceedings format:
http://www.acm.org/sigs/publications/proceedings-templates

Submissions shall be sent as a pdf file through the online submission system
now open at: https://www.easychair.org/conferences/?conf=repsys2013


== Important dates ==

* Paper submission deadline: 22 July
* Notification: 16 August
* Camera-ready version due: 30 August


== Organizers ==

* Alejandro Bellogín, Centrum Wiskunde & Informatica, The Netherlands
* Pablo Castells, Universidad Autónoma de Madrid, Spain
* Alan Said, Centrum Wiskunde & Informatica, The Netherlands
* Domonkos Tikk, Gravity R&D, Hungary


== Programme Committee ==

* Xavier Amatriain, Netflix, USA
* Linas Baltrunas, Telefonica Research, Spain
* Marcel Blattner, University of Applied Sciences, Switzerland
* Iván Cantador, Universidad Autónoma de Madrid, Spain
* Ed Chi, Google, USA
* Arjen de Vries, Centrum Wiskunde & Informatica, Netherlands
* Juan Manuel Fernández, Universidad de Granada, Spain
* Zeno Gantner, Nokia, Germany
* Pankaj Gupta, Twitter, USA
* Andreas Hotho, University of Würzburg, Germany
* Juan Huete, Universidad de Granada, Spain
* Kris Jack, Mendeley, England
* Dietmar Jannach, University of Dortmund, Germany
* Jaap Kamps, University of Amsterdam, Netherlands
* Alexandros Karatzoglou, TID, Spain
* Bart Knijnenburg, University of California, Irvine, USA
* Ido Guy, Google, Israel
* Jérôme 

Re: Negative Preferences in a Recommender

2013-06-18 Thread Dmitriy Lyubimov
On Tue, Jun 18, 2013 at 3:48 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 I have found that in practice, don't-like is very close to like.  That is,
 things that somebody doesn't like are very closely related to the things
 that they do like.


I guess it makes sense for cancellations. I guess it should become pretty
obvious from an extensive cross-validation search.


  Things that are quite distant wind up as don't-care,
 not don't-like.

 This makes most simple approaches to modeling polar preferences very
 dangerous.  What I have usually done under the pressure of time is to
 consider like and don't-like to be equivalent synonyms and then maintain a
 kill list of items to not show.  Works well pragmatically, but gives people
 hives when they hear of the details, especially if they actually believe
 humans act according to consistent philosophy.


Or we just don't know the exact prevailing reason for the returns. :) "Did
not fit" is almost "fits, give me something similar", and "found it in
another sale event" means "I still like it, just not at your price".
However, if there's a consistent quality issue, it may turn bad enough to
consider p=0. Bottom line, it should become fairly obvious which reasoning
prevails, through validation.

A kill list should probably be maintained for a whole lot of reasons, not
just returns. E.g. something that was recently bought may be a
once-in-a-lifetime purchase, or it may be replenishable with a certain
period of repeatability (which could also be modelled). Does that make
sense?



 On Tue, Jun 18, 2013 at 9:13 AM, Sean Owen sro...@gmail.com wrote:

  Yes, the model has no room for literally negative input. I think that
  conceptually people do want negative input, and in this model,
  negative numbers really are the natural way to express that.

  You could give negative input a small positive weight. Or extend the
  definition of c so that it is merely small, not negative, when r is
  negative. But this was generally unsatisfactory. It has a logic (that
  even negative input is really a slightly positive association in the
  scheme of things), but the results were viewed as unintuitive.

  I ended up extending it to handle negative input more directly, such
  that negative input is read as evidence that p=0, instead of evidence
  that p=1. This works fine, and is tidier than an ensemble (although
  that's a sound idea too). The change is quite small.

  I agree with the second point that learning weights is manual and
  difficult; that's unavoidable, I think, when you want to start adding
  different data types anyway.

  I also don't use M/R for searching the parameter space, since you may
  try a thousand combinations and each is a model built from scratch. I
  use a sample of the data and run in-core.
 
  On Tue, Jun 18, 2013 at 2:30 AM, Dmitriy Lyubimov dlie...@gmail.com
  wrote:
   (Kinda doing something very close.)

   The Koren-Volinsky paper on implicit feedback can be generalized to
   decompose all input into a preference matrix (0 or 1) and a confidence
   matrix (which is essentially an observation weight matrix).

   If you did not get any observations, you encode it as (p=0, c=1), but if
   you know that the user did not like the item, you can encode that
   observation with a much higher confidence weight, something like
   (p=0, c=30) -- actually with as high a confidence as a conversion, in
   your case, it seems.

   The problem with this is that you end up with quite a bunch of additional
   parameters in your model to figure out, i.e. confidence weights for each
   type of action in the system. You can establish those through extensive
   cross-validation search, which is initially quite expensive (even with
   distributed cluster tech), but you can bail out incrementally much sooner
   once a previous good guess is already known.

   MR doesn't work well for this, though, since it requires A LOT of
   iterations.
  
  
  
   On Mon, Jun 17, 2013 at 5:51 PM, Pat Ferrel pat.fer...@gmail.com
  wrote:
  
   In the case where you know a user did not like an item, how should the
   information be treated in a recommender? Normally for retail
   recommendations you have an implicit 1 for a purchase and no value
   otherwise. But what if you knew the user did not like an item? Maybe you
   have records of "I want my money back for this junk" reactions.

   You could make a scale of 0 and 1, where 0 means a bad rating, 1 a good
   one, and no value as usual means no preference? Some of the math here
   won't work, though, since usually no value implicitly = 0, so maybe
   -1 = bad, 1 = good, no preference implicitly = 0?

   Would it be better to treat the bad rating as a 1 and a good one as a 2?
   This would be more like the old star-rating method, only we would know
   where the cutoff should be between a good review and a bad one (1.5).

   I suppose this could also be treated as another recommender in an
   ensemble where r = r_p - r_h, where r_h = predictions from "I hate this
   product" preferences?

   Has anyone found a good method?