problem in recommender similarity computation (taste)

2015-03-07 Thread Tevfik Aytekin
Hi, I've noticed a problem in the non-Hadoop (taste) version of the recommender package. The problem is in AbstractSimilarity (package org.apache.mahout.cf.taste.impl.similarity), the base class for computing similarity values between user or item vectors. It

Re: Can user id and item id be negative integers?

2014-08-09 Thread Tevfik Aytekin
AbstractIDMigrator is there so that you can use String IDs (it converts Strings to Longs). IDs are stored as Longs, so there should not be any problem with negative IDs, but in practice I have not worked with negative IDs before. Tevfik On Wed, Aug 6, 2014 at 3:51 AM, Peng Zhang
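To illustrate why negative IDs are harmless, here is a minimal sketch of one common way to turn String IDs into Java `long` values: hash the string with MD5 and pack the first 8 bytes into a `long`. This is an assumption for illustration only; the class name `StringIdHasher` is hypothetical and the code is not taken from Mahout. The point is that roughly half of such IDs come out negative, and a `long` stores them without any trouble.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps String IDs to long IDs via MD5 (illustration only, not Mahout's API).
final class StringIdHasher {
  private StringIdHasher() {}

  static long toLongId(String stringId) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] hash = md5.digest(stringId.getBytes(StandardCharsets.UTF_8));
      long result = 0L;
      // Pack the first 8 digest bytes, big-endian, into a long.
      // The top bit of hash[0] becomes the sign bit, so about half of all IDs are negative.
      for (int i = 0; i < 8; i++) {
        result = (result << 8) | (hash[i] & 0xFFL);
      }
      return result;
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to support MD5, so this should not happen.
      throw new IllegalStateException(e);
    }
  }
}
```

The same string always maps to the same `long`, so the mapping can be applied consistently to users and items; the sign of the result carries no meaning.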

Re: Recommender Systems - RecommenderIRStatsEvaluator

2014-05-20 Thread Tevfik Aytekin
- Is there a way to specify the train and test set like you can with the *RecommenderEvaluator*? No, though you can specify the evaluation percentage. This follows from how the evaluation works: it takes away relevant items, makes recommendations, and then checks whether the
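The evaluation logic described above (hide some of a user's relevant items, ask for top-N recommendations, and check how many hidden items come back) boils down to precision and recall at N. A minimal sketch, assuming the recommendation list and the hidden relevant set are already available; `IrStats` and its methods are hypothetical names, not the `RecommenderIRStatsEvaluator` API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for precision/recall at N over one user's held-out relevant items.
final class IrStats {
  private IrStats() {}

  // Number of the first n recommended items that are in the hidden relevant set.
  static int hits(List<Long> recommended, Set<Long> relevant, int n) {
    int count = 0;
    for (int i = 0; i < Math.min(n, recommended.size()); i++) {
      if (relevant.contains(recommended.get(i))) {
        count++;
      }
    }
    return count;
  }

  // Precision at n: hits divided by n (a list shorter than n is penalized for its empty slots).
  static double precisionAt(List<Long> recommended, Set<Long> relevant, int n) {
    return (double) hits(recommended, relevant, n) / n;
  }

  // Recall at n: fraction of the hidden relevant items recovered in the top-n list.
  static double recallAt(List<Long> recommended, Set<Long> relevant, int n) {
    return relevant.isEmpty() ? Double.NaN : (double) hits(recommended, relevant, n) / relevant.size();
  }
}
```

Because the relevant set has to be carved out of the user's own ratings, the split is tied to each user, which is one way to read why a fixed train/test split is not offered here.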

Re: Number of features for ALS

2014-03-27 Thread Tevfik Aytekin
Interesting topic. Ted, can you give examples of the mathematical assumptions underpinning ALS that are violated by the real world? On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning ted.dunn...@gmail.com wrote: How can there be any other practical method? Essentially all of the mathematical

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
5, 2014 at 3:38 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Hi Juan, If I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and for which the similarity metric returns a non-NaN similarity value with at least one of the items
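A minimal sketch of the candidate rule described in this thread: keep every item the user has not rated whose similarity to at least one of the user's rated items is not NaN. `SimilarityFn` and `CandidateItems` are hypothetical stand-ins (Mahout's own similarity interface has a different signature), and this is not the actual `AllSimilarItemsCandidateItemsStrategy` code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical similarity function; returns Double.NaN when the similarity is undefined.
interface SimilarityFn {
  double similarity(long itemA, long itemB);
}

final class CandidateItems {
  private CandidateItems() {}

  // Items the user has not rated that have a non-NaN similarity to at least one rated item.
  static Set<Long> candidates(Set<Long> allItems, Set<Long> ratedByUser, SimilarityFn sim) {
    Set<Long> result = new LinkedHashSet<>();
    for (long item : allItems) {
      if (ratedByUser.contains(item)) {
        continue; // already rated, never a candidate
      }
      for (long rated : ratedByUser) {
        if (!Double.isNaN(sim.similarity(item, rated))) {
          result.add(item);
          break; // one defined similarity is enough
        }
      }
    }
    return result;
  }
}
```

An item with no defined similarity to anything the user rated is filtered out here, whereas a strategy that simply returns all unrated items would keep it.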

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
not been rated by the user, what would AllUnknownItemsCandidateItemsStrategy return? On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Sorry, there was a typo in the previous paragraph. If I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
, but AllSimilarItemsCandidateItemsStrategy is returning that item. So, I'm truly sorry to insist on this, but I still do not really get the difference. On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Juan, you got me wrong: AllSimilarItemsCandidateItemsStrategy returns all

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
AllSimilarItemsStrategy already selects the maximum set of items that could potentially be recommended to the user. --sebastian On 03/05/2014 05:38 PM, Tevfik Aytekin wrote: If the similarities between item 5 and two of the items user 1 preferred are not NaN, then it will return 1, that is what I'm

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
It can even make things worse in SVD-based algorithms, for which preference estimation is very fast. On Wed, Mar 5, 2014 at 7:00 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Hi Sebastian, But in order not to select items that are not similar to at least one of the items the user interacted

Re: Why some userId has no recommendations?

2014-02-13 Thread Tevfik Aytekin
In some cases users might not get any recommendations, and there can be several reasons for this. In your case, item 107 is the only item that can be recommended to user 5 (since user 5 rated all other items). Item 107 got two ratings, both of which are 5. In this case the Pearson correlation between this
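To see why a constant rating vector is a problem, here is a minimal Pearson correlation sketch over paired values. It is the plain textbook formula, not Mahout's `PearsonCorrelationSimilarity` code: when one side is constant (for example an item whose two ratings are both 5), its centered sum of squares is zero and the result is NaN.

```java
// Textbook Pearson correlation over paired values; returns NaN when either side has zero variance.
final class Pearson {
  private Pearson() {}

  static double correlation(double[] x, double[] y) {
    if (x.length != y.length || x.length == 0) {
      throw new IllegalArgumentException("need two non-empty vectors of equal length");
    }
    double meanX = 0, meanY = 0;
    for (int i = 0; i < x.length; i++) {
      meanX += x[i];
      meanY += y[i];
    }
    meanX /= x.length;
    meanY /= y.length;
    double sumXY = 0, sumX2 = 0, sumY2 = 0;
    for (int i = 0; i < x.length; i++) {
      double dx = x[i] - meanX, dy = y[i] - meanY;
      sumXY += dx * dy;
      sumX2 += dx * dx; // zero when every x[i] is the same, e.g. {5, 5}
      sumY2 += dy * dy;
    }
    double denom = Math.sqrt(sumX2) * Math.sqrt(sumY2);
    return denom == 0.0 ? Double.NaN : sumXY / denom;
  }
}
```

A NaN similarity is typically skipped when estimating preferences, so an item whose only similarities are NaN may end up with no estimate at all.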

Re: Why some userId has no recommendations?

2014-02-13 Thread Tevfik Aytekin
You are right, Koobas: my answer assumed that item-based NN is used, but I now see that user-based NN is being used, so my answer is not correct, sorry. I currently cannot see the exact reason why user 5 is not getting any recommendations; as you said, user 5 should get

Re: Popularity of recommender items

2014-02-06 Thread Tevfik Aytekin
Well, I think what you are suggesting is to define popularity as being similar to other items. So in this way most popular items will be those which are most similar to all other items, like the centroids in K-means. I would first check the correlation between this definition and the standard one

Re: generic latent variable recommender question

2014-01-26 Thread Tevfik Aytekin
Thanks for the answers. Actually, I worked on a similar issue, increasing the diversity of top-N lists (http://link.springer.com/article/10.1007%2Fs10844-013-0252-9). Clustering-based approaches produce good results, and they are very fast compared to some optimization-based techniques. Also it

Re: generic latent variable recommender question

2014-01-25 Thread Tevfik Aytekin
Case 1 is fine, in case 2, I don't think that a dot product (without normalization) will yield a meaningful distance measure. Cosine distance or a Pearson correlation would be better. The situation is similar to Latent Semantic Indexing in which documents are represented by their low rank
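A small sketch of the distinction in case 2, assuming items are represented by latent factor vectors (plain `double[]` here; `LatentSimilarity` is a hypothetical name): the raw dot product grows with vector length, so a long vector looks "close" to everything, while cosine similarity compares only directions.

```java
// Dot product vs. cosine similarity on latent factor vectors (illustrative helper).
final class LatentSimilarity {
  private LatentSimilarity() {}

  static double dot(double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      s += a[i] * b[i];
    }
    return s;
  }

  // Cosine similarity: dot product of the length-normalized vectors, in [-1, 1].
  static double cosine(double[] a, double[] b) {
    double denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
    return denom == 0.0 ? Double.NaN : dot(a, b) / denom;
  }
}
```

For example, with a = {1, 0}, b = {1, 0.1} and c = {10, 10}, the dot product ranks c above b as a neighbour of a, while cosine ranks b (almost the same direction) above c.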

Re: generic latent variable recommender question

2014-01-25 Thread Tevfik Aytekin
...@gmail.com wrote: On Sat, Jan 25, 2014 at 3:51 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Case 1 is fine, in case 2, I don't think that a dot product (without normalization) will yield a meaningful distance measure. Cosine distance or a Pearson correlation would be better. The situation

Re: Hadoop implementation of ParallelSGDFactorizer

2013-09-08 Thread Tevfik Aytekin
Thanks Sebastian. On Sat, Sep 7, 2013 at 8:24 PM, Sebastian Schelter ssc.o...@googlemail.com wrote: IIRC the algorithm behind ParallelSGDFactorizer needs shared memory, which is not given in a shared-nothing environment. On 07.09.2013 19:08, Tevfik Aytekin wrote: Hi, There seems

Hadoop implementation of ParallelSGDFactorizer

2013-09-07 Thread Tevfik Aytekin
Hi, There seems to be no Hadoop implementation of ParallelSGDFactorizer, although ALSWRFactorizer has one. ParallelSGDFactorizer (since it is based on stochastic gradient descent) is much faster than ALSWRFactorizer. I don't know Hadoop well, but it seems to me that a Hadoop
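For readers unfamiliar with the SGD approach mentioned here, a minimal sketch of the textbook per-rating update for matrix factorization with regularized squared error. This is the generic technique, not Mahout's `ParallelSGDFactorizer` code; `SgdStep`, the learning rate and the regularization value are assumptions. Each update modifies one user row and one item row in place; parallel variants typically have many threads doing this against the same factor matrices, which fits the shared-memory point raised in the reply.

```java
// Textbook SGD step for matrix factorization: minimizes (r - p.q)^2 + reg * (|p|^2 + |q|^2).
final class SgdStep {
  private SgdStep() {}

  static double predict(double[] p, double[] q) {
    double s = 0;
    for (int k = 0; k < p.length; k++) {
      s += p[k] * q[k];
    }
    return s;
  }

  // Updates userFactors and itemFactors in place for one observed rating.
  static void update(double[] userFactors, double[] itemFactors, double rating,
                     double learningRate, double reg) {
    double err = rating - predict(userFactors, itemFactors);
    for (int k = 0; k < userFactors.length; k++) {
      double pk = userFactors[k]; // keep the old values so both updates use the same point
      double qk = itemFactors[k];
      userFactors[k] += learningRate * (err * qk - reg * pk);
      itemFactors[k] += learningRate * (err * pk - reg * qk);
    }
  }
}
```

A full factorizer would loop this update over all observed ratings (usually in shuffled order) for several epochs; each step is cheap, which is where the speed advantage over ALS comes from.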

Re: Hadoop implementation of ParallelSGDFactorizer

2013-09-07 Thread Tevfik Aytekin
Sebastian, what is IIRC? On Sat, Sep 7, 2013 at 8:24 PM, Sebastian Schelter ssc.o...@googlemail.com wrote: IIRC the algorithm behind ParallelSGDFactorizer needs shared memory, which is not given in a shared-nothing environment. On 07.09.2013 19:08, Tevfik Aytekin wrote: Hi, There seems

Re: Which database should I use with Mahout

2013-05-19 Thread Tevfik Aytekin
Thanks Sean, but I could not get your answer. Can you please explain it again? On Sun, May 19, 2013 at 8:00 PM, Sean Owen sro...@gmail.com wrote: It doesn't matter, in the sense that it is never going to be fast enough for real-time at any reasonable scale if actually run off a database

Re: Which database should I use with Mahout

2013-05-19 Thread Tevfik Aytekin
, into memory. And in that case, it makes no difference where the data is being read from, because it is read just once, serially. A file is just as fine as a fancy database. In fact it's probably easier and faster. On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote

Re: Which database should I use with Mahout

2013-05-19 Thread Tevfik Aytekin
, May 19, 2013 at 10:14 AM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Thanks Sean, but I could not get your answer. Can you please explain it again? On Sun, May 19, 2013 at 8:00 PM, Sean Owen sro...@gmail.com wrote: It doesn't matter, in the sense that it is never going to be fast
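A minimal sketch of the "read it once, serially, into memory" idea, assuming a simple `userID,itemID,preference` line format (an assumption; adapt it to your data). The loader parses any `Reader`, so a flat file and a database export are handled the same way. `InMemoryPrefs` is a hypothetical class; Mahout ships its own file-based model (`FileDataModel`).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory loader for "userID,itemID,preference" lines.
final class InMemoryPrefs {
  private InMemoryPrefs() {}

  // Reads every line exactly once and returns userID -> (itemID -> preference).
  static Map<Long, Map<Long, Float>> load(Reader source) throws IOException {
    Map<Long, Map<Long, Float>> prefs = new HashMap<>();
    try (BufferedReader in = new BufferedReader(source)) {
      String line;
      while ((line = in.readLine()) != null) {
        line = line.trim();
        if (line.isEmpty() || line.startsWith("#")) {
          continue; // skip blank lines and comments
        }
        String[] f = line.split(",");
        long user = Long.parseLong(f[0].trim());
        long item = Long.parseLong(f[1].trim());
        float value = Float.parseFloat(f[2].trim());
        prefs.computeIfAbsent(user, u -> new HashMap<>()).put(item, value);
      }
    }
    return prefs;
  }
}
```

After this single pass, recommendation requests are served from memory, so the choice of storage only affects load time.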

Re: parallelALS and RMSE TEST

2013-05-06 Thread Tevfik Aytekin
This is called the one-class classification problem. In the domain of collaborative filtering it is called one-class collaborative filtering (since what you have are only positive preferences). You may search the web with these keywords to find papers providing solutions. I'm not sure whether

Re: parallelALS and RMSE TEST

2013-05-06 Thread Tevfik Aytekin
at 8:29 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: This is called the one-class classification problem. In the domain of collaborative filtering it is called one-class collaborative filtering (since what you have are only positive preferences). You may search the web with these key

Re: parallelALS and RMSE TEST

2013-05-06 Thread Tevfik Aytekin
But the data under consideration here is not 0/1 data, it contains only 1's. On Mon, May 6, 2013 at 11:29 PM, Sean Owen sro...@gmail.com wrote: Parallel ALS is exactly an example of where you can use matrix factorization for 0/1 data. On Mon, May 6, 2013 at 9:22 PM, Tevfik Aytekin tevfik.ayte

Re: User Based recommender - strange behaviour of Pearson

2013-04-09 Thread Tevfik Aytekin
You are correct: since centeredSumX2 equals zero, the Pearson similarity is undefined (because of division by zero in the Pearson formula). If you do not center the data, the measure becomes cosine similarity, which is another common similarity metric used in recommender systems, and it will not be
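One way to see the relationship: Pearson correlation is cosine similarity applied to mean-centered vectors. A minimal sketch with plain arrays (`CenteredCosine` is a hypothetical name, not Mahout code) showing that a constant vector makes the centered version undefined while the uncentered cosine still returns a value:

```java
// Pearson correlation == cosine similarity of mean-centered vectors (illustrative helpers).
final class CenteredCosine {
  private CenteredCosine() {}

  static double cosine(double[] a, double[] b) {
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    double denom = Math.sqrt(na) * Math.sqrt(nb);
    return denom == 0.0 ? Double.NaN : dot / denom; // NaN mirrors a zero sum of squares
  }

  // Subtracts the vector's own mean from every entry.
  static double[] center(double[] v) {
    double mean = 0;
    for (double x : v) {
      mean += x;
    }
    mean /= v.length;
    double[] out = new double[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = v[i] - mean;
    }
    return out;
  }

  // Pearson correlation computed as the cosine of the centered vectors.
  static double pearson(double[] a, double[] b) {
    return cosine(center(a), center(b));
  }
}
```

For a constant vector such as {4, 4, 4}, centering produces all zeros, so the Pearson value is NaN, while the plain cosine against {1, 3, 5} is a positive number below 1.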

Re: Problems with Mahout's RecommenderIRStatsEvaluator

2013-02-16 Thread Tevfik Aytekin
I think it is better to choose the ratings of the test user at random. On Sat, Feb 16, 2013 at 9:37 PM, Sean Owen sro...@gmail.com wrote: Yes. But: the test sample is small. Using 40% of your data to test is probably too much. My point is that it may be the least-bad thing to do.

Re: Problems with Mahout's RecommenderIRStatsEvaluator

2013-02-16 Thread Tevfik Aytekin
problematic. On Sat, Feb 16, 2013 at 8:53 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: I think it is better to choose the ratings of the test user at random. On Sat, Feb 16, 2013 at 9:37 PM, Sean Owen sro...@gmail.com wrote: Yes. But: the test sample is small. Using 40% of your data

Re: Problems with Mahout's RecommenderIRStatsEvaluator

2013-02-16 Thread Tevfik Aytekin
idea, except you're randomly throwing away some lower-rated data from both test and train. I don't see what that helps either. On Sat, Feb 16, 2013 at 9:41 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: What I mean is you can choose ratings randomly and try to recommend the ones above
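A minimal sketch of the random selection being discussed, under stated assumptions: shuffle one user's ratings with a seeded `Random`, hold out a fraction as test data, and treat held-out items rated at or above a threshold as the relevant set. `RandomHoldout`, the test fraction and the threshold are hypothetical; they are not values from this thread or from `RecommenderIRStatsEvaluator`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical random hold-out of one user's ratings for IR-style evaluation.
final class RandomHoldout {
  final Set<Long> trainItems = new HashSet<>();
  final Set<Long> testItems = new HashSet<>();
  final Set<Long> relevantItems = new HashSet<>(); // held-out items rated >= threshold

  RandomHoldout(Map<Long, Float> userRatings, double testFraction, float threshold, long seed) {
    List<Long> items = new ArrayList<>(userRatings.keySet());
    Collections.sort(items); // fixed starting order so the seed alone determines the split
    Collections.shuffle(items, new Random(seed));
    int testSize = (int) Math.round(testFraction * items.size());
    for (int i = 0; i < items.size(); i++) {
      long item = items.get(i);
      if (i < testSize) {
        testItems.add(item);
        if (userRatings.get(item) >= threshold) {
          relevantItems.add(item);
        }
      } else {
        trainItems.add(item);
      }
    }
  }
}
```

Unlike taking the top-rated items as the test set, a random hold-out leaves both high- and low-rated items in training, at the cost of fewer relevant items per user when the threshold is high.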