Hi,
I've noticed a problem in the non-Hadoop (Taste) version of the
recommender package. The problem is in the AbstractSimilarity class (in
package org.apache.mahout.cf.taste.impl.similarity), which is the base
class for computing the similarity values between vectors of users or
items.
AbstractIDMigrator is there so that you can use String IDs (it
converts Strings to longs). IDs are stored as longs, so there should
not be any problem with negative IDs, but in practice I have not
worked with negative IDs before.
Tevfik
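For illustration, a minimal sketch of the String-to-long migration
(the ID "user-42" is hypothetical; note that the hash produced by
toLongID can itself be negative):

  import org.apache.mahout.cf.taste.impl.model.MemoryIDMigrator;

  public class IdMigrationSketch {
    public static void main(String[] args) throws Exception {
      MemoryIDMigrator migrator = new MemoryIDMigrator();
      long id = migrator.toLongID("user-42"); // hashes the String to a long; may be negative
      migrator.storeMapping(id, "user-42");   // keep the reverse mapping in memory
      System.out.println(id + " -> " + migrator.toStringID(id)); // prints the long and "user-42"
    }
  }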
On Wed, Aug 6, 2014 at 3:51 AM, Peng Zhang wrote:
- Is there a way to specify the train and test set like you can with the
*RecommenderEvaluator*?
No, though you can specify the evaluation percentage. This is because
of the logic of the evaluation, which is to take away a user's
relevant items, make recommendations, and see whether the removed
items come back in the recommendation list.
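For illustration, a minimal sketch of such an evaluation with
GenericRecommenderIRStatsEvaluator (the file ratings.csv, the
neighborhood size, and the 10% evaluation percentage are hypothetical
choices):

  import java.io.File;
  import org.apache.mahout.cf.taste.eval.IRStatistics;
  import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
  import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
  import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
  import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;

  public class IRStatsSketch {
    public static void main(String[] args) throws Exception {
      DataModel model = new FileDataModel(new File("ratings.csv"));
      RecommenderBuilder builder = dataModel -> {
        PearsonCorrelationSimilarity sim = new PearsonCorrelationSimilarity(dataModel);
        return new GenericUserBasedRecommender(dataModel,
            new NearestNUserNeighborhood(10, sim, dataModel), sim);
      };
      GenericRecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
      IRStatistics stats = evaluator.evaluate(builder, null, model, null,
          10,                                                  // precision/recall at 10
          GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, // pick relevance threshold per user
          0.1);                                                // evaluate 10% of the users
      System.out.println(stats.getPrecision() + " / " + stats.getRecall());
    }
  }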
Interesting topic,
Ted, can you give examples of those mathematical assumptions
underpinning ALS which are violated by the real world?
On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning ted.dunn...@gmail.com wrote:
How can there be any other practical method? Essentially all of the
mathematical assumptions underpinning these methods are violated by
real-world data.
On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:
Hi Juan,
If I remember correctly, AllSimilarItemsCandidateItemsStrategy
returns all items that have not been rated by the user and for which
the similarity metric returns a non-NaN similarity value with at
least one of the items the user has rated.
If it returns the items that have not been rated by the user, what
would AllUnknownItemsCandidateItemsStrategy return?
On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin
tevfik.ayte...@gmail.com wrote:
Sorry, there was a typo in the previous paragraph.
If I remember correctly, AllSimilarItemsCandidateItemsStrategy
returns all [...], but AllSimilarItemsCandidateItemsStrategy is
returning that item. So, I'm truly sorry to insist on this, but I
still really do not get the difference.
On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin
tevfik.ayte...@gmail.com wrote:
Juan,
You got me wrong:
AllSimilarItemsCandidateItemsStrategy returns all [...]
AllSimilarItemsCandidateItemsStrategy already selects the maximum set
of items that could potentially be recommended to the user.
--sebastian
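For illustration, a minimal sketch of wiring either strategy into an
item-based recommender (ratings.csv, the user ID, and the similarity
choice are hypothetical):

  import java.io.File;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.recommender.AllSimilarItemsCandidateItemsStrategy;
  import org.apache.mahout.cf.taste.impl.recommender.AllUnknownItemsCandidateItemsStrategy;
  import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
  import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

  public class StrategySketch {
    public static void main(String[] args) throws Exception {
      DataModel model = new FileDataModel(new File("ratings.csv"));
      ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
      // Candidates = unrated items with a non-NaN similarity to at least one preferred item:
      AllSimilarItemsCandidateItemsStrategy similarStrategy =
          new AllSimilarItemsCandidateItemsStrategy(similarity);
      GenericItemBasedRecommender rec = new GenericItemBasedRecommender(
          model, similarity, similarStrategy, similarStrategy);
      // Alternative: candidates = every unrated item, similar or not:
      //   new GenericItemBasedRecommender(model, similarity,
      //       new AllUnknownItemsCandidateItemsStrategy(), similarStrategy);
      System.out.println(rec.recommend(1L, 3)); // top-3 items for user 1
    }
  }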
On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:
If the similarities between item 5 and two of the items user 1
preferred are not NaN, then it will return it; that is what I'm
saying.
It can even make things worse in SVD-based algorithms for which
preference estimation is very fast.
On Wed, Mar 5, 2014 at 7:00 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:
Hi Sebastian,
But that is in order not to select items that are not similar to at
least one of the items the user has interacted with.
In some cases users might not get any recommendations. There might be
different reasons for this. In your case there is only item 107 which
can be recommended to user 5 (since user 5 rated all the other items).
Item 107 got two ratings, which are both 5. In this case the Pearson
correlation between this item and any other item is undefined (NaN),
because a vector of identical ratings has zero variance.
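For illustration, a tiny worked example of that NaN (the second
rating vector is hypothetical): centering the all-5s vector gives all
zeros, so the Pearson denominator is zero and the result is 0/0:

  public class PearsonNaNSketch {
    public static void main(String[] args) {
      double[] a = {5.0, 5.0}; // item 107: both ratings are 5
      double[] b = {4.0, 2.0}; // hypothetical co-rated item
      double meanA = 5.0, meanB = 3.0;
      double num = 0, denA = 0, denB = 0;
      for (int i = 0; i < a.length; i++) {
        num  += (a[i] - meanA) * (b[i] - meanB);
        denA += (a[i] - meanA) * (a[i] - meanA); // stays 0: no variance in a
        denB += (b[i] - meanB) * (b[i] - meanB);
      }
      System.out.println(num / Math.sqrt(denA * denB)); // 0/0 -> NaN
    }
  }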
You are right, Koobas; my answer assumed that item-based NN is used
(but I see that user-based NN is being used here). So my answer is
not correct, sorry.
At the moment I cannot see the exact reason why user 5 is not getting
any recommendations; as you said, user 5 should get some.
Well, I think what you are suggesting is to define popularity as
similarity to other items. In this way the most popular items will be
those which are most similar to all other items, like the centroids
in K-means.
I would first check the correlation between this definition and the
standard one.
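For illustration, a minimal sketch of that definition, scoring each
item by its mean similarity to every other item (ratings.csv is
hypothetical; note this naive loop is O(n^2) in the number of items):

  import java.io.File;
  import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

  public class CentroidPopularitySketch {
    public static void main(String[] args) throws Exception {
      DataModel model = new FileDataModel(new File("ratings.csv"));
      ItemSimilarity sim = new PearsonCorrelationSimilarity(model);
      LongPrimitiveIterator items = model.getItemIDs();
      while (items.hasNext()) {
        long item = items.nextLong();
        double sum = 0;
        int n = 0;
        LongPrimitiveIterator others = model.getItemIDs();
        while (others.hasNext()) {
          long other = others.nextLong();
          if (other == item) continue;
          double s = sim.itemSimilarity(item, other);
          if (!Double.isNaN(s)) { sum += s; n++; } // skip undefined pairs
        }
        // "centroid popularity": mean similarity to all other items
        System.out.println(item + "\t" + (n > 0 ? sum / n : Double.NaN));
      }
    }
  }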
Thanks for the answers. Actually, I worked on a similar issue,
increasing the diversity of top-N lists
(http://link.springer.com/article/10.1007%2Fs10844-013-0252-9).
Clustering-based approaches produce good results and they are very
fast compared to some optimization-based techniques. Also it [...]
Case 1 is fine. In case 2, I don't think that a dot product (without
normalization) will yield a meaningful distance measure; cosine
distance or a Pearson correlation would be better. The situation is
similar to Latent Semantic Indexing, in which documents are
represented by their low-rank representations.
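For illustration, a minimal sketch of the difference (the two latent
vectors are hypothetical): the raw dot product rewards sheer
magnitude, while cosine only measures orientation:

  public class CosineVsDotSketch {
    static double dot(double[] x, double[] y) {
      double s = 0;
      for (int i = 0; i < x.length; i++) s += x[i] * y[i];
      return s;
    }
    static double cosine(double[] x, double[] y) {
      // normalizes away vector length, so large vectors do not dominate
      return dot(x, y) / (Math.sqrt(dot(x, x)) * Math.sqrt(dot(y, y)));
    }
    public static void main(String[] args) {
      double[] u = {0.5, 1.0};  // hypothetical low-rank item vectors
      double[] v = {5.0, 10.0}; // same direction, 10x the magnitude
      System.out.println(dot(u, v));    // 12.5: driven by magnitude
      System.out.println(cosine(u, v)); // 1.0: identical orientation
    }
  }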
Thanks Sebastian.
On Sat, Sep 7, 2013 at 8:24 PM, Sebastian Schelter
ssc.o...@googlemail.com wrote:
IIRC the algorithm behind ParallelSGDFactorizer needs shared memory,
which is not given in a shared-nothing environment.
On 07.09.2013 19:08, Tevfik Aytekin wrote:
Hi,
There seems to be no Hadoop implementation of ParallelSGDFactorizer.
ALSWRFactorizer has a Hadoop implementation.
ParallelSGDFactorizer (since it is based on stochastic gradient
descent) is much faster than ALSWRFactorizer.
I don't know Hadoop well, but it seems to me that a Hadoop
implementation of it should be possible.
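For illustration, a minimal sketch of both factorizers in the
non-Hadoop code (ratings.csv and all hyperparameter values are
hypothetical):

  import java.io.File;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
  import org.apache.mahout.cf.taste.impl.recommender.svd.ParallelSGDFactorizer;
  import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
  import org.apache.mahout.cf.taste.model.DataModel;

  public class FactorizerSketch {
    public static void main(String[] args) throws Exception {
      DataModel model = new FileDataModel(new File("ratings.csv"));
      // Multithreaded SGD on one machine (shared memory):
      SVDRecommender sgd = new SVDRecommender(model,
          new ParallelSGDFactorizer(model, 10, 0.01, 20)); // 10 features, lambda 0.01, 20 epochs
      // ALS-WR, the variant that also has a Hadoop implementation:
      SVDRecommender als = new SVDRecommender(model,
          new ALSWRFactorizer(model, 10, 0.065, 15)); // 10 features, lambda 0.065, 15 iterations
      System.out.println(sgd.recommend(1L, 3));
      System.out.println(als.recommend(1L, 3));
    }
  }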
Sebastian, what is IIRC?
Thanks Sean, but I didn't quite get your answer. Can you please explain it again?
On Sun, May 19, 2013 at 8:00 PM, Sean Owen sro...@gmail.com wrote:
It doesn't matter, in the sense that it is never going to be fast
enough for real-time at any reasonable scale if actually run off a
database. The data has to be read, once, into memory. And in that
case, it makes no difference where the data is being read from,
because it is read just once, serially. A file is just as fine as a
fancy database. In fact it's probably easier and faster.
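For illustration, a minimal sketch of that load-once pattern
(ratings.csv is a hypothetical userID,itemID,rating file):

  import java.io.File;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.model.DataModel;

  public class LoadOnceSketch {
    public static void main(String[] args) throws Exception {
      // One serial read at startup; afterwards everything is served from RAM,
      // so it no longer matters that the source was a flat file rather than a database.
      DataModel model = new FileDataModel(new File("ratings.csv"));
      System.out.println(model.getNumUsers() + " users, "
          + model.getNumItems() + " items held in memory");
    }
  }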
On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin
tevfik.ayte...@gmail.com wrote:
This problem is called the one-class classification problem. In the
domain of collaborative filtering it is called one-class collaborative
filtering (since what you have are only positive preferences). You may
search the web with these keywords to find papers providing
solutions. I'm not sure whether [...]
But the data under consideration here is not 0/1 data; it contains only 1's.
On Mon, May 6, 2013 at 11:29 PM, Sean Owen sro...@gmail.com wrote:
Parallel ALS is exactly an example of where you can use matrix
factorization for 0/1 data.
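For illustration, a minimal sketch of handling such positive-only
data in the Taste code with boolean preferences (events.csv, a
hypothetical userID,itemID file, and the neighborhood size are
illustrative):

  import java.io.File;
  import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
  import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
  import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;

  public class OneClassSketch {
    public static void main(String[] args) throws Exception {
      DataModel raw = new FileDataModel(new File("events.csv")); // associations only, no ratings
      DataModel model = new GenericBooleanPrefDataModel(
          GenericBooleanPrefDataModel.toDataMap(raw));           // drop preference values
      LogLikelihoodSimilarity sim = new LogLikelihoodSimilarity(model); // ignores rating values
      GenericBooleanPrefUserBasedRecommender rec =
          new GenericBooleanPrefUserBasedRecommender(model,
              new NearestNUserNeighborhood(10, sim, model), sim);
      System.out.println(rec.recommend(1L, 3)); // top-3 items for user 1
    }
  }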
On Mon, May 6, 2013 at 9:22 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:
You are correct: since centeredSumX2 equals zero, the Pearson
similarity will be undefined (because of the division by zero in the
Pearson formula).
If you do not center the data, you get cosine similarity, which is
another common similarity metric used in recommender systems, and it
will not be undefined in this case.
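For illustration, a minimal sketch of that swap (ratings.csv and the
item IDs are hypothetical; 107 stands in for the all-5s item from the
earlier example):

  import java.io.File;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
  import org.apache.mahout.cf.taste.impl.similarity.UncenteredCosineSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;

  public class SimilaritySwapSketch {
    public static void main(String[] args) throws Exception {
      DataModel model = new FileDataModel(new File("ratings.csv"));
      // Pearson centers each vector first; an all-5s item centers to all zeros -> NaN:
      System.out.println(new PearsonCorrelationSimilarity(model).itemSimilarity(107L, 101L));
      // Uncentered cosine skips the centering, so the all-5s vector keeps a nonzero norm:
      System.out.println(new UncenteredCosineSimilarity(model).itemSimilarity(107L, 101L));
    }
  }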
I think it is better to choose the ratings of the test user at random.
On Sat, Feb 16, 2013 at 9:37 PM, Sean Owen sro...@gmail.com wrote:
Yes. But: the test sample is small. Using 40% of your data to test is
probably too much.
My point is that it may be the least-bad thing to do.
It sounds like the same idea, except you're randomly throwing away
some lower-rated data from both test and train. I don't see how that
helps either.
On Sat, Feb 16, 2013 at 9:41 PM, Tevfik Aytekin
tevfik.ayte...@gmail.com wrote:
What I mean is you can choose ratings randomly and try to recommend
the ones above the relevance threshold.