I think I have found the explanation for the poor results and, maybe, for the instability.

More than 60% of the ratings are 0/10. This is what the publishers of this
dataset call "implicit rating". It means that the book was read (or purchased)
but not rated by the user.

It seems that BookCrossingDataModel is not aware of this and simply treats
them as ratings of 0. It is therefore not surprising that the results are inconsistent.
An obvious way to solve the problem would be to filter out these implicit
ratings.
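A minimal sketch of that filtering step, in standalone Java rather than through any Mahout class. It assumes the raw BX-Book-Ratings.csv layout of semicolon-separated, double-quoted fields ("User-ID";"ISBN";"Book-Rating"), with explicit ratings from 1 to 10 and 0 for implicit ones; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only explicit (1-10) ratings from lines in the
// assumed BX-Book-Ratings.csv format: "User-ID";"ISBN";"Book-Rating"
final class ExplicitRatingFilter {

    private ExplicitRatingFilter() {}

    /** Returns the rating field of one line, or -1 if the line cannot be parsed. */
    static int parseRating(String line) {
        String[] fields = line.split(";");
        if (fields.length != 3) {
            return -1;
        }
        // Strip the surrounding double quotes, e.g. "\"5\"" becomes "5".
        String raw = fields[2].trim().replace("\"", "");
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return -1; // header line or malformed record
        }
    }

    /** Drops implicit ratings (0) as well as unparseable lines such as the header. */
    static List<String> keepExplicit(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            int rating = parseRating(line);
            if (rating >= 1 && rating <= 10) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

The filtered lines could then be written back to a file and loaded as usual, so the data model only ever sees explicit ratings.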

It would also be interesting to change all ratings to "0" and to consider all
of them as implicit. So far there are no Mahout examples dedicated to
recommendation based on binary data (whether a user has bought an item or not),
even though this seems to me a more common problem than recommendation based on
actual ratings.
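To illustrate what "binary data" would mean here, the sketch below (hypothetical names, plain Java) ignores the rating value entirely, keeps only which items each user has, and compares two users with the Tanimoto (Jaccard) coefficient: the size of the intersection of their item sets divided by the size of the union, a common choice for this kind of data. Mahout's Taste package does ship classes in this spirit, such as GenericBooleanPrefDataModel and TanimotoCoefficientSimilarity, but this is a standalone illustration, not their code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: every (user, item) pair is a boolean "has read/bought" fact,
// whatever its rating, and users are compared with the Tanimoto coefficient.
final class BooleanPrefs {

    private BooleanPrefs() {}

    /** Builds user -> set of items; any rating, including an implicit 0, counts as present. */
    static Map<Long, Set<String>> fromPairs(long[] users, String[] items) {
        if (users.length != items.length) {
            throw new IllegalArgumentException("users and items must have the same length");
        }
        Map<Long, Set<String>> prefs = new HashMap<>();
        for (int i = 0; i < users.length; i++) {
            prefs.computeIfAbsent(users[i], k -> new HashSet<>()).add(items[i]);
        }
        return prefs;
    }

    /** Size of (a intersect b) divided by size of (a union b); 0.0 when both are empty. */
    static double tanimoto(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }
}
```

For example, a user with items {a, b} and another with {a, b, c} share 2 of 3 distinct items, giving a similarity of 2/3.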


Quoting Sean Owen <[email protected]>:

> I see the same variance, but I believe it's due to a small input size.
> At the moment it's using only 5% of the total input, or about 50,000
> ratings over 5,000 users. That's fairly small. From there, it's also
> looking at only 5% of those users to form neighborhoods. These are
> just too low, and I have increased the amount of data the evaluation
> uses in a few ways, and get much more stable results.
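As a rough illustration of why such percentages shrink the evaluation, here is a sketch of percentage-based sampling, assuming a scheme where each user is kept with probability evaluationPercentage and each kept user's ratings go to training with probability trainingPercentage (otherwise to test). The names are hypothetical and this is not Mahout's evaluator code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of percentage-based evaluation sampling.
// Ratings are stored as user -> (item -> rating value).
final class EvaluationSplit {

    private EvaluationSplit() {}

    /** Returns a two-element list: index 0 is the training data, index 1 the test data. */
    static List<Map<Long, Map<Long, Double>>> split(Map<Long, Map<Long, Double>> ratings,
            double trainingPercentage, double evaluationPercentage, long seed) {
        Random random = new Random(seed);
        Map<Long, Map<Long, Double>> training = new HashMap<>();
        Map<Long, Map<Long, Double>> test = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> user : ratings.entrySet()) {
            if (random.nextDouble() >= evaluationPercentage) {
                continue; // user not sampled: excluded from both training and test
            }
            for (Map.Entry<Long, Double> pref : user.getValue().entrySet()) {
                Map<Long, Map<Long, Double>> target =
                        random.nextDouble() < trainingPercentage ? training : test;
                target.computeIfAbsent(user.getKey(), k -> new HashMap<>())
                      .put(pref.getKey(), pref.getValue());
            }
        }
        List<Map<Long, Map<Long, Double>>> result = new ArrayList<>();
        result.add(training);
        result.add(test);
        return result;
    }
}
```

Under this assumed scheme, an evaluationPercentage of 0.05 means roughly 1 user in 20 takes part at all, so both the neighborhoods and the test set come from a small random slice of the data, which fits the run-to-run variance described here.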
>
> I also switched the algorithm it uses, since the average difference
> was 4 out of 10, which is pretty poor. I think with more research one
> could pick the optimal algorithm, but I just picked something that
> worked a little better (< 3) for now.
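The "average difference" here is, I assume, the average absolute difference between estimated and actual ratings on the test set. A minimal sketch of that metric in plain Java (hypothetical class name, not Mahout's evaluator):

```java
// Hypothetical sketch: mean absolute difference between estimated and actual ratings.
final class AverageAbsoluteDifference {

    private AverageAbsoluteDifference() {}

    /** Mean of |estimated - actual| over all test ratings; NaN if there are none. */
    static double of(double[] actual, double[] estimated) {
        if (actual.length != estimated.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (actual.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(estimated[i] - actual[i]);
        }
        return sum / actual.length;
    }
}
```

On a 0-10 scale, a score of 4 means predictions are off by 4 points on average. With made-up actual ratings {0, 8, 6} and estimates {4, 5, 6}, the result is 7/3 (about 2.33), and the single implicit 0 contributes 4 of the 7 points of total error, which is one way counting implicit ratings as real zeros could inflate the score.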
>
> On Tue, Mar 9, 2010 at 6:30 PM, Sean Owen <[email protected]> wrote:
> > I see, that definitely doesn't sound right. Let me run it myself
> > tonight when I am home and see what I observe.
> >
> > On Tue, Mar 9, 2010 at 5:40 PM,  <[email protected]> wrote:
> >> I did not change anything from the example provided in mahout-example,
> >> development version. It uses 5% for evaluation, which is 5000 instances.
> >> With such a test set size, the range should not be that big. I suspect
> >> that there is something wrong somewhere.
> >
>

