Ah, good catch. I will adjust that.
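
Roughly, the idea would be to drop the 0 ratings before they ever reach the
data model. An untested standalone sketch of that filtering step (the file name
and the ';' separator are just what the raw BookCrossing dump looks like, not
anything in Mahout):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class FilterImplicitRatings {
  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader("BX-Book-Ratings.csv"));
    PrintWriter out = new PrintWriter(new FileWriter("explicit-ratings.csv"));
    String line;
    while ((line = in.readLine()) != null) {
      // Raw lines look roughly like: "userID";"ISBN";"rating"
      String[] tokens = line.split(";");
      if (tokens.length < 3) {
        continue;
      }
      String rating = tokens[2].replace("\"", "").trim();
      // A "0" means the book was read or owned but never rated; skip it
      if (!"0".equals(rating)) {
        out.println(line);
      }
    }
    out.close();
    in.close();
  }
}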

I'm happy to make a new example for 'boolean' data, perhaps based on
BookCrossing. It would just ignore the rating data.
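
The boolean version would lean on the boolean-preference classes in Taste
(class names from memory); something like this untested sketch, with a few
hard-coded user/book IDs standing in for the real data:

import java.util.List;

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class BooleanPrefExampleSketch {
  public static void main(String[] args) throws Exception {
    // Each user just maps to the set of books they have read; no rating values
    FastByIDMap<FastIDSet> userData = new FastByIDMap<FastIDSet>();
    FastIDSet user1 = new FastIDSet();
    user1.add(10L); user1.add(20L); user1.add(30L);
    FastIDSet user2 = new FastIDSet();
    user2.add(10L); user2.add(30L); user2.add(40L);
    FastIDSet user3 = new FastIDSet();
    user3.add(20L); user3.add(40L);
    userData.put(1L, user1);
    userData.put(2L, user2);
    userData.put(3L, user3);

    DataModel model = new GenericBooleanPrefDataModel(userData);

    // Tanimoto similarity only looks at which items co-occur, which is the
    // natural choice when there are no rating values to correlate
    UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
    Recommender recommender =
        new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);

    List<RecommendedItem> recs = recommender.recommend(1L, 2);
    for (RecommendedItem rec : recs) {
      System.out.println(rec);
    }
  }
}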

On Wed, Mar 10, 2010 at 2:46 PM,  <[email protected]> wrote:
> I think I found the explanation for the poor results and, maybe, the
> instability.
>
> More than 60% of the ratings are 0/10. This is what the publishers of this
> dataset call "implicit rating". It means that the book was read (or purchased)
> but not rated by the user.
>
> It seems that BookCrossingDataModel is not aware of that and just considers
> them as rating 0. It is therefore not surprising that the results are
> inconsistent.
> An obvious way to solve the problem would be to filter out these implicit
> ratings.
>
> It would be interesting as well to change all ratings to "0" and to consider
> all of them as implicit. There is so far no Mahout example dedicated to
> recommendation based on binary data (user has bought the item or not), even
> though this seems to me like a more common problem than recommendation based
> on actual ratings.
>
>
> Quoting Sean Owen <[email protected]>:
>
>> I see the same variance, but I believe it's due to a small input size.
>> At the moment it's using only 5% of the total input, or about 50,000
>> ratings over 5,000 users. That's fairly small. From there, it's also
>> looking at only 5% of those users to form neighborhoods. These figures are
>> just too low; I have increased the amount of data the evaluation uses in a
>> few ways and now get much more stable results.
>>
>> I also switched the algorithm it uses, since the average difference
>> was 4 out of 10, which is pretty poor. I think with more research one
>> could pick the optimal algorithm, but I just picked something that
>> worked a little better (< 3) for now.
>>
>> On Tue, Mar 9, 2010 at 6:30 PM, Sean Owen <[email protected]> wrote:
>> > I see, that definitely doesn't sound right. Let me run it myself
>> > tonight when I am home and see what I observe.
>> >
>> > On Tue, Mar 9, 2010 at 5:40 PM,  <[email protected]> wrote:
>> >> I did not change anything from the example provided in mahout-examples,
>> >> development version. It uses 5% for evaluation, which is 5000 instances.
>> >> With such a test set size, the range should not be that big. I suspect
>> >> that there is something wrong somewhere.
>> >
>>
>
>
>
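
For reference, the percentages discussed above are just the last two arguments
to the evaluator. A rough sketch of the evaluation call; the file name, the
neighborhood size and the 0.9 / 0.5 values are only illustrative, not what the
example actually ships with:

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class EvaluationSketch {
  public static void main(String[] args) throws Exception {
    // "ratings.csv" is a placeholder for a userID,itemID,rating file
    DataModel model = new FileDataModel(new File("ratings.csv"));

    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(50, similarity, dataModel);
        return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };

    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    // 4th argument: fraction of each user's preferences used for training
    //   (the rest are held out and predicted)
    // 5th argument: fraction of users actually evaluated
    // Raising these from 0.05 toward 1.0 uses more data and gives more stable
    // scores, at the cost of a longer run
    double score = evaluator.evaluate(builder, null, model, 0.9, 0.5);
    System.out.println(score);
  }
}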
