Ahh... only effective in RecommenderJob.
On Tue, Jun 18, 2013 at 10:40 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> My recollection as well.
>
> I will read the code again. Didn't see where that happens.
>
> On Tue, Jun 18, 2013 at 10:34 PM, Sean Owen <sro...@gmail.com> wrote:
>> This is the "maxPrefsPerUser" option IIRC.
>>
>> On Tue, Jun 18, 2013 at 9:27 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> I was reading the RowSimilarityJob and it doesn't appear that it does
>>> down-sampling on the original data to minimize the performance impact of
>>> perversely prolific users.
>>>
>>> The issue is that if a single user has 100,000 items in their history, we
>>> learn nothing more than if we picked 300 of those, while the former would
>>> result in processing 10 billion cooccurrences and the latter would result
>>> in 100,000. This factor of 10,000 is so large that it can make a big
>>> difference in performance.
>>>
>>> I had thought that the code had this down-sampling in place.
>>>
>>> If not, I can add row-based down-sampling quite easily.
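For readers following along: the row-based down-sampling Ted describes can be sketched roughly as below. This is not the Mahout implementation, just a minimal standalone illustration of the idea; the class name, the cap constant, and the method are hypothetical, with the cap value mirroring the 300-item figure from the thread. Cooccurrence work grows quadratically with a user's history length, so capping each row bounds the cost per user.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DownSampleSketch {

    // Hypothetical cap, analogous in spirit to the maxPrefsPerUser option
    // mentioned in the thread (the 300-item figure from Ted's example).
    static final int MAX_PREFS_PER_USER = 300;

    /**
     * Randomly keeps at most maxPrefs items from one user's history.
     * Histories at or under the cap are returned unchanged.
     */
    static List<Long> downSample(List<Long> items, int maxPrefs, Random rng) {
        if (items.size() <= maxPrefs) {
            return items;
        }
        List<Long> copy = new ArrayList<>(items);
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, maxPrefs));
    }

    public static void main(String[] args) {
        // A perversely prolific user with 100,000 items.
        List<Long> history = new ArrayList<>();
        for (long i = 0; i < 100_000; i++) {
            history.add(i);
        }
        List<Long> sampled = downSample(history, MAX_PREFS_PER_USER, new Random(42));

        // Cooccurrence pairs scale as the square of the history length:
        long before = (long) history.size() * history.size(); // 10 billion
        long after = (long) sampled.size() * sampled.size();  // 90,000
        System.out.println(after);
        System.out.println(before / after); // roughly the factor-of-10,000 savings from the thread
    }
}
```

In a MapReduce setting this would run per row before emitting cooccurrence pairs, so the quadratic blow-up never reaches the shuffle.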