Ahh... only effective in RecommenderJob.
On Tue, Jun 18, 2013 at 10:40 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> My recollection as well.
>
> I will read the code again. Didn't see where that happens.
>
> On Tue, Jun 18, 2013 at 10:34 PM, Sean Owen <sro...@gmail.com> wrote:
>> This is the "maxPrefsPerUser" option IIRC.
>>
>> On Tue, Jun 18, 2013 at 9:27 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> I was reading the RowSimilarityJob and it doesn't appear that it does
>>> down-sampling on the original data to minimize the performance impact of
>>> perversely prolific users.
>>>
>>> The issue is that if a single user has 100,000 items in their history, we
>>> learn nothing more than if we picked 300 of those, while the former would
>>> result in processing 10 billion cooccurrences and the latter would result
>>> in 100,000. This factor of 10,000 is so large that it can make a big
>>> difference in performance.
>>>
>>> I had thought that the code had this down-sampling in place.
>>>
>>> If not, I can add row-based down-sampling quite easily.
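For readers following along: the row-based down-sampling Ted describes can be sketched roughly as below. This is not the Mahout implementation, just a minimal standalone illustration of the idea; the class name, the cap constant, and the method are hypothetical, with the cap value mirroring the 300-item figure from the thread. Cooccurrence work grows quadratically with a user's history length, so capping each row bounds the cost per user.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DownSampleSketch {

    // Hypothetical cap, analogous in spirit to the maxPrefsPerUser option
    // mentioned in the thread (the 300-item figure from Ted's example).
    static final int MAX_PREFS_PER_USER = 300;

    /**
     * Randomly keeps at most maxPrefs items from one user's history.
     * Histories at or under the cap are returned unchanged.
     */
    static List<Long> downSample(List<Long> items, int maxPrefs, Random rng) {
        if (items.size() <= maxPrefs) {
            return items;
        }
        List<Long> copy = new ArrayList<>(items);
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, maxPrefs));
    }

    public static void main(String[] args) {
        // A perversely prolific user with 100,000 items.
        List<Long> history = new ArrayList<>();
        for (long i = 0; i < 100_000; i++) {
            history.add(i);
        }
        List<Long> sampled = downSample(history, MAX_PREFS_PER_USER, new Random(42));

        // Cooccurrence pairs scale as the square of the history length:
        long before = (long) history.size() * history.size(); // 10 billion
        long after = (long) sampled.size() * sampled.size();  // 90,000
        System.out.println(after);
        System.out.println(before / after); // roughly the factor-of-10,000 savings from the thread
    }
}
```

In a MapReduce setting this would run per row before emitting cooccurrence pairs, so the quadratic blow-up never reaches the shuffle.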