I normally deal with this by purposely limiting the length of these rows. The argument is that if I never recommend more than 100 items to a person (or 20, or 1000 ... the argument doesn't change), then none of the item -> item* mappings needs to hold more than 100 items, since the tail of the list can't affect the top-100 recommendations anyway. It is also useful to limit the user history to only recent or only important ratings. That means a typical big multi-get is something like 100 history items x 100 related items = 10,000 entries x 10 bytes for id+score, or about 100 KB. This sounds kind of big, but the average case is about 5x smaller.
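To make the argument concrete, here's a minimal sketch (not Mahout code; the function name and data layout are made up for illustration) of capping each item -> similar-items row at the top-N entries, plus the back-of-the-envelope size estimate from above:

```python
import heapq

def truncate_row(similar_items, n=100):
    """Keep only the n highest-scoring (item_id, score) pairs of one row.

    Entries past rank n can never appear in a top-n recommendation list,
    so dropping them loses nothing for this use case.
    """
    return heapq.nlargest(n, similar_items, key=lambda pair: pair[1])

# Worst-case size of one query's multi-get:
# 100 history items x 100 related items x ~10 bytes per (id, score) entry.
history_limit, row_limit, bytes_per_entry = 100, 100, 10
worst_case_bytes = history_limit * row_limit * bytes_per_entry  # 100,000 bytes
```

The truncation is done once when the rows are written, so each query only pays for the capped rows it actually fetches.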
On Mon, May 31, 2010 at 4:01 PM, Sean Owen <[email protected]> wrote:
> I'd be a little concerned about whether this fits comfortably in
> memory. The similarity matrix is potentially dense -- big rows -- and
> you're loading one row per item the user has rated. It could get into
> tens of megabytes for one query. The distributed version dares not do
> this. But, worth a try in principle.
