On Thu, Oct 23, 2008 at 7:49 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> OK, BooleanPreference sounds right here, except I'm still not certain I'll
> really have only boolean preferences (actually, I only have "true"
> preferences, I never have "boolean" -- I only have 1.0, never 0.0). For
> instance, since I'm working with news, I can consider just viewing a news
> item as 1.0, but I can also consider emailing it as 1.5 or 2.0. Or saving it
> might be translated to a 3.0 preference, for example.

Yes, "boolean" is arguably the wrong name. But it sounds like you do have
regular preferences anyway, on some scale, so never mind this line of
thinking for now...

> Yeah, I already started looking at FileDataModel to see how I'd parse extra
> data associated with each (user,item,pref) triple and where I'd store it. It
> looks like processLine(...) and buildPreference(). The processList() should
> really be protected, not private, ha?

I can make it protected, sure.

> I see DetailedPreference now, that's kind of what I was thinking....except
> DetailedPreference is not used anywhere, so I can't see how that extra
> timestamp could be used, plus I wonder if it can be made generic. But even
> without a generic preference that can hold various meta data, it seems easy
> to write a custom Preference impl.

It isn't actually used. Someone asked about it a while ago, so I threw in an
implementation that includes time, almost as an illustration. You can
obviously create an implementation that holds whatever you need.

> Just dropping the 1.0 and using BooleanPreference? I'll try it, it's a
> one-line change.

Yeah, can't hurt. I suppose this only happens to work for you now, since you
don't have real preferences on a scale. If you want to go further down this
path I can suggest further optimizations (for example, you don't really need
Preference objects at all, just a Set of item IDs in the User implementation,
and then stuff can be rewritten to be much faster as a result).

> Right, but I think I need more than an item and a score. I need that other
> data (e.g. that timestamp from DetailedPreference or some other data) to
> rescore, and that means I either have to have it already read and in memory
> (e.g. via input fed into DetailedPreference during load time) or for each
> item that Rescorer considers I have to go get the data from an external store
> (e.g. a DB) at run-time, which is probably not a very scalable approach.

Right, either it's already there or you read it on the fly (maybe caching it?)

> So, if I now add extra data to my Taste input, I'll hit memory limits even
> sooner! :(
> Even with 1G heap I'm unable to read 1.2M data points, which for me
> represents less than one day's worth of data.... and I really need to have at
> least a few days worth of data in order to benefit from "historic overlap" of
> users' item consumption in order to figure out "people like you".

Buy more memory! :) Sounds like the optimizations above may actually be
worthwhile to try even just for the short term then.
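
To make the "just a Set of item IDs" idea concrete, here is a rough sketch of
what I have in mind. None of these names exist in Taste: BooleanUser and
BooleanUsers are made up for illustration, the input is assumed to be
"userID,itemID[,anything]" lines, and the Tanimoto overlap is just one
similarity that happens to need nothing but set membership. Treat it as a
starting point, not as how the library does it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical user type (not a Taste class): only the set of item IDs the user touched. */
final class BooleanUser {
    private final Set<String> itemIds = new HashSet<>();

    void add(String itemId) { itemIds.add(itemId); }
    boolean has(String itemId) { return itemIds.contains(itemId); }
    int size() { return itemIds.size(); }
    Set<String> items() { return itemIds; }
}

/** Hypothetical helpers: load users from lines and compare them by set overlap. */
final class BooleanUsers {
    private BooleanUsers() {}

    /** Reads "userID,itemID[,ignored...]" lines; no Preference object is created per data point. */
    static Map<String, BooleanUser> load(Iterable<String> lines) {
        Map<String, BooleanUser> users = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // skip malformed lines
            }
            users.computeIfAbsent(fields[0].trim(), k -> new BooleanUser())
                 .add(fields[1].trim());
        }
        return users;
    }

    /** Tanimoto coefficient: shared items divided by distinct items across both users. */
    static double tanimoto(BooleanUser a, BooleanUser b) {
        BooleanUser small = a.size() <= b.size() ? a : b;
        BooleanUser large = (small == a) ? b : a;
        int shared = 0;
        for (String item : small.items()) {
            if (large.has(item)) {
                shared++;
            }
        }
        int union = a.size() + b.size() - shared;
        return union == 0 ? 0.0 : (double) shared / union;
    }
}
```

Duplicate (user,item) lines collapse for free because the items live in a
Set. If you later need the view/email/save weights back, this representation
no longer fits and you are back to real preferences on a scale.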
