On Thu, Oct 23, 2008 at 7:49 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> OK, BooleanPreference sounds right here, except I'm still not certain I'll 
> really have only boolean preferences (actually, I only have "true" 
> preferences, I never have "boolean" -- I only have 1.0, never 0.0).  For 
> instance, since I'm working with news, I can consider just viewing a news 
> item as 1.0, but I can also consider emailing it as 1.5 or 2.0.  Or saving it 
> might be translated to a 3.0 preference, for example.

Yes, "boolean" is arguably the wrong name. But it sounds like you do
have regular preferences anyway, on some scale, so never mind this line
of thinking for now...


> Yeah, I already started looking at FileDataModel to see how I'd parse extra 
> data associated with each (user,item,pref) triple and where I'd store it.  It 
> looks like processLine(...) and buildPreference().  The processList() should 
> really be protected, not private, ha?

I can make it protected, sure.


> I see DetailedPreference now, that's kind of what I was thinking....except 
> DetailedPreference is not used anywhere, so I can't see how that extra 
> timestamp could be used, plus I wonder if it can be made generic.  But even 
> without a generic preference that can hold various meta data, it seems easy 
> to write a custom Preference impl.

It isn't actually used. Someone asked about it a while ago so I threw
in an implementation that includes time, almost as an illustration.
You can obviously create an implementation that holds whatever you
need.
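
For illustration, here is a minimal sketch of what such an
implementation could look like. The class name TimedPreference and the
decayedValue() helper are made up for this example, and the class is
deliberately stand-alone: it does not implement the actual Taste
Preference interface, so you would still have to adapt it to that.

```java
import java.util.Objects;

// Hypothetical, stand-alone example; not part of Taste and not
// implementing its Preference interface.
final class TimedPreference {

    private final String userId;
    private final String itemId;
    private final double value;
    private final long timestampMillis;

    TimedPreference(String userId, String itemId, double value, long timestampMillis) {
        this.userId = Objects.requireNonNull(userId);
        this.itemId = Objects.requireNonNull(itemId);
        this.value = value;
        this.timestampMillis = timestampMillis;
    }

    String getUserId() { return userId; }
    String getItemId() { return itemId; }
    double getValue() { return value; }
    long getTimestampMillis() { return timestampMillis; }

    // One possible use of the extra data (an assumption, not something
    // Taste does): halve the preference value for every halfLifeMillis
    // that has passed since the preference was recorded.
    double decayedValue(long nowMillis, double halfLifeMillis) {
        double ageMillis = Math.max(0L, nowMillis - timestampMillis);
        return value * Math.pow(0.5, ageMillis / halfLifeMillis);
    }
}
```

Any other metadata (action type, source, etc.) could be carried the same
way, at the cost of more memory per preference.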


> Just dropping the 1.0 and using BooleanPreference?  I'll try it, it's a 
> one-line change.

Yeah, it can't hurt. I suppose this only happens to work for you now,
since you don't have real preferences on a scale. If you want to go
further down this path I can suggest further optimizations. For
example, you don't really need Preference objects at all, just a Set of
item IDs in the User implementation, and then the rest can be rewritten
to be much faster as a result.
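
To make the "just a Set of item IDs" idea concrete, here is a rough
sketch. BooleanPrefUser is a hypothetical name, not a Taste class, and
the Tanimoto (Jaccard) overlap is just one similarity I picked that
works on plain sets; nothing above says that is what Taste would use.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical user that stores only the IDs of items it has touched,
// with no Preference objects and no preference values.
final class BooleanPrefUser {

    private final String id;
    private final Set<String> itemIds = new HashSet<>();

    BooleanPrefUser(String id) { this.id = id; }

    String getId() { return id; }
    void addItem(String itemId) { itemIds.add(itemId); }
    boolean hasItem(String itemId) { return itemIds.contains(itemId); }
    int size() { return itemIds.size(); }

    // Tanimoto coefficient: |A intersect B| / |A union B|.
    // Returns 0.0 when both users have no items.
    static double tanimoto(BooleanPrefUser a, BooleanPrefUser b) {
        if (a.itemIds.isEmpty() && b.itemIds.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller set and probe the larger one.
        Set<String> smaller = a.size() <= b.size() ? a.itemIds : b.itemIds;
        Set<String> larger = (smaller == a.itemIds) ? b.itemIds : a.itemIds;
        int intersection = 0;
        for (String item : smaller) {
            if (larger.contains(item)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }
}
```

Each (user, item) pair then costs one set entry instead of a full
Preference object, which is where the memory saving would come from.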


> Right, but I think I need more than an item and a score.  I need that other 
> data (e.g. that timestamp from DetailedPreference or some other data) to 
> rescore, and that means I either have to have it already read and in memory 
> (e.g. via input fed into DetailedPreference during load time) or for each 
> item that Rescorer considers I have to go get the data from an external store 
> (e.g. a DB) at run-time, which is probably not a very scalable approach.

Right, either it's already there or you read it on the fly (maybe caching it?)
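
If you do go the on-the-fly route, a small bounded cache in front of the
external store might look roughly like the sketch below. The class name
ItemMetadataCache and the loader function are placeholders of mine; the
loader stands in for whatever DB lookup you would actually do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache for per-item metadata; not part of Taste.
final class ItemMetadataCache<K, V> {

    private final Map<K, V> cache;
    private final Function<K, V> loader;
    private int loads;

    ItemMetadataCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true makes iteration order least-recently-used
        // first, so the "eldest" entry is the one to evict.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, or loads it from the backing store
    // (and remembers it) on a miss. Null results are not cached.
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Number of times the backing store was actually consulted.
    synchronized int loadCount() { return loads; }
}
```

A Rescorer could then call get(itemId) for each item it considers, and
only the misses would go to the external store.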


> So, if I now add extra data to my Taste input, I'll hit memory limits even 
> sooner! :(
> Even with 1G heap I'm unable to read 1.2M data points, which for me 
> represents less than one day's worth of data.... and I really need to have at 
> least a few days worth of data in order to benefit from "historic overlap" of 
> users' item consumption in order to figure out "people like you".

Buy more memory! :)
Sounds like the optimizations above may actually be worthwhile to try
even just for the short term then.
