[ https://issues.apache.org/jira/browse/MAHOUT-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118970#comment-13118970 ]
Sean Owen commented on MAHOUT-824:
----------------------------------
I agree that slope-one is surprisingly effective, though storing all item-item
diffs is expensive. The good news is that almost all item-item diffs carry
virtually no info; they're noise. So, I think there's a much better way to
dodge the memory constraint: prune the diffs. That can get you much farther
than even optimizing away object overhead.
See the constructors that let you cap the number of diffs stored. The pruning
is reasonably smart; it throws out data based on the standard deviation of the
diffs.
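To make the pruning idea concrete, here is a minimal standalone sketch. It is not the Mahout code: the class DiffPruningSketch, its DiffStats helper and the prune method are hypothetical names, and the policy shown (keep the diffs with the lowest standard deviation, and treat diffs with fewer than two data points as the noisiest) is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Mahout.
public class DiffPruningSketch {

    /** Running mean and standard deviation of one item-item rating diff (Welford's method). */
    static final class DiffStats {
        final long itemA;
        final long itemB;
        private int count;
        private double mean;
        private double m2; // sum of squared deviations from the running mean

        DiffStats(long itemA, long itemB) {
            this.itemA = itemA;
            this.itemB = itemB;
        }

        void add(double diff) {
            count++;
            double delta = diff - mean;
            mean += delta / count;
            m2 += delta * (diff - mean);
        }

        int getCount() {
            return count;
        }

        double getAverage() {
            return mean;
        }

        /** Sample standard deviation; infinite when fewer than two data points exist (assumption). */
        double getStandardDeviation() {
            return count > 1 ? Math.sqrt(m2 / (count - 1)) : Double.POSITIVE_INFINITY;
        }
    }

    /** Keeps at most maxEntries diffs, preferring those with the lowest standard deviation. */
    static List<DiffStats> prune(List<DiffStats> diffs, int maxEntries) {
        List<DiffStats> sorted = new ArrayList<>(diffs);
        sorted.sort(Comparator.comparingDouble(DiffStats::getStandardDeviation));
        return new ArrayList<>(sorted.subList(0, Math.min(maxEntries, sorted.size())));
    }

    public static void main(String[] args) {
        DiffStats stable = new DiffStats(1L, 2L);
        for (double d : new double[] {1.0, 1.0, 1.5, 1.0}) {
            stable.add(d);
        }
        DiffStats noisy = new DiffStats(1L, 3L);
        for (double d : new double[] {-4.0, 3.0, 0.5, -2.0}) {
            noisy.add(d);
        }
        DiffStats single = new DiffStats(2L, 3L);
        single.add(2.0);

        // With a cap of 2, the single-observation diff is the one dropped.
        for (DiffStats kept : prune(Arrays.asList(stable, noisy, single), 2)) {
            System.out.println(kept.itemA + " -> " + kept.itemB + ", count=" + kept.getCount());
        }
    }
}
```

The sketch sorts everything once for clarity; a real store would more likely enforce the cap while the diffs are being built.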
So, I'd prefer not to add this much copy-and-paste yet. (And, see below, as you
say, I don't think this quite works either, or at least there is some evidence
it changes behavior.) I think it's fine work though and worth having in a
patch; if it's really popular, well, there's nothing terribly wrong with it,
it's just duplicative.
For the rest of the code --
I am happy to add more constructors for the running averages in general, sure,
plus tests. I think some of the tests need a fix -- for example the copy
constructor test doesn't test a copy.
I can add the FastByIDMapTest. The copied keySet/values tests don't really test
those methods, but I can remove them.
I am not sure why the slope-one tests would need to be changed -- this suggests
there's something about the changes that might have introduced a bug. I might
have to peel some of that back if it looks like the case.
Likewise, I am not sure why the IR stats eval would have changed; this ought
not affect anything there.
I would in general not want changes that make fields public or protected.
> FastByIDRunningAverage: Optimize SlopeOneRecommender by optimizing
> MemoryDiffStorage
> ------------------------------------------------------------------------------------
>
> Key: MAHOUT-824
> URL: https://issues.apache.org/jira/browse/MAHOUT-824
> Project: Mahout
> Issue Type: Improvement
> Reporter: Lance Norskog
> Priority: Trivial
> Fix For: 0.6
>
> Attachments: MAHOUT-824.patch, MAHOUT-824.short.patch
>
>
> The SlopeOneRecommender has by far the best RMS of all of the online
> recommenders in Mahout (that I've found). Unfortunately the implementation
> also uses much more memory and is unusable on my laptop.
> This patch optimizes memory (and speed) by folding
> FastByIDMap<RunningAverage> into one class: FastByIDRunningAverage. This is
> what it sounds like: a Long-addressable array of running averages (and
> optionally standard deviation).
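The folding described in the issue can be sketched roughly as follows. This is not the code from the attached patch: LongKeyedRunningAverages is a hypothetical class, and its layout (open addressing with linear probing over parallel primitive arrays, with Long.MIN_VALUE reserved as the empty marker) is an assumption chosen to show how one object per running average can be avoided. Standard deviation and removal are left out to keep it short.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not the FastByIDRunningAverage from the patch.
public class LongKeyedRunningAverages {

    private static final long EMPTY = Long.MIN_VALUE; // reserved marker, cannot be used as a key

    private long[] keys;
    private int[] counts;
    private double[] averages;
    private int size;

    public LongKeyedRunningAverages(int expectedSize) {
        int capacity = 16;
        while (capacity < expectedSize * 2) {
            capacity <<= 1; // power-of-two capacity so a bit mask can replace modulo
        }
        allocate(capacity);
    }

    private void allocate(int capacity) {
        keys = new long[capacity];
        Arrays.fill(keys, EMPTY);
        counts = new int[capacity];
        averages = new double[capacity];
    }

    /** Returns the slot holding key, or the empty slot where it would be inserted. */
    private int slotFor(long key) {
        int mask = keys.length - 1;
        int slot = (int) (key ^ (key >>> 32)) & mask;
        while (keys[slot] != EMPTY && keys[slot] != key) {
            slot = (slot + 1) & mask; // linear probing
        }
        return slot;
    }

    /** Doubles the table and reinserts every entry. */
    private void grow() {
        long[] oldKeys = keys;
        int[] oldCounts = counts;
        double[] oldAverages = averages;
        allocate(oldKeys.length * 2);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != EMPTY) {
                int slot = slotFor(oldKeys[i]);
                keys[slot] = oldKeys[i];
                counts[slot] = oldCounts[i];
                averages[slot] = oldAverages[i];
            }
        }
    }

    /** Folds one more value into the running average stored under key. */
    public void addDatum(long key, double value) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved");
        }
        int slot = slotFor(key);
        if (keys[slot] == EMPTY) {
            if ((size + 1) * 2 > keys.length) { // keep the load factor at or below 0.5
                grow();
                slot = slotFor(key);
            }
            keys[slot] = key;
            size++;
        }
        counts[slot]++;
        averages[slot] += (value - averages[slot]) / counts[slot];
    }

    public int getCount(long key) {
        int slot = slotFor(key);
        return keys[slot] == key ? counts[slot] : 0;
    }

    /** Returns the running average for key, or NaN if the key has no data. */
    public double getAverage(long key) {
        int slot = slotFor(key);
        return keys[slot] == key ? averages[slot] : Double.NaN;
    }

    public int size() {
        return size;
    }
}
```

In this layout each entry occupies one slot in each of three primitive arrays, rather than a map entry plus a separate running-average object. That is the kind of per-object overhead the issue aims to remove.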