[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-676.
------------------------------
Resolution: Won't Fix
Assignee: Sean Owen
On this issue, I had the impression you were just posting patches as a way of
batting around ideas. Is this intended for a commit? I get the latter
impression since it has been re-opened.
While I am putting it back to that state based on comments from Ted and myself,
that's more a statement of what I think the current status is for tracking
rather than a final judgment.
If you're interested in creating a committable patch, I do think it's worth
discussing before coding, since it's not yet clear there is support for it
here. (Anyone else?)
There are outstanding comments on the micro-level details of the patch from
last time, and some additional code style changes that would need to be made.
But these are small; while it would be worthwhile to make those adjustments
yourself, they can be dealt with straightforwardly.
The piece I'm also a bit fuzzy on is the relation to Mahout's current code. I
understand the desire to refactor, improve and generalize the sampling code in
Mahout, but the patch isn't doing that so much as adding more sampling code
alongside it. A refactoring would be a good change, and I commented on what I
personally would imagine that looks like.
I don't follow the use case for recommenders above. I understand the idea of
weighting, for sure, but I wonder how generally used or applicable this is.
Is it a core function that needs to be in the framework, or one possible use
of the various extension points already in place?
> Random samplers in a modular library
> ------------------------------------
>
> Key: MAHOUT-676
> URL: https://issues.apache.org/jira/browse/MAHOUT-676
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Reporter: Lance Norskog
> Assignee: Sean Owen
> Priority: Minor
> Attachments: MAHOUT-676.patch, Sampler.patch
>
>
> This is a modular suite of samplers. It supplies the ability to throw away
> samples in a useful way.
> Here is a use case: for my recommendations, I want user activity to decide
> the amount of influence on the results. For the number of users who watch X
> number of movies: 1-5 is 20%, 6-15 is 50%, 15-30 is 30%, and users who watch
> over 30 movies are not useful.
> * If I know the input distribution, I can supply a function to the Slice
> sampler to give this distribution.
> * If I don't know the distribution, I can create a Reservoir sampler for each
> of the three buckets. After reading the whole set, I check the sizes of the
> various buckets and solve for my distribution. This gives the number of users
> to pull from each bucket.
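
For readers following the thread, the second bullet in the quoted description
can be sketched roughly as below. This is only an illustration and is not
taken from MAHOUT-676.patch or Sampler.patch: the names BucketedSampling,
Reservoir, bucketFor and quotas are hypothetical, the reservoir uses the
textbook Algorithm R, a user with exactly 15 movies is assumed to fall in the
middle bucket, and "solve for my distribution" is assumed to mean finding the
largest total sample whose 20/50/30 split fits what each bucket holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; none of these names come from the attached patches.
final class BucketedSampling {

  // Fixed-size uniform sample over a stream of unknown length (Algorithm R).
  static final class Reservoir<T> {
    private final int capacity;
    private final Random random;
    private final List<T> samples = new ArrayList<>();
    private long seen = 0;

    Reservoir(int capacity, Random random) {
      if (capacity <= 0) {
        throw new IllegalArgumentException("capacity must be positive");
      }
      this.capacity = capacity;
      this.random = random;
    }

    void add(T item) {
      seen++;
      if (samples.size() < capacity) {
        samples.add(item);                    // fill phase: keep everything
      } else {
        // Pick a slot in [0, seen); the item is kept only if the slot falls
        // inside the reservoir, i.e. with probability capacity / seen.
        long slot = (long) (random.nextDouble() * seen);
        if (slot < capacity) {
          samples.set((int) slot, item);
        }
      }
    }

    long seen() { return seen; }

    List<T> samples() { return samples; }
  }

  // Maps a user's movie count to a bucket index 0..2, or -1 for users the
  // description excludes (over 30 movies) or who watched nothing.
  // Assumption: the boundary value 15 goes to the middle bucket.
  static int bucketFor(int moviesWatched) {
    if (moviesWatched < 1 || moviesWatched > 30) return -1;
    if (moviesWatched <= 5) return 0;
    if (moviesWatched <= 15) return 1;
    return 2;
  }

  // Given how many users each bucket holds and the target percentages,
  // returns how many users to pull from each bucket: the largest total
  // whose split never asks a bucket for more than it has.
  static int[] quotas(int[] available, int[] percents) {
    long total = Long.MAX_VALUE;
    for (int i = 0; i < available.length; i++) {
      total = Math.min(total, available[i] * 100L / percents[i]);
    }
    int[] result = new int[available.length];
    for (int i = 0; i < available.length; i++) {
      result[i] = (int) (total * percents[i] / 100);
    }
    return result;
  }
}
```

In use, each user would be routed with bucketFor into one of three Reservoir
instances while reading the input; afterwards the reservoir sizes and the
percentages {20, 50, 30} would be passed to quotas to decide how many users to
take from each.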
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira