[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

Anatoliy Kats (Commented) (JIRA) Thu, 15 Dec 2011 06:09:06 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170213#comment-13170213
 ]


Anatoliy Kats commented on MAHOUT-906:
--------------------------------------

Ah, I see what you mean.  That would be neat indeed, because we could unify the 
split for both classes of evaluators.  My concern, though, is that it's hard 
for me to imagine a case where I could use the same splitting strategy for both 
tests -- IRStatsTest currently requires that we test on one user at a time, for 
instance.  In general my feeling is that the nature of the tests requires 
different splitting strategies.  I'd feel more comfortable keeping things the 
way they are for the time being while we get some experience with the different 
splitting strategies that our users might need.  Afterwards, if it becomes 
apparent your proposal is in our users' interest, we can deprecate the API in 
my patch and write a wrapper class that takes my IRDataSplitter and converts 
it to your splitter.
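A minimal sketch of the wrapper idea in plain Java. It assumes hypothetical types throughout: Pref, PerUserSplitter, DataSetSplitter, TopAtSplitter and PerUserSplitterAdapter are illustrative names only. They are not Mahout's API and not the interfaces in the attached patch. The per-user strategy holds out a user's top "at" preferences, which mirrors the IR evaluation described in the issue. The adapter applies that strategy to every user so it can stand in for a whole-data-set splitter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type: one (user, item, rating) preference.
final class Pref {
  final long userID;
  final long itemID;
  final float value;

  Pref(long userID, long itemID, float value) {
    this.userID = userID;
    this.itemID = itemID;
    this.value = value;
  }
}

// Hypothetical IR-style strategy: works on ONE user's preferences at a time
// and returns the preferences to hold out as that user's "relevant" items.
interface PerUserSplitter {
  List<Pref> heldOut(List<Pref> userPrefs);
}

// Hypothetical whole-data-set strategy: partitions every preference.
interface DataSetSplitter {
  void split(List<Pref> all, List<Pref> train, List<Pref> test);
}

// Holds out a user's top-`at` preferences by rating value.
final class TopAtSplitter implements PerUserSplitter {
  private final int at;

  TopAtSplitter(int at) {
    this.at = at;
  }

  @Override
  public List<Pref> heldOut(List<Pref> userPrefs) {
    List<Pref> sorted = new ArrayList<>(userPrefs);
    sorted.sort(Comparator.comparingDouble((Pref p) -> p.value).reversed());
    return new ArrayList<>(sorted.subList(0, Math.min(at, sorted.size())));
  }
}

// Wrapper: applies a per-user splitter to each user in turn so it can be
// used wherever a whole-data-set splitter is expected.
final class PerUserSplitterAdapter implements DataSetSplitter {
  private final PerUserSplitter delegate;

  PerUserSplitterAdapter(PerUserSplitter delegate) {
    this.delegate = delegate;
  }

  @Override
  public void split(List<Pref> all, List<Pref> train, List<Pref> test) {
    // Group preferences by user, keeping first-seen order.
    Map<Long, List<Pref>> byUser = new LinkedHashMap<>();
    for (Pref p : all) {
      byUser.computeIfAbsent(p.userID, k -> new ArrayList<>()).add(p);
    }
    for (List<Pref> userPrefs : byUser.values()) {
      List<Pref> held = delegate.heldOut(userPrefs);
      test.addAll(held);
      for (Pref p : userPrefs) {
        // Identity comparison: Pref does not override equals().
        if (!held.contains(p)) {
          train.add(p);
        }
      }
    }
  }
}
```

The adapter merges every user's held-out preferences into a single test set. As a result, the per-user structure that the IR statistics rely on is lost. That mismatch is the reason given above for keeping the two splitting strategies separate for now.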
                
> Allow collaborative filtering evaluators to use custom logic in splitting 
> data set
> ----------------------------------------------------------------------------------
>
>                 Key: MAHOUT-906
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-906
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Anatoliy Kats
>            Priority: Minor
>              Labels: features
>         Attachments: MAHOUT-906.patch, MAHOUT-906.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I want to start a discussion about factoring out the logic used in splitting 
> the data set into training and testing.  Here is how things stand:  There are 
> two independent evaluator-based classes:  
> AbstractDifferenceRecommenderEvaluator splits all the preferences randomly 
> into a training and testing set.  GenericRecommenderIRStatsEvaluator takes 
> one user at a time, removes their top AT preferences, and counts how many of 
> them the system recommends back.
> I have two use cases that both deal with temporal dynamics.  In one case, 
> there may be expired items that can be used for building a training model, 
> but not for testing.  In the other, I may want to simulate the behavior of a 
> real system by building a preference matrix on days 1-k and testing on the 
> ratings the user generated on day k+1.  In this case, it's not items but 
> preferences (user, item, rating triplets) that may belong only to the 
> training set.  Before we discuss appropriate design, are there any other use 
> cases we need to keep in mind?
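A minimal sketch of a splitter that covers both temporal use cases at once. It assumes each preference carries an integer day index. Mahout's Preference interface has no such field, so DatedPref and TemporalSplit are hypothetical names. Preferences from days 1..k go to training. Day k+1 preferences go to testing unless their item is in a caller-supplied set of expired items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical timestamped preference: a (user, item, rating) triplet plus
// the day on which it was recorded.
final class DatedPref {
  final long userID;
  final long itemID;
  final float value;
  final int day;

  DatedPref(long userID, long itemID, float value, int day) {
    this.userID = userID;
    this.itemID = itemID;
    this.value = value;
    this.day = day;
  }
}

// Hypothetical splitter for the two temporal use cases in the issue.
final class TemporalSplit {
  final List<DatedPref> train = new ArrayList<>();
  final List<DatedPref> test = new ArrayList<>();

  /**
   * Trains on days 1..k and tests on day k+1. Preferences for items in
   * expiredItems may appear in training but are never used for testing.
   */
  static TemporalSplit split(List<DatedPref> all, int k, Set<Long> expiredItems) {
    TemporalSplit s = new TemporalSplit();
    for (DatedPref p : all) {
      if (p.day <= k) {
        s.train.add(p); // past data: always available for training
      } else if (p.day == k + 1 && !expiredItems.contains(p.itemID)) {
        s.test.add(p); // next-day ratings of items that are still live
      }
      // Anything else (later days, expired items on day k+1) is dropped.
    }
    return s;
  }
}
```

This sketch drops ratings from after day k+1, and day-(k+1) ratings of expired items, so that training never sees data from the simulated future. A real design might route them differently.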

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
