Nice point about before-time/after-time training & prediction sets!
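
To make the before-time/after-time idea concrete, here is a minimal sketch of the "train on days 1-k, test on day k+1" case from the quoted issue. It is plain Java and does not use Mahout's evaluator API: the class TemporalSplit, the TimedPreference and Split holders, the integer day field, and the splitOnDay method are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these types exist in Mahout.
class TemporalSplit {

    /** Hypothetical (user, item, rating) triplet tagged with the day it was recorded. */
    static final class TimedPreference {
        final long userID;
        final long itemID;
        final float value;
        final int day;

        TimedPreference(long userID, long itemID, float value, int day) {
            this.userID = userID;
            this.itemID = itemID;
            this.value = value;
            this.day = day;
        }
    }

    /** Holds the two halves of a split. */
    static final class Split {
        final List<TimedPreference> training = new ArrayList<>();
        final List<TimedPreference> test = new ArrayList<>();
    }

    /**
     * Preferences from days 1..k go to the training set, preferences from
     * day k+1 go to the test set, and anything later is dropped.
     */
    static Split splitOnDay(List<TimedPreference> prefs, int k) {
        Split split = new Split();
        for (TimedPreference p : prefs) {
            if (p.day <= k) {
                split.training.add(p);
            } else if (p.day == k + 1) {
                split.test.add(p);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        List<TimedPreference> prefs = new ArrayList<>();
        prefs.add(new TimedPreference(1L, 10L, 4.0f, 1));
        prefs.add(new TimedPreference(1L, 11L, 3.0f, 2));
        prefs.add(new TimedPreference(2L, 10L, 5.0f, 3));
        prefs.add(new TimedPreference(2L, 12L, 2.0f, 4));

        Split split = splitOnDay(prefs, 2);
        // prints "training=2 test=1"
        System.out.println("training=" + split.training.size()
                + " test=" + split.test.size());
    }
}
```

The point of keeping the split in its own method is that an evaluator could call it in place of a random split; how that would plug into the existing evaluators is exactly the design question the issue raises.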

On Fri, Dec 16, 2011 at 12:52 AM, Anatoliy Kats (Commented) (JIRA)
<j...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170836#comment-13170836 ]
>
> Anatoliy Kats commented on MAHOUT-906:
> --------------------------------------
>
> Not yet, just a refactoring so far.  Still working on it.
>
>> Allow collaborative filtering evaluators to use custom logic in splitting 
>> data set
>> ----------------------------------------------------------------------------------
>>
>>                 Key: MAHOUT-906
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-906
>>             Project: Mahout
>>          Issue Type: Improvement
>>          Components: Collaborative Filtering
>>    Affects Versions: 0.5
>>            Reporter: Anatoliy Kats
>>            Priority: Minor
>>              Labels: features
>>         Attachments: MAHOUT-906.patch, MAHOUT-906.patch, MAHOUT-906.patch, 
>> MAHOUT-906.patch
>>
>>   Original Estimate: 48h
>>  Remaining Estimate: 48h
>>
>> I want to start a discussion about factoring out the logic used in splitting
>> the data set into training and testing sets.  Here is how things stand:  There
>> are two independent evaluator-based classes.
>> AbstractDifferenceRecommenderEvaluator splits all the preferences randomly
>> into a training and a testing set.  GenericRecommenderIRStatsEvaluator takes
>> one user at a time, removes their top AT preferences, and counts how many of
>> them the system recommends back.
>> I have two use cases that both deal with temporal dynamics.  In one case,
>> there may be expired items that can be used for building a training model,
>> but not a test model.  In the other, I may want to simulate the behavior of
>> a real system by building a preference matrix on days 1-k, and testing on
>> the ratings the user generated on day k+1.  In this case, it's not
>> items but preferences (user, item, rating triplets) that may belong only to
>> the training set.  Before we discuss appropriate design, are there any other
>> use cases we need to keep in mind?
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA 
> administrators: 
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>



-- 
Lance Norskog
goks...@gmail.com
