[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

Anatoliy Kats (Commented) (JIRA) Mon, 05 Dec 2011 05:28:04 -0800

    [ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162770#comment-13162770 ]


Anatoliy Kats commented on MAHOUT-906:
--------------------------------------

A time-based recommender would be great!  However, I think we should take it 
one step at a time.  Start with the evaluation, then implement the algorithms 
themselves.

I thought about what it takes to implement an evaluator for a time-based 
algorithm.  As a bare minimum, we need the FileDataModel to read the time and 
date information and store it in the Preference class.  In addition, there may 
be other considerations that determine whether an item is eligible to be in the 
test set.  In our case, that is the browsing history that led the user to an 
item.  We can store whatever we need in additional columns of the input data 
file.  FileDataModel ought to be able to read them, and be easy to extend into 
a class that processes this extra data.  These modifications are tricky for a 
newcomer.  I can give it a try, but since I do not have the big-picture view, I 
may break something important.  I'll start on the obvious modifications while 
I wait for your comments.
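As a rough illustration of the extra input columns described above, here is a minimal, self-contained Java sketch. It does not use Mahout's FileDataModel or Preference: the class `TimedPreferenceParser`, the nested `TimedPreference` record, and the input layout `userID,itemID,value,timestamp[,extra...]` are all hypothetical, and the timestamp unit (milliseconds since the epoch) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Mahout's FileDataModel: parses lines of the assumed
// form "userID,itemID,value,timestamp[,extra1,extra2,...]".
public class TimedPreferenceParser {

  // Hypothetical record; Mahout's Preference carries only user, item and value.
  public static final class TimedPreference {
    public final long userID;
    public final long itemID;
    public final float value;
    public final long timestamp;      // assumed: milliseconds since the epoch
    public final List<String> extra;  // remaining columns, left uninterpreted

    public TimedPreference(long userID, long itemID, float value,
                           long timestamp, List<String> extra) {
      this.userID = userID;
      this.itemID = itemID;
      this.value = value;
      this.timestamp = timestamp;
      this.extra = extra;
    }
  }

  public static TimedPreference parse(String line) {
    // limit -1 keeps trailing empty columns instead of silently dropping them
    String[] fields = line.split(",", -1);
    if (fields.length < 4) {
      throw new IllegalArgumentException("expected at least 4 columns: " + line);
    }
    long userID = Long.parseLong(fields[0].trim());
    long itemID = Long.parseLong(fields[1].trim());
    float value = Float.parseFloat(fields[2].trim());
    long timestamp = Long.parseLong(fields[3].trim());
    // Copy the tail so the record does not hold a view onto the split array
    List<String> extra = new ArrayList<>(Arrays.asList(fields).subList(4, fields.length));
    return new TimedPreference(userID, itemID, value, timestamp, extra);
  }

  public static void main(String[] args) {
    TimedPreference p = parse("42,1001,4.5,1323091684000,search");
    // prints "42 1001 4.5 1323091684000 [search]"
    System.out.println(p.userID + " " + p.itemID + " " + p.value
        + " " + p.timestamp + " " + p.extra);
  }
}
```

Keeping the extra columns as raw strings leaves their interpretation (for example, the browsing history that led the user to an item) to a subclass or to the evaluator, rather than baking one meaning into the parser.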
                
> Allow collaborative filtering evaluators to use custom logic in splitting 
> data set
> ----------------------------------------------------------------------------------
>
>                 Key: MAHOUT-906
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-906
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Anatoliy Kats
>            Priority: Minor
>              Labels: features
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I want to start a discussion about factoring out the logic used in splitting 
> the data set into training and testing.  Here is how things stand:  There are 
> two independent evaluator-based classes.  
> AbstractDifferenceRecommenderEvaluator splits all the preferences randomly 
> into training and test sets.  GenericRecommenderIRStatsEvaluator takes 
> one user at a time, removes their top AT preferences, and counts how many of 
> them the system recommends back.
> I have two use cases that both deal with temporal dynamics.  In one case, 
> there may be expired items that can be used for building the training model 
> but not for testing.  In the other, I may want to simulate the behavior of a 
> real system by building a preference matrix on days 1-k and testing on the 
> ratings the user generated on day k+1.  In this case, it is not items but 
> preferences (user, item, rating triplets) that may belong only to the 
> training set.  Before we discuss an appropriate design, are there any other 
> use cases we need to keep in mind?
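One way to factor out the split logic is a small interface that takes all preferences and returns disjoint training and test lists. The sketch below is hypothetical: `DataSplitter`, `Pref`, `Split`, `RandomSplitter` and `TemporalSplitter` are invented names, not Mahout classes, and an integer day index is an assumed stand-in for real timestamps. `RandomSplitter` approximates the random split described for AbstractDifferenceRecommenderEvaluator; `TemporalSplitter` covers both use cases by training on days 1..k, testing on day k+1, and never putting a preference for an expired item into the test set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of pluggable train/test splitting; none of these names are Mahout API.
public class SplitSketch {

  // Stand-in for a (user, item, rating) triplet plus the day it was recorded.
  public static final class Pref {
    public final long userID;
    public final long itemID;
    public final float value;
    public final int day; // assumed: 1-based day index instead of a real timestamp

    public Pref(long userID, long itemID, float value, int day) {
      this.userID = userID;
      this.itemID = itemID;
      this.value = value;
      this.day = day;
    }
  }

  // Two disjoint lists; preferences in neither list are simply not used.
  public static final class Split {
    public final List<Pref> training = new ArrayList<>();
    public final List<Pref> test = new ArrayList<>();
  }

  // The splitting decision, factored out of the evaluator.
  public interface DataSplitter {
    Split split(List<Pref> prefs);
  }

  // Sends each preference to training with probability trainingFraction, else to test.
  public static final class RandomSplitter implements DataSplitter {
    private final double trainingFraction;
    private final Random random;

    public RandomSplitter(double trainingFraction, long seed) {
      this.trainingFraction = trainingFraction;
      this.random = new Random(seed);
    }

    @Override
    public Split split(List<Pref> prefs) {
      Split s = new Split();
      for (Pref p : prefs) {
        if (random.nextDouble() < trainingFraction) {
          s.training.add(p);
        } else {
          s.test.add(p);
        }
      }
      return s;
    }
  }

  // Trains on days 1..k and tests on day k+1; expired items never enter the test set.
  public static final class TemporalSplitter implements DataSplitter {
    private final int lastTrainingDay; // k
    private final Set<Long> expiredItems;

    public TemporalSplitter(int lastTrainingDay, Set<Long> expiredItems) {
      this.lastTrainingDay = lastTrainingDay;
      this.expiredItems = expiredItems;
    }

    @Override
    public Split split(List<Pref> prefs) {
      Split s = new Split();
      for (Pref p : prefs) {
        if (p.day <= lastTrainingDay) {
          s.training.add(p); // expired items are still allowed here
        } else if (p.day == lastTrainingDay + 1 && !expiredItems.contains(p.itemID)) {
          s.test.add(p);
        }
        // Later days, and expired items on day k+1, are left out of both lists.
      }
      return s;
    }
  }

  public static void main(String[] args) {
    List<Pref> prefs = Arrays.asList(
        new Pref(1, 10, 4f, 1),
        new Pref(1, 11, 3f, 2),
        new Pref(2, 10, 5f, 3),
        new Pref(2, 12, 2f, 3),
        new Pref(2, 13, 1f, 4));
    Split s = new TemporalSplitter(2, Collections.singleton(12L)).split(prefs);
    System.out.println(s.training.size() + " training, " + s.test.size() + " test");
  }
}
```

With k = 2 and item 12 marked expired, `main` puts the day-1 and day-2 ratings into training and only user 2's day-3 rating of item 10 into the test set, so it prints `2 training, 1 test`. The per-user "top AT" holdout used by GenericRecommenderIRStatsEvaluator could plausibly be expressed as another implementation of the same interface, though that evaluator's per-user loop may need a finer-grained hook.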

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
