On Nov 9, 2011, at 10:16 AM, Sebastian Schelter wrote:

> Hi Grant,
>
> I'm currently looking into MailToRecMapper to understand the data you
> extract from the ASF email archives. (I haven't had the time to actually
> run it yet.)
>
> As far as I understand, it outputs
>
> from,msgId,1
>
> for each mail. What exactly is the msgId here?

It's the mail's Message-ID header.

> I'm searching for an example where I have implicit feedback data in the form
>
> <user> <item> <number of observed interactions>
>
> It is important to have varying numbers of interactions, because the
> algorithm I'm trying to exemplify uses this number to calculate a
> "confidence" for each data point. E.g. if a user has never seen some
> movie, you would see 0 interactions, which could mean that he doesn't
> like the movie, but it could also mean he just doesn't know it exists,
> so we have low confidence in the observation. On the other hand, if he
> watched the movie 20 times, we can be pretty sure he likes it.
>
> Would it be possible to extract data in the form
>
> <email> <thread> <number of responses>

Yeah, I think so. That was my original plan, but then I decided against it. The code should be simple, though.

> from the ASF email archives? I recall a discussion stating that
> identifying a thread is a pretty hard task...
>
> Best,
> Sebastian
>
> On 09.11.2011 16:35, Grant Ingersoll (Commented) (JIRA) wrote:
>>
>> [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103 ]
>>
>> Grant Ingersoll commented on MAHOUT-878:
>> ----------------------------------------
>>
>> See also the stuff I did for build-asf-email.sh. It would be nice to add
>> this to that.
>>
>>> Provide better examples for the parallel ALS recommender code
>>> -------------------------------------------------------------
>>>
>>> Key: MAHOUT-878
>>> URL: https://issues.apache.org/jira/browse/MAHOUT-878
>>> Project: Mahout
>>> Issue Type: Task
>>> Components: Collaborative Filtering
>>> Affects Versions: 1.0
>>> Reporter: Sebastian Schelter
>>> Assignee: Sebastian Schelter
>>>
>>> We should provide examples that show how to apply the parallel ALS
>>> recommender to the Netflix or KDD2011 datasets.

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
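For the <email> <thread> <number of responses> extraction discussed above, here is a minimal sketch of the aggregation step only. It assumes a hypothetical upstream step has already replaced msgId with a thread key (for example one derived from the In-Reply-To or References headers), so input records look like "sender,threadId,1". The class and method names are illustrative and not part of Mahout or MailToRecMapper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ThreadResponseCounts {

    /**
     * Sums "sender,threadId,count" records into one count per
     * (sender, threadId) pair. The threadId field is hypothetical:
     * it assumes each mail has already been assigned to a thread.
     */
    public static Map<String, Integer> countResponses(List<String> records) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.split(",");
            if (fields.length != 3) {
                continue; // skip malformed lines
            }
            String key = fields[0] + " " + fields[1];
            counts.merge(key, Integer.parseInt(fields[2].trim()), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> records = List.of(
            "alice@example.org,thread-1,1",
            "alice@example.org,thread-1,1",
            "bob@example.org,thread-1,1",
            "alice@example.org,thread-2,1");
        // Prints one "<email> <thread> <number of responses>" line per pair
        countResponses(records).forEach((pair, n) -> System.out.println(pair + " " + n));
    }
}
```

The hard part Sebastian mentions, deciding which mails belong to the same thread, is deliberately left outside this sketch. Once a thread key exists, the counting itself is a plain group-by-and-sum.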
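To illustrate how an interaction count can become a "confidence", here is a small sketch. It assumes the algorithm follows the linear scheme from Hu, Koren and Volinsky's implicit-feedback ALS: a binary preference p = 1 if r > 0 (else 0) and a confidence c = 1 + alpha * r. The alpha value used below is an arbitrary illustrative choice, not a Mahout default, and the names are hypothetical.

```java
public class ImplicitConfidence {

    /**
     * Confidence weight for an observed interaction count r, assuming
     * the linear scheme c = 1 + alpha * r. alpha is a tuning parameter.
     */
    public static double confidence(int count, double alpha) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        return 1.0 + alpha * count;
    }

    /** Binary preference: 1 if the user interacted with the item at all. */
    public static int preference(int count) {
        return count > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        double alpha = 40.0; // illustrative value only
        for (int r : new int[] {0, 1, 20}) {
            System.out.printf("r=%d p=%d c=%.1f%n", r, preference(r), confidence(r, alpha));
        }
    }
}
```

This mirrors the movie example in the thread. Zero interactions yield preference 0 with the minimum confidence of 1, while 20 viewings yield preference 1 with a much larger weight.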
