On Nov 9, 2011, at 10:16 AM, Sebastian Schelter wrote:

> Hi Grant,
> 
> I'm currently looking into MailToRecMapper to understand the data you
> extract from the ASF email archives. (Haven't had the time to actually
> run it yet)
> 
> As far as I understand it outputs
> 
> from,msgId,1
> 
> for each mail. What exactly is the msgId here?

It's the mail's Message-ID header.

> 
> I'm searching for an example where I have implicit feedback data in the form
> 
> <user> <item> <number of observed interactions>
> 
> It would be important to have different numbers of interaction as the
> algorithm I'm trying to exemplify uses this number to calculate a
> "confidence" for the data point. E.g. if a user has never seen some
> movie, you would see 0 interactions, which could mean that he doesn't
> like the movie, but it could also mean he just doesn't know it exists,
> so we have low confidence in the observation. On the other hand if he
> watched the movie 20 times, we can be pretty sure he likes it.
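
To make the confidence idea above concrete, here is a minimal sketch. It assumes a linear weighting c = 1 + alpha * r, which is one common choice for implicit-feedback ALS; the thread does not say which weighting the parallel ALS code actually uses, and the function names and alpha=40 are purely illustrative.

```python
def confidence(interactions, alpha=40.0):
    """Confidence weight for an observed interaction count.

    Assumption: linear form c = 1 + alpha * r. The value of alpha is
    illustrative only and would normally be tuned.
    """
    if interactions < 0:
        raise ValueError("interaction count must be non-negative")
    return 1.0 + alpha * interactions


def preference(interactions):
    """Binary preference: 1 if any interaction was observed, else 0."""
    return 1 if interactions > 0 else 0


if __name__ == "__main__":
    # Never seen, seen once, watched 20 times.
    for r in (0, 1, 20):
        print(r, preference(r), confidence(r))
```

With this form, a user who never watched a movie still contributes a data point (preference 0), but with the minimum weight, while 20 observed interactions give a much higher weight to preference 1.
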
> 
> Would it be possible to extract data in the form
> 
> <email> <thread> <number of responses>

Yeah, I think so. That was my original plan; I later decided against it, but
the code should be simple.
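
A rough sketch of what that extraction could look like, assuming each mail has already been parsed into a (sender, message_id, in_reply_to) tuple and that the In-Reply-To header is reliable enough to reconstruct threads. Both are assumptions: real archives often have missing or broken headers, which is part of why thread identification is hard. The function names are hypothetical and are not part of MailToRecMapper.

```python
from collections import Counter


def thread_root(msg_id, parents):
    """Follow In-Reply-To links upward until a message has no known parent."""
    seen = set()
    while msg_id in parents and msg_id not in seen:
        seen.add(msg_id)  # guard against cycles in malformed archives
        msg_id = parents[msg_id]
    return msg_id


def responses_per_thread(messages):
    """Count replies per (sender, thread root).

    `messages` is an iterable of (sender, message_id, in_reply_to) tuples,
    where in_reply_to is None for a message that starts a thread.
    """
    messages = list(messages)
    parents = {mid: parent for _, mid, parent in messages if parent}
    counts = Counter()
    for sender, mid, parent in messages:
        if parent:  # only replies count as responses
            counts[(sender, thread_root(mid, parents))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("a@example.org", "m1", None),
        ("b@example.org", "m2", "m1"),
        ("b@example.org", "m3", "m2"),
        ("c@example.org", "m4", "m1"),
    ]
    # Emit lines in the <email> <thread> <number of responses> form.
    for (sender, thread), n in sorted(responses_per_thread(sample).items()):
        print(sender, thread, n)
```

The thread is identified here by the Message-ID of its root message. A reply whose parent is missing from the archive ends up rooted at that missing ID, so a partially archived thread can split into several pieces.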

> 
> from the asf email archives? I recall a discussion stating that
> identifying a thread is a pretty hard task...
> 
> Best,
> Sebastian
> 
> 
> 
> On 09.11.2011 16:35, Grant Ingersoll (Commented) (JIRA) wrote:
>> 
>>    [ 
>> https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103
>>  ] 
>> 
>> Grant Ingersoll commented on MAHOUT-878:
>> ----------------------------------------
>> 
>> See also the stuff I did for build-asf-email.sh.  Would be nice to add into 
>> that.
>> 
>>> Provide better examples for the parallel ALS recommender code
>>> -------------------------------------------------------------
>>> 
>>>                Key: MAHOUT-878
>>>                URL: https://issues.apache.org/jira/browse/MAHOUT-878
>>>            Project: Mahout
>>>         Issue Type: Task
>>>         Components: Collaborative Filtering
>>>   Affects Versions: 1.0
>>>           Reporter: Sebastian Schelter
>>>           Assignee: Sebastian Schelter
>>> 
>>> We should provide examples that show how to apply the parallel ALS 
>>> recommender to the Netflix or KDD2011 datasets.
>> 
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA 
>> administrators: 
>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>> 
>> 
> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com


