Maybe a dumb question, but why subtract the links? I can only get from A to B via a hyperlink (well, I could navigate directly to B, but is the fact that I was on A meaningful then?). Normalizing for transitions that correspond to a link seems to do nothing. Maybe I don't fully understand the problem.
An A-C-B transition doesn't suggest that A should be recommended from B, right, but, say, A-B-A would. My point was only that it is not always symmetric, of course, and so applying CF gets a little trickier since the algorithms would assume symmetry.

Would a short Markov chain work and scale? For 3 elements, it needs storage proportional to the cube of the average number of links per page. I don't think CF will scale nearly as well here; it doesn't feel like quite the right tool for the job.

Sean

On 19 Jan 2009, 8:45 AM, "Goel, Ankur" <[email protected]> wrote:

Ted / Sean,

The link structure should definitely be subtracted. Whether that is done on the original dataset or on the recommended item-set is left to the implementation; I think it will be easier to do this on the recommended item-set.

As for not recommending URLs in reverse order (B for A but not A for B, given B appeared after A), one will have to keep track of the user's current browsing history and remove those URLs the user has already seen. Although if the user does reach B through some other link C, then it does make sense to recommend A.

Given the size of the data-set, and keeping in mind that it could grow in future, what algorithms would you try out?

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Sunday, January 18, 2009 2:06 AM
To: [email protected]
Subject: Re: RE: [jira] Commented: (MAHOUT-19) Hierarchial clusterer

> Predicting next URL is an i...
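To make the ideas in the thread concrete, here is a minimal sketch of the kind of short Markov chain being discussed: second-order transition counts (prev, cur) -> next, with direct hyperlinks from the current page and already-seen pages subtracted from the recommendations. All names here (`build_transitions`, `recommend`, the `links` mapping) are hypothetical, not anything from Mahout; sessions are assumed to be lists of URLs in visit order.

```python
from collections import defaultdict

def build_transitions(sessions):
    """Count second-order transitions: (prev, cur) -> {next: count}."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b, c in zip(session, session[1:], session[2:]):
            counts[(a, b)][c] += 1
    return counts

def recommend(counts, prev, cur, links, seen, k=3):
    """Rank candidate next pages for the (prev, cur) state.

    Drops candidates that are direct hyperlinks from `cur` (the
    "subtract the link structure from the recommended item-set" idea)
    and pages the user has already seen in this session.
    """
    cands = counts.get((prev, cur), {})
    filtered = {url: n for url, n in cands.items()
                if url not in links.get(cur, set()) and url not in seen}
    return sorted(filtered, key=filtered.get, reverse=True)[:k]

# A-B-A sequences make A a candidate from state (A, B); the direct
# link B -> C is filtered out of the recommendations.
counts = build_transitions([["A", "B", "A"], ["A", "B", "C"], ["A", "B", "A"]])
print(recommend(counts, "A", "B", links={"B": {"C"}}, seen=set()))  # ['A']
```

Note that the storage is one counter per observed (prev, cur, next) triple, which matches the cube-of-average-out-degree bound mentioned above only in the worst case; in practice observed triples are far sparser.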
