Maybe a dumb question, but why subtract the links? I can only get from A to
B via a hyperlink. (Well, if I navigate directly to B, is the fact that I was
on A even meaningful?) So normalizing for transitions that correspond to a
link seems to do nothing. Maybe I do not understand the problem fully.

An A-C-B transition doesn't suggest that A should be recommended from B,
right? But, say, A-B-A would. My point was only that the relationship is not
always symmetric, and so applying CF gets a little trickier, since the
algorithms would assume symmetry.

Would a short Markov chain work and scale? With 3 elements per chain, it
needs storage proportional to the cube of the average number of links per
page. I don't think CF will scale nearly as well here; it doesn't feel like
quite the right tool for the job.
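
To make the Markov chain idea concrete, here is a minimal sketch, assuming sessions arrive as ordered lists of page identifiers. The names `build_chain` and `predict_next` are made up for illustration and are not part of Mahout. It counts which page follows each pair of consecutive pages, so (A, B) and (B, A) are separate contexts and nothing assumes symmetry.

```python
from collections import Counter, defaultdict


def build_chain(sessions, order=2):
    """Count which page follows each run of `order` consecutive pages."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - order):
            context = tuple(session[i:i + order])
            counts[context][session[i + order]] += 1
    return counts


def predict_next(counts, recent, order=2, k=3):
    """Return up to k of the most frequent next pages for the recent context."""
    context = tuple(recent[-order:])
    # Use .get so that looking up an unseen context does not add an entry.
    followers = counts.get(context, Counter())
    return [page for page, _ in followers.most_common(k)]


# Hypothetical click streams, one list per browsing session.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "A", "C"],
]
chain = build_chain(sessions)
print(predict_next(chain, ["A", "B"]))
```

Only the triples actually observed in the sessions are stored, so the table grows with the distinct page sequences seen rather than with every possible combination of pages.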

Sean

On 19 Jan 2009, 8:45 AM, "Goel, Ankur" <[email protected]> wrote:



Ted / Sean,

The link structure should definitely be subtracted. Whether that is done on
the original dataset or on the recommended item-set is left to the
implementation. I think it will be easier to do it on the recommended
item-set.

As for not recommending URLs in reverse order (B for A but not A for B,
given B appeared after A), one will have to keep track of the user's current
browsing history and remove the URLs the user has already seen. Although
if the user does reach B through some other link C, then it does make sense
to recommend A.
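
A rough sketch of that post-filtering step, assuming the recommender returns a ranked list of URLs and that the link graph is available as a dict of sets. The function name `filter_recommendations` and the data layout are assumptions for illustration only.

```python
def filter_recommendations(candidates, current_page, links, history):
    """Drop candidates that are directly linked or already visited.

    candidates:   ranked list of recommended URLs, best first
    current_page: the page the user is on now
    links:        dict mapping a page to the set of pages it links to
    history:      pages visited so far in this session, in order
    """
    linked = links.get(current_page, set())
    seen = set(history)
    return [
        url for url in candidates
        if url != current_page and url not in linked and url not in seen
    ]


# Hypothetical link graph: B links directly to D.
links = {"B": {"D"}}

# User went A -> B: A is already seen, D is one click away.
print(filter_recommendations(["A", "C", "D", "E"], "B", links, ["A", "B"]))

# User reached B via C instead: A has not been seen, so it stays.
print(filter_recommendations(["A", "D"], "B", links, ["C", "B"]))
```

Filtering on the recommended item-set keeps the training data untouched, so the same model can serve users with different histories.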

Given the size of the data-set, and keeping in mind that it could grow in
the future, what algorithms would you try out?

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Sunday, January 18, 2009 2:06 AM
To: [email protected]
Subject: Re: RE: [jira] Commented: (MAHOUT-19) Hierarchial clusterer >
Predicting next URL is an i...
