RE: RE: [jira] Commented: (MAHOUT-19) Hierarchial clusterer

Goel, Ankur Mon, 19 Jan 2009 00:45:23 -0800


Ted / Sean,

The link structure should definitely be subtracted. Whether it is subtracted
from the original dataset or from the recommended item-set can be left to the
implementation; I think it will be easier to do it on the recommended item-set.
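Here is a minimal sketch of subtracting the link structure from the recommended item-set. It assumes the recommender already returns a ranked list of URLs for the current page, and that the site's hyperlinks are available as a map from each page to the pages it links to. `outlinks` and `subtract_link_structure` are hypothetical names, not part of Mahout.

```python
from typing import Dict, List, Set


def subtract_link_structure(
    current_url: str,
    recommendations: List[str],
    outlinks: Dict[str, Set[str]],
) -> List[str]:
    """Drop recommended URLs that the current page already links to.

    outlinks is an assumed map: page URL -> set of URLs it hyperlinks to.
    The ranking order of the remaining recommendations is preserved.
    """
    linked = outlinks.get(current_url, set())
    # Also drop the current page itself; recommending it is never useful.
    return [u for u in recommendations if u not in linked and u != current_url]
```

For a page A that links to B and C, `subtract_link_structure("A", ["B", "D", "C", "E"], {"A": {"B", "C"}})` returns `["D", "E"]`.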

As for not recommending URLs in reverse order (B for A but not A for B,
given that B appeared after A), one will have to keep track of the user's
current browsing history and remove the URLs the user has already seen.
However, if the user reaches B through some other link C, then it does
make sense to recommend A.
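Session-history filtering can be sketched as follows. This is a minimal illustration that assumes recommendations arrive as a ranked list of URLs. `BrowsingSession` is a hypothetical helper that only remembers what the user has seen in the current session.

```python
from typing import List, Set


class BrowsingSession:
    """Tracks the URLs one user has viewed in the current session (hypothetical helper)."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def visit(self, url: str) -> None:
        # Record a page view for this session.
        self.seen.add(url)

    def filter_unseen(self, recommendations: List[str]) -> List[str]:
        # Keep the ranking order; drop anything already viewed this session.
        return [u for u in recommendations if u not in self.seen]
```

In a session A then B, A is filtered out of the recommendations shown on B. In a session C then B, A is kept, which matches the case described above.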

Given the size of the dataset, and keeping in mind that it could grow in
the future, what algorithms would you try out?

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Sunday, January 18, 2009 2:06 AM
To: [email protected]
Subject: Re: RE: [jira] Commented: (MAHOUT-19) Hierarchial clusterer

> Predicting next URL is an important RI problem and using an item-set
> predictor or an indicator-set predictor or a latent-variable predictor is
> likely to work reasonably well.  The asymmetry of the prediction is not
> particularly a problem since it captures important structural cues (web
> links are unidirectional).  It is important, however, to subtract away the
> link structure of the web pages before evaluating the system since
> suggesting that the user simply follow links that already exist is less
> than interesting.  As such, a raw Markov chain isn't likely to work well.
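Ted's point that an item-set style predictor can keep the asymmetry while the link structure is subtracted can be illustrated with a simple directed co-occurrence count. This is a minimal sketch under several assumptions. Sessions are ordered lists of URLs, and `outlinks` is a hypothetical map of each page's existing hyperlinks. The plain counts stand in for whatever scoring a real item-set or indicator-set predictor would use, so no significance test or normalization is applied.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def directed_cooccurrence(
    sessions: Iterable[List[str]],
    outlinks: Dict[str, Set[str]],
) -> Dict[str, Counter]:
    """Count how often URL b is visited after URL a within the same session.

    Counting only "b after a" keeps the prediction asymmetric. Pairs where
    a already links directly to b are skipped, so the counts reflect
    associations beyond the existing link structure.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for i, a in enumerate(session):
            linked = outlinks.get(a, set())
            # Deduplicate later URLs so repeated visits to b after this
            # occurrence of a are counted once.
            for b in set(session[i + 1:]):
                if b != a and b not in linked:
                    counts[a][b] += 1
    return counts


def recommend(counts: Dict[str, Counter], url: str, n: int = 3) -> List[str]:
    """Return up to n URLs most often visited after url (ties in arbitrary order)."""
    return [b for b, _ in counts.get(url, Counter()).most_common(n)]
```

With sessions `[["A", "B", "D"], ["A", "D"], ["A", "C", "E"]]` and A linking to B and C, the counts for A are D: 2 and E: 1. B and C do not appear because they are already reachable by a link.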

On Sat, Jan 17, 2009 at 5:24 AM, Sean Owen <[email protected]> wrote:

> But then again, is this a CF problem? Sounds like Markov chains... given the
> last 1 or 2 or 3 URLs visited, which URL has been next, most often? I think
> that's relatively easy and fast; does that work?
>
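Sean's suggestion can be sketched as a small variable-order Markov model: given the last 1, 2, or 3 URLs visited, it returns the URL that has most often come next. This is a hedged illustration that assumes sessions are ordered lists of URLs. The class name and the back-off rule are illustrative choices, not something specified in the thread. The back-off rule tries the longest available context first, then shorter ones. As Ted notes, a raw chain like this would still need the existing link structure subtracted before evaluation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class MarkovNextUrl:
    """Predict the next URL from the last 1..max_order URLs visited."""

    def __init__(self, max_order: int = 3) -> None:
        self.max_order = max_order
        # context tuple of preceding URLs -> counts of the URL that followed
        self.counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def train(self, sessions: List[List[str]]) -> None:
        for session in sessions:
            for i in range(1, len(session)):
                nxt = session[i]
                # Record the transition under every context length that fits.
                for order in range(1, self.max_order + 1):
                    if i - order < 0:
                        break
                    self.counts[tuple(session[i - order:i])][nxt] += 1

    def predict(self, history: List[str]) -> Optional[str]:
        # Back off from the longest usable context to a single URL.
        for order in range(min(self.max_order, len(history)), 0, -1):
            followers = self.counts.get(tuple(history[-order:]))
            if followers:
                return followers.most_common(1)[0][0]
        return None
```

After training on `[["A", "B", "C"], ["A", "B", "C"], ["X", "B", "D"]]` with `max_order=2`, the history `["X", "B"]` predicts D from the two-URL context. An unseen history `["Q", "B"]` backs off to the single URL B and predicts C.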

-- 
Ted Dunning, CTO
DeepDyve
4600 Bohannon Drive, Suite 220
Menlo Park, CA 94025
www.deepdyve.com
650-324-0110, ext. 738
858-414-0013 (m)
