Ted / Sean,
The link structure should definitely be subtracted. Whether that is done on the original dataset or on the recommended item-set is left to the implementation; I think it will be easier to do it on the recommended item-set.

As for not recommending URLs in reverse order (B for A but not A for B, given that B appeared after A), one will have to keep track of the user's current browsing history and remove the URLs the user has already seen. However, if the user reaches B through some other link C, then it does make sense to recommend A.

Given the size of the data-set, and keeping in mind that it could grow in the future, which algorithms would you try out?

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Sunday, January 18, 2009 2:06 AM
To: [email protected]
Subject: Re: RE: [jira] Commented: (MAHOUT-19) Hierarchial clusterer

> Predicting next URL is an important RI problem and using an item-set
> predictor or an indicator-set predictor or a latent-variable predictor is
> likely to work reasonably well. The asymmetry of the prediction is not
> particularly a problem since it captures important structural cues (web
> links are unidirectional). It is important, however, to subtract away the
> link structure of the web pages before evaluating the system since
> suggesting that the user simply follow links that already exist is less
> than interesting. As such, a raw markov chain isn't likely to work well.

On Sat, Jan 17, 2009 at 5:24 AM, Sean Owen <[email protected]> wrote:

> But then again is this a CF problem? Sounds like markov chains... given the
> last 1 or 2 or 3 URLs visited, which URL has been next, most often? I think
> that's relatively easy and fast, does that work?
>

--
Ted Dunning, CTO
DeepDyve
4600 Bohannon Drive, Suite 220
Menlo Park, CA 94025
www.deepdyve.com
650-324-0110, ext. 738
858-414-0013 (m)
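
For reference, a minimal sketch of the filtering step described at the top of this message, applied to the recommended item-set rather than to the original dataset. The function name, the `outlinks` map (page URL to the set of URLs it already links to) and the `history` list are hypothetical names chosen for illustration; this is not Mahout code, and the recommender that produces `recommended` is assumed to exist elsewhere.

```python
def filter_recommendations(current_url, recommended, outlinks, history):
    """Filter a ranked list of recommended URLs for the page being viewed.

    Drops (a) the current page itself, (b) URLs the current page already
    links to (the "link structure"), and (c) URLs already seen in this
    browsing session. The ranking order of the survivors is preserved.
    """
    linked = outlinks.get(current_url, set())  # existing links on this page
    seen = set(history)                        # URLs visited this session
    return [
        url
        for url in recommended
        if url != current_url and url not in linked and url not in seen
    ]


# Hypothetical data: page A links to B; the user arrived at A after viewing E.
outlinks = {"A": {"B"}}
print(filter_recommendations("A", ["B", "D", "E"], outlinks, ["E", "A"]))
```

Because the session filter only removes URLs that are actually in the history, A is suppressed for a user who went from A to B, but is still recommended to a user who reached B through C.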
