Chris Anderson wrote:

Sam Joseph wrote:
>
> > Where the ranks indicate the number of times a user has bookmarked
> > something after searching for it with that keyword, the number of times
> > it was clicked through after it was searched for using that keyword and
> > the number of times it was returned as a search result for that keyword.
> > You can then use some prob maths to compare the ranks.  See:
> >
> > http://www.neurogrid.net/WhitePaper0_3.html
> >
> > For more details of the maths that can be used for this.
> >

> Gotcha.  Any thoughts on profiling users bookmarks to estimate keyword
> rankings of new data?

Well, that's kind of what I'm working on with NeuroGrid now.  It's not
set up yet, but my approach is to get a person's bookmark file, extract
all of the urls out of it, download each of those pages, chew them up,
spit out all the tags, and then use some basic information retrieval
statistics (like TFIDF - term frequency inverse document frequency) to
work out which subset of keywords are relevant and use those as the
basis for a user's NeuroGrid profile.
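
As a rough sketch of the keyword-selection step described above (this is
not NeuroGrid code; the function names, the crude regex tag stripping and
the particular TF-IDF weighting are all assumptions made for illustration,
and the pages are passed in as strings rather than downloaded):

```python
import math
import re
from collections import Counter


def tokenize(html):
    """Crudely strip HTML tags, then split the text into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(pages, top_n=3):
    """Pick the top_n TF-IDF keywords for each bookmarked page.

    pages: dict mapping url -> page source (hypothetical input format).
    Returns a dict mapping url -> list of keywords, best first.
    """
    tokens = {url: tokenize(html) for url, html in pages.items()}
    n_docs = len(tokens)

    # Document frequency: in how many pages does each word appear?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    profile = {}
    for url, words in tokens.items():
        tf = Counter(words)
        # Assumed weighting: relative term frequency times log(N / df).
        scores = {
            w: (count / len(words)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        profile[url] = ranked[:top_n]
    return profile
```

Words that appear in every bookmarked page score zero under this
weighting, so they drop to the bottom of each page's list.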

One could go so far as to try and create ranks based on the TFIDF and
then translate them into usage ranks, like the ones I described, but I
think they are just a very different kind of thing.  The idea with NG
is that users should be able to edit all the associations between
keywords and their bookmarks; it should all be personalised.  So I would
imagine using the bookmark file as a way to get some urls into the
system, a little TFIDF to provide base associations and then let the
searching do its work.  NG searching allows urls to become associated
with other keywords through multiple keyword searches and so on, so I'm
kind of putting my trust in that, rather than some information
theoretical scheme that allegedly works out the *best* representation
for the data.
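
To make the usage ranks concrete, here is a minimal sketch of the three
counters per keyword/url pair (bookmarked, clicked through, returned) and
how a multi-keyword search spreads an association across keywords.  The
class, its method names and the click_rate ratio are hypothetical; the
ratio is only a placeholder and is not the maths from the white paper.

```python
from collections import defaultdict


class UsageRanks:
    """Per (keyword, url) counters: [bookmarked, clicked, returned]."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0, 0])

    def record_search(self, keywords, returned_urls, clicked=(), bookmarked=()):
        """Record one search; every keyword used gets credit for each url."""
        for kw in keywords:
            for url in returned_urls:
                entry = self.counts[(kw, url)]
                entry[2] += 1              # returned as a search result
                if url in clicked:
                    entry[1] += 1          # clicked through after the search
                if url in bookmarked:
                    entry[0] += 1          # bookmarked after the search

    def click_rate(self, keyword, url):
        """Placeholder comparison (an assumption): clicks per time returned."""
        _, clicked, returned = self.counts.get((keyword, url), (0, 0, 0))
        return clicked / returned if returned else 0.0
```

A search for "freenet routing" that leads to a click on some url bumps
that url's counters under both keywords, which is how a url can pick up
associations beyond whatever TFIDF seeded it with.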

I think that data should be represented in a way that reflects the way
it gets used.

CHEERS> SAM

P.S. Any tips on how I can get my mails to follow the threading in these
lists?  I'm beginning to think my only option is to re-subscribe and
receive individual messages.




_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl