Hi Gail,

I've used Naive Bayes classification [1] to accomplish similar
things in the past.  The method sorts blocks of text (the items
that come from your automatic feeds) into predefined categories
(your taxonomy) based on word frequencies.  It's a pretty popular
approach for filtering spam out of inboxes, but it can be used much
more generally and with as many categories as you'd like.
Implementation tends to follow these steps:

1. Set up a classifier and categories [2]
2. Train the classifier with sample content for each of your
categories
3. Test the classifier with additional sample content to make sure
it's working reasonably well
4. Refine over time
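
As a rough illustration of the four steps above, here is a minimal
sketch in Python using only the standard library.  It is not taken
from POPFile or any other package; the class name, the method names,
the category labels, and the sample sentences are all made up for the
example.  It assumes a multinomial model with add-one smoothing and
simply ignores words it never saw during training.

```python
import math
import re
from collections import Counter


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes over word counts."""

    def __init__(self, categories):
        # Step 1: set up the classifier and its categories.
        self.categories = list(categories)
        self.word_counts = {c: Counter() for c in self.categories}
        self.doc_counts = Counter()
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it into simple word tokens.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, category, text):
        # Steps 2 and 4: add one labelled sample to a category.
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        # Log-probability score per category (higher is more likely).
        # Words never seen in training are dropped (an assumption).
        words = [w for w in self.tokenize(text) if w in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for c in self.categories:
            # Smoothed prior: share of training samples in this category.
            prior = (self.doc_counts[c] + 1) / (total_docs + len(self.categories))
            log_p = math.log(prior)
            total_words = sum(self.word_counts[c].values())
            for w in words:
                # Add-one smoothing keeps unseen pairs above zero.
                likelihood = (self.word_counts[c][w] + 1) / (total_words + vocab_size)
                log_p += math.log(likelihood)
            result[c] = log_p
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


# Step 2: train with sample content for each category.
clf = NaiveBayesClassifier(["sports", "technology"])
clf.train("sports", "The team won the match with a late goal")
clf.train("sports", "The coach praised the players after the game")
clf.train("technology", "The new phone ships with a faster processor")
clf.train("technology", "Developers released a software update for the browser")

# Step 3: test with additional sample content.
print(clf.classify("A late goal decided the game"))
print(clf.classify("The update makes the browser faster"))
```

Refining over time (step 4) would amount to calling train() again
with the correct category whenever the classifier gets an item wrong.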

A nice reference implementation might be POPFile [3].  POPFile sorts
email into categories you define, and you refine its results by
telling it when it has made a mistake.  The Wikipedia page on Naive
Bayes can lead you to other methods, or you might consider a more
advanced solution like SPSS's Predictive Text Analytics [4].

Sincerely,

Joseph Dombroski

[1] http://en.wikipedia.org/wiki/Naive_Bayesian_classification
[2] Many programming languages have libraries to make this easier. 
You can also find software that will help you set these up.  I did a
quick search for feed classifiers and found the service
http://rss.knownews.net as well as some software at
http://the.taoofmac.com/space/blog/2006/11/04
[3] http://getpopfile.org/
[4] http://www.spss.com/text_mining_for_clementine/


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Posted from the new ixda.org
http://www.ixda.org/discuss?post=38635

