On 26/11/2007, Tom Loosemore <[EMAIL PROTECTED]> wrote:

> ...you can minimise "false positive" terms by running the copy
> through several different flavours of term extractor, and only using
> terms thrown up by x or more of them (where x depends on your appetite
> for false positives vs false negatives).
>
> So, why not throw the copy through several more term extractors then
> only use the overlapping terms?

This should work (and it's been suggested on the backstage-dev list
recently), though I'm uneasy about one possible situation: one of your
term extractors comes up with a great set of terms, but the others
miss them completely, and so your output is a bad compromise of terms
that aren't that meaningful.
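
To make the "x or more of them" idea concrete, here is a minimal sketch
of the voting step. It assumes each extractor's output has already been
fetched as a plain list of strings; the function name and the sample
terms are made up for illustration and don't correspond to any
particular API.

```python
from collections import Counter


def overlapping_terms(extractor_outputs, x):
    """Return the terms proposed by at least x of the extractors.

    extractor_outputs: one list of term strings per extractor
    (hypothetical input; real APIs return their own formats).
    """
    votes = Counter()
    for terms in extractor_outputs:
        # Normalise and de-duplicate so each extractor gets at most
        # one vote per term.
        votes.update({t.strip().lower() for t in terms})
    return sorted(t for t, n in votes.items() if n >= x)


# Made-up outputs from three imaginary extractors.
outputs = [
    ["BBC", "term extraction", "metadata"],
    ["bbc", "metadata", "API"],
    ["BBC", "Infax"],
]
print(overlapping_terms(outputs, 2))
```

Raising x trades false positives for false negatives. Note that
"Infax" is dropped at x=2 however good a term it is, which is exactly
the worry described above.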

Do any APIs let you see the confidence score on their output terms?
Having admittedly not thought about this much, it seems to me that a
confidence score is key to any realistic combination algorithm.
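
If confidence scores were available, one possible combination would be
to sum them per term rather than count votes. The sketch below assumes
each extractor returns a term-to-confidence mapping with values between
0 and 1; that input format, the function name and the threshold are all
assumptions, since I don't know that any of these APIs expose such a
score.

```python
from collections import defaultdict


def combine_by_confidence(extractor_outputs, threshold):
    """Sum per-term confidences across extractors and keep the strong ones.

    extractor_outputs: one {term: confidence} dict per extractor
    (hypothetical format, confidences assumed to lie in 0..1).
    """
    scores = defaultdict(float)
    for scored_terms in extractor_outputs:
        for term, confidence in scored_terms.items():
            scores[term.strip().lower()] += confidence
    kept = [(t, s) for t, s in scores.items() if s >= threshold]
    # Highest combined score first; ties broken alphabetically.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Made-up scores: only the first extractor finds "Infax", but it is
# very confident about it.
scored = [
    {"Infax": 0.95, "BBC": 0.5},
    {"BBC": 0.4},
    {"BBC": 0.3, "archive": 0.2},
]
for term, score in combine_by_confidence(scored, 0.9):
    print(term, round(score, 2))
```

With a scheme like this, a term that only one extractor finds can still
survive if that extractor is confident enough, while weak terms need
agreement from several extractors to get through.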

In terms (sorry) of quality of output, people seem to like Yahoo's
API. I've come across Trynt's offering too
(http://www.trynt.com/trynt-contextual-term-extraction-api/), but
ominously their website is giving me a 403 Forbidden error right now.
ClearForest's Semantic Web Services
(http://www.programmableweb.com/api/clearforest-semantic-web-services1/)
have also been suggested on the "pure technical discussion" list.

> - The BBC has at least one *excellent* term extractor in house which
> adds extra metadata like 'this term is a person/place/topic'... would
> be a lovely API to offer, hint hint...

Ah - has this been used to derive the subject categories and
contributors for the web version of Infax, by any chance? If so, and
even if not, that would be a gorgeous API to offer - please, BBC...

Rhys
-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/
