We shouldn't make the mistake of assuming that Watson actually thinks about
or knows the answers.  What it does is akin to what Google does.  It rips apart
the category and original clue into important and non-important terms to
search (with some programming help for Jeopardy-like wordplay that often
occurs).  This set of data is used to search a humongous tagged database to
find a series of hits which are scored (probably, again similar to Google or
Bing or whatever you fancy) based on the count and relative proximity of the
search terms within the hits.  From the highest-scoring results, it then
uses a similar algorithm to throw out trivial words and some lexical
analysis to select a set of "important words/phrases" -- from which
plausible answers are selected.  An algorithm can then score these
words/phrases based on how often they appear in the high-scoring
search hits and their proximity to the search words within those hits.  The
program is also influenced by training (through machine learning techniques)
as to what answers are more likely to be right or wrong.
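
To make the first half of that concrete, here is a rough Python sketch of
the "rip apart the clue, then score the hits" step.  It is only my guess at
the general shape, not Watson's actual code: the stop-word list, the
function names (important_terms, score_hit) and the scoring formula are all
invented for illustration.

```python
import re

# Hypothetical stop-word list; a real system would use a far larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "it",
              "its", "he", "she", "this", "that", "for", "was", "as", "by"}

def important_terms(category, clue):
    """Split category + clue into lowercase words and drop the trivial ones."""
    words = re.findall(r"[a-z0-9']+", (category + " " + clue).lower())
    return [w for w in words if w not in STOP_WORDS]

def score_hit(terms, hit_text):
    """Score a passage by how many search terms it has and how close they sit."""
    words = re.findall(r"[a-z0-9']+", hit_text.lower())
    wanted = set(terms)
    positions = [i for i, w in enumerate(words) if w in wanted]
    if not positions:
        return 0.0
    matched = len({words[i] for i in positions})  # distinct terms found
    span = positions[-1] - positions[0] + 1       # window covering all matches
    return matched + matched / span               # count plus a proximity bonus
```

With a made-up clue such as category "AUTHORS" and clue "He wrote The Old
Man and the Sea", a passage that contains most of the terms close together
scores higher than one that mentions only a couple of them far apart.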

The resultant word/phrase with the highest score is then selected as "the
answer."  
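
The candidate-scoring and selection step can be sketched the same way.
Again, this is an assumption-laden toy in Python, not anything IBM has
published: best_candidate is a hypothetical name, candidates are limited to
single words, the "frequency plus nearness" formula is made up, and the
machine-learning adjustment is left out entirely.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a passage and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def best_candidate(search_terms, hits, stop_words):
    """Return the highest-scoring candidate word across the hits, or None."""
    wanted = set(search_terms)
    scores = defaultdict(float)
    for hit in hits:
        words = tokenize(hit)
        term_positions = [i for i, w in enumerate(words) if w in wanted]
        if not term_positions:
            continue  # a hit with no search terms tells us nothing
        for i, w in enumerate(words):
            if w in wanted or w in stop_words:
                continue  # search terms and trivial words are not candidates
            nearest = min(abs(i - p) for p in term_positions)
            # One point per appearance, plus a bonus for sitting near a term.
            scores[w] += 1.0 + 1.0 / nearest
    if not scores:
        return None
    return max(scores, key=scores.get)
```

A word that keeps turning up right next to the search terms across several
hits accumulates the largest total and is returned as "the answer."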

You can try it out on Google.  Select one of the answers that Watson
generated the correct question for and type the category and answer,
verbatim, into the search box.  It is highly likely that the words needed
to form the right question will appear in the text that Google returns
(Google is not programmed to answer the query in the form of a question ...
:-) )

You can't use a question that Watson missed (like the Toronto one) at this
point because the tagging metadata for every one of them is dominated by
discussion topics like this one, so the top 100-1000 hits are all going to
point to various online discussions (like this one) on why Watson "got it
wrong," rather than a reasonable answer.

Disclaimer:  I had nothing to do with the programming for Watson; this is
just what I've been able to piece together based on what's been released
and what I know about search and machine learning.

Scott Fagen
Chief Architect
CA Mainframe
