We shouldn't make the mistake of assuming that Watson is actually thinking about, or knows, the answers. What it does is akin to what Google does. It rips apart the category and original clue into important and unimportant terms to search on (with some programming help for the Jeopardy-like wordplay that often occurs). This set of terms is used to search a humongous tagged database to find a series of hits, which are scored (probably, again, similar to Google or Bing or whatever you fancy) based on the count and relative proximity of the search terms within the hits. From the highest-scored results, it would then use a similar algorithm to throw out trivial words and use some lexical analysis to select a set of "important words/phrases" -- from which plausible answers are selected. An algorithm can then score these words/phrases based on how frequently they appear in the high-scoring search hits and their proximity to the search words within those hits. The program is also influenced by training (through machine learning techniques) as to which answers are more likely to be right or wrong.
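To make the idea concrete, here is a toy sketch of the pipeline described above. To be clear, this is not Watson's code and not how IBM did it. Every name in it (STOP_WORDS, score_passage, score_candidates, best_answer), the scoring formulas, and the sample clue and passages are my own made-up assumptions, and the machine-learning step is left out entirely. It only shows how far "count and proximity" can get you without anything resembling understanding.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a far larger one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "was", "this",
              "its", "for", "and", "to", "by", "as", "it", "at", "about"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def important_terms(text):
    """Throw out trivial words, keeping the terms worth searching on."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]

def score_passage(terms, passage):
    """Score a hit by the count of distinct search terms plus a proximity bonus."""
    tokens = tokenize(passage)
    positions = [i for i, tok in enumerate(tokens) if tok in terms]
    if not positions:
        return 0.0
    matched = len({tokens[i] for i in positions})
    spread = positions[-1] - positions[0] + 1  # window covering all matches
    return matched + matched / spread          # tighter cluster, bigger bonus

def score_candidates(terms, passages, top_n=3):
    """Score non-trivial words in the best hits by frequency and proximity."""
    term_set = set(terms)
    ranked = sorted(passages, key=lambda p: score_passage(term_set, p),
                    reverse=True)[:top_n]
    scores = defaultdict(float)
    for passage in ranked:
        tokens = tokenize(passage)
        term_pos = [i for i, tok in enumerate(tokens) if tok in term_set]
        if not term_pos:
            continue  # this hit contains none of the search terms
        for i, tok in enumerate(tokens):
            if tok in STOP_WORDS or tok in term_set:
                continue
            nearest = min(abs(i - p) for p in term_pos)
            scores[tok] += 1.0 / nearest  # each appearance adds; closer adds more
    return dict(scores)

def best_answer(clue, passages):
    """Pick the candidate word with the highest score, or None if there is none."""
    scores = score_candidates(important_terms(clue), passages)
    return max(scores, key=scores.get) if scores else None

# Made-up clue and "search hits", purely for illustration.
clue = "AUTHORS: This author wrote the novel Moby Dick"
passages = [
    "Herman Melville wrote Moby Dick, a novel about a whale",
    "The novel Moby Dick was written by Melville in 1851",
    "Whales are large marine mammals",
]
print(best_answer(clue, passages))
```

Note that the sketch scores single words only. Picking out multi-word phrases would need the lexical analysis mentioned above, which I have not attempted here.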
The resultant word/phrase with the highest score is then selected as "the answer." You can try it out on Google. Select one of the answers that Watson generated the correct question for and type the category and answer, verbatim, into the search box. It is highly likely that the words needed to generate the right question appear in the actual text that Google returns. (Google is not programmed to answer the query in the form of a question ... :-) ) You can't use a question that Watson missed (like the Toronto one) at this point, because the results for every one of them are now dominated by discussion topics like this one, so the top 100-1000 hits are all going to point to various online discussions (like this one) on why Watson "got it wrong," rather than to a reasonable answer.

Disclaimer: I had nothing to do with the programming for Watson. This is just what I've been able to piece together based on what's been released and what I know about search and machine learning.

Scott Fagen
Chief Architect
CA Mainframe

