On Wed, 4 May 2005, Josh White wrote:
Dear Open Heart Logic user. As part of _The Turing Challenge_ we would love it if you would rate the following assertion in terms of its believability...
"Vienna is wet."
(1) V. Unbelievable (2) Unbelievable (3) Neutral (4) Believable (5) V. Believable
I think this is a very good idea, in general.
Great.
I don't think you'll get much participation unless you tell the user the computer's best guess AFTER they click the answer. The point is the user can see how their answer improved the computer.
Ah. Very interesting and excellent point. Once again, you "get it" and then take it a step further. (If we give them this feedback there is a slight risk that a person would start to think like Cyc rather than like a naive and natural human, but I think this risk is small, unimportant, and distracting... The real problem will be keeping people's motivation and excitement up.) So yes, I agree that we should do this.
Read my very rough draft mock-up of this screen below and see if you
have any comments. Farther below you will see some subtleties that
might complicate matters (see {a} below). Of course, we can tweak it
even after it is up as a dynamic WWW page, so there's a risk in
giving too much precise feedback on actual wording; what I am mainly
after now are overall design-type issues.
Joshua, can you add this feature?
The feature is this (why don't we call it the "Feedback to User" page; better names please!!!): after the user selects a rating, a new window comes up. The new window has text that we will manually craft (in the beginning... later we may wish to call Cyc or KM to generate part of that page... dream on).
Here is a baby step one example of "Feedback to user" page:
Assume User Joe has selected 2, Unbelievable as his rating for "Vienna is wet.". This is what we want on that page......
......START OF FEEDBACK TO USER PAGE MOCK UP.....
Thanks for your rating, Joe! Ratings like yours can help us evaluate and improve our AI models.
You rated the believability of the assertion...
"Vienna is wet."
..as: Unbelievable.
In case you are interested, we would like to give you some context for the item that you just rated for us. If you are not interested, just move on to the next item.
Cyc believes that "Vienna is wet." is true.
Cyc believes this assertion because it concluded it by inference. The following facts and rules caused Cyc to conclude "Vienna is wet."
(1) "Rivers are a kind of water." (2) "If water touches x then x is wet." (3) "The Danube is a river." (4) "The Danube runs through Vienna." (5) "If a river runs through a region it touches that region."
(Note that these are represented in Cyc not in English but rather in a logical computer language. We have translated these facts and rules into English to make them easy for you to understand.)
......END OF FEEDBACK TO USER PAGE MOCK UP.....
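As a rough illustration of the kind of inference chain shown in the mock-up, here is a minimal forward-chaining sketch in Python. It is not how Cyc represents or derives anything; the tuple encoding, the predicate names (`isa`, `runs_through`, `touches`, `wet`), and the helper functions are all assumptions made up for this example. It only shows that the five English statements are enough to derive "Vienna is wet." and to record which statements supported the conclusion.

```python
# Toy forward chainer. The tuple encoding and predicate names are
# hypothetical; this is NOT Cyc's representation or inference engine.

def match(pattern, fact, bindings):
    """Unify one pattern with one ground fact; return new bindings or None."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):            # "?x" style terms are variables
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new

def match_all(premises, facts, bindings):
    """Yield every set of bindings that satisfies all premises."""
    if not premises:
        yield bindings
        return
    for fact in facts:
        b = match(premises[0], fact, bindings)
        if b is not None:
            yield from match_all(premises[1:], facts, b)

def forward_chain(facts, rules):
    """Apply rules until nothing new is derived; remember each derivation."""
    known, support = set(facts), {}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # list() finishes matching before we add anything to `known`
            for b in list(match_all(premises, known, {})):
                derived = tuple(b.get(t, t) for t in conclusion)
                if derived not in known:
                    known.add(derived)
                    support[derived] = [tuple(b.get(t, t) for t in p)
                                        for p in premises]
                    changed = True
    return known, support

# (3) "The Danube is a river."  (4) "The Danube runs through Vienna."
FACTS = {("isa", "Danube", "River"), ("runs_through", "Danube", "Vienna")}
RULES = [
    # (1) "Rivers are a kind of water."
    ([("isa", "?x", "River")], ("isa", "?x", "Water")),
    # (2) "If water touches x then x is wet."
    ([("isa", "?w", "Water"), ("touches", "?w", "?x")], ("wet", "?x")),
    # (5) "If a river runs through a region it touches that region."
    ([("isa", "?r", "River"), ("runs_through", "?r", "?g")],
     ("touches", "?r", "?g")),
]

known, support = forward_chain(FACTS, RULES)
print(("wet", "Vienna") in known)
print(support[("wet", "Vienna")])
```

The `support` dictionary is the piece a "Feedback to User" page would need: it lists the immediate premises behind each derived statement, which could then be translated into English by hand.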
Also, some people at hotornot put in wrong answers on purpose, just to hack the system. I think this system is even more tempting to lie to.
Definitely.
And this is a good segue into what I referred to above as the
"subtleties" {a}. What we should do is give some items that are "reversed" or "mutated" or "perturbed."... In other words, for SOME items we will ask participants to rate "normal" Cyc assertions. These assertions may be ground facts (like "The Danube runs through Vienna") or deductions (like "Vienna is wet.") or maybe even rules (e.g. "If a river runs through a region it touches that region.").
But OTHER items will be intentionally reversed. We would predict that humans would rate these as less believable than unreversed items.
This is what I did in my dissertation. In study 2, half of the items were unreversed and the other half were reversed. In study 3, a third of the items were unreversed, another third were "slightly" reversed, and another third were "strongly" reversed.
There are two reasons we want to do this reversal stuff. One reason is to catch liars or vandals.
The OTHER reason is to allow us to compare the mean believability of different groups of items. E.g....
unreversed items vs reversed items
human generated items vs machine generated items
deductions vs ground facts
...this is all part of the computational ablation paradigm and it figured big time in my dissertation. It is an example of what I mean by good and rigorous methodology.
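Both uses of reversed items can be sketched in a few lines of Python. Everything below is illustrative: the rater names, the reversed wordings, the ratings, and the zero-gap threshold are invented for the example and are not data or procedures from the dissertation. The sketch compares mean believability across conditions and flags any rater who does not rate unreversed items as more believable than reversed ones.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (rater, item, condition, rating on the 1-5 scale).
# Items, raters, and numbers are made up purely for illustration.
RATINGS = [
    ("joe",   "Vienna is wet.",                          "unreversed", 4),
    ("joe",   "The Danube runs through Vienna.",         "unreversed", 5),
    ("joe",   "Vienna is dry.",                          "reversed",   2),
    ("joe",   "The Danube does not run through Vienna.", "reversed",   1),
    ("ann",   "Vienna is wet.",                          "unreversed", 5),
    ("ann",   "The Danube runs through Vienna.",         "unreversed", 4),
    ("ann",   "Vienna is dry.",                          "reversed",   1),
    ("ann",   "The Danube does not run through Vienna.", "reversed",   2),
    ("troll", "Vienna is wet.",                          "unreversed", 1),
    ("troll", "The Danube runs through Vienna.",         "unreversed", 2),
    ("troll", "Vienna is dry.",                          "reversed",   5),
    ("troll", "The Danube does not run through Vienna.", "reversed",   5),
]

def mean_by_condition(ratings):
    """Mean believability rating for each condition (group of items)."""
    groups = defaultdict(list)
    for _rater, _item, condition, rating in ratings:
        groups[condition].append(rating)
    return {condition: mean(values) for condition, values in groups.items()}

def suspicious_raters(ratings, min_gap=0.0):
    """Raters whose unreversed-minus-reversed mean gap is <= min_gap.

    min_gap=0.0 is an arbitrary placeholder threshold, not a tested cutoff.
    """
    per_rater = defaultdict(lambda: defaultdict(list))
    for rater, _item, condition, rating in ratings:
        per_rater[rater][condition].append(rating)
    flagged = []
    for rater, by_cond in per_rater.items():
        if "reversed" in by_cond and "unreversed" in by_cond:
            gap = mean(by_cond["unreversed"]) - mean(by_cond["reversed"])
            if gap <= min_gap:
                flagged.append(rater)
    return sorted(flagged)

condition_means = mean_by_condition(RATINGS)
flagged = suspicious_raters(RATINGS)
print(condition_means)
print(flagged)
```

The same `mean_by_condition` grouping would work for the other comparisons (human vs machine generated, deductions vs ground facts) if the condition label is changed accordingly. A real analysis would also want a significance test rather than a bare difference of means.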
[Now, it occurs to me that there is a THIRD, more minor, user-interfacey sorta reason to do this... That is that we want quick *coarse* judgements about whether a commonsense assertion is a good one or not. One way to obtain such coarseness is to throw in a fair number of ridiculous assertions.]
Well, I should enlist some other AI gurus' opinions on this before I spout off too loudly about good and rigorous methodology. Speaking of AI gurus, Peter, are you on this list yet?
So, Josh and Joshua, does this make sense conceptually, designwise?
Joshua, can you implement this? Note: Just getting the believability ratings up there is step one. Implementing the Feedback to User page is step two.
Bill
-Josh
_______________________________________________ Heartlogic-dev mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/heartlogic-dev
