Re: Inference for error checking [was Re: How to avoid that collections "break" relationships]

David Booth Thu, 03 Apr 2014 15:16:01 -0700

First of all, my sincere apologies to Pat, Peter and the rest of the
readership for totally botching my last example, writing "domain" when
I meant "range" *and* explaining it wrong. Sorry for all the confusion
it caused!


I was simply trying to demonstrate how a schema:domainIncludes
assertion could be useful for error checking even if it had no
formal entailments, by making selective use of the CWA.  I'll
try again.

Suppose we are given these RDF statements, in which the author
*may* have made a typo, writing ddd instead of ccc as the rdf:type
of x:

  x ppp y .                       # Triple A
  x rdf:type ddd .                # Triple B
  ppp schema:domainIncludes ccc.  # Triple C

As given, these statements are consistent, so a reasoner
will not detect a problem.  Indeed, they may or may
not be what the author intended.  If the author later
added the statement:

  ccc owl:equivalentClass ddd .   # Triple E

then ddd probably was what the author intended
in triple B.  OTOH if the author later added:

  ccc owl:disjointWith ddd .      # Triple F

then ddd probably was not what the author intended
in triple B.

However, thus far we are only given triples {A,B,C}
above, and an error checker wishes
to check for *potential* typos by applying the rule:

  For all subgraphs of the form

    { x ppp y .
      ppp schema:domainIncludes ccc . }

  check whether

     { x rdf:type ccc . }

  is *provably* true.  If not, then fail the
  error check.  If all such subgraphs pass, then
  the error check as a whole passes.
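
A minimal sketch of that rule in Python, for illustration only. It
assumes triples are plain (subject, predicate, object) string tuples and
that "provably true" simply means "explicitly asserted in the graph"; a
real checker would first apply whatever inference it supports (for
example, using a later owl:equivalentClass statement). The function name
check_domain_includes is hypothetical.

```python
# Illustrative only: triples are (subject, predicate, object) string tuples.
DOMAIN_INCLUDES = "schema:domainIncludes"
RDF_TYPE = "rdf:type"


def check_domain_includes(graph):
    """Return the sorted list of (subject, predicate, expected_class) failures.

    Assumption: "provably true" is approximated by "explicitly asserted";
    no RDFS/OWL inference is performed here.
    """
    triples = set(graph)
    failures = []
    for (p, rel, c) in triples:
        if rel != DOMAIN_INCLUDES:
            continue
        # Every use of predicate p forms a subgraph that must be checked.
        for (s, pred, _o) in triples:
            if pred == p and (s, RDF_TYPE, c) not in triples:
                failures.append((s, p, c))
    return sorted(failures)


graph = [
    ("x", "ppp", "y"),                        # Triple A
    ("x", "rdf:type", "ddd"),                 # Triple B
    ("ppp", "schema:domainIncludes", "ccc"),  # Triple C
]

failures = check_domain_includes(graph)
passed = not failures
```

With graph {A,B,C} the check fails for x, flagging a *potential* typo in
triple B. Adding ("x", "rdf:type", "ccc") to the graph makes it pass.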

Under the OWA, the requirement:

     { x rdf:type ccc . }

is neither provably true nor provably false given
graph {A,B,C}.  But under the CWA it is
considered false, because it is not provably true.
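
The difference can be sketched as a three-valued versus two-valued
answer. This is only an illustration: the function name status is
hypothetical, "provable" is again approximated by "explicitly asserted",
and the sketch does not model negative information (such as
owl:disjointWith) that could make a statement provably false under the
OWA.

```python
def status(graph, goal, closed_world):
    """Classify a goal triple against a graph of asserted triples.

    Assumption: a triple counts as provable only if it is asserted.
    """
    if goal in set(graph):
        return "true"
    # OWA: absence of proof is not proof of absence.
    # CWA: whatever is not provably true is treated as false.
    return "false" if closed_world else "unknown"


graph = [
    ("x", "ppp", "y"),                        # Triple A
    ("x", "rdf:type", "ddd"),                 # Triple B
    ("ppp", "schema:domainIncludes", "ccc"),  # Triple C
]
goal = ("x", "rdf:type", "ccc")

owa_result = status(graph, goal, closed_world=False)
cwa_result = status(graph, goal, closed_world=True)
```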

This is how a schema:domainIncludes assertion can be
useful for error checking even if it has no formal
entailments: it tells the error checker which
cases to check.

I hope that now makes more sense.   Again, sorry to
have screwed up my example so badly last time, and
I hope I've got it right this time.  :)

David


On 04/02/2014 11:42 PM, Pat Hayes wrote:


On Mar 31, 2014, at 10:31 AM, David Booth <da...@dbooth.org> wrote:

On 03/30/2014 03:13 AM, Pat Hayes wrote:

[ . . . ]
What follows from knowing that

ppp schema:domainIncludes ccc . ?

Suppose you know this and you also know that

x ppp y .

Can you infer x rdf:type ccc? I presume not, since the domain might
include other stuff outside ccc. So, what *can* be inferred about the
relationship between x and ccc ? As far as I can see, nothing can be
inferred. If I am wrong, please enlighten me. But if I am right, what
possible utility is there in even making a schema:domainIncludes
assertion?

If "inference" is too strong, let me weaken my question: what
possible utility **in any way whatsoever** is provided by knowing
that schema:domainIncludes holds between ppp and ccc? What software
can do what with this, that it could not do as well without this?


I think I can answer this question quite easily, as I have seen it come up 
before in discussions of logic.

...

Note that this categorization typically relies on making a closed world 
assumption (CWA), which is common for an application to make for a particular 
purpose -- especially error checking.


Yes, of course. If you make the CWA with the information you have, then

ppp schema:domainIncludes ccc .

has exactly the same entailments as

ppp rdfs:domain ccc .

has in RDFS without the CWA. But that, of course, begs the question. If you are 
going to rely on the CWA, then (a) you are violating the basic assumptions of 
all Web notations and (b) you are using a fundamentally different semantics. 
And see below.

None of this has anything to do with a distinction between entailment and error 
checking, by the way. Your hypothetical three-way classification task uses the 
same meanings of the RDF as any other entailment task would.


In this example, let us suppose that to pass, the object of every predicate must be in 
the "Known Domain" of that predicate, where the Known Domain is the union of 
all declared schema:domainIncludes classes for that predicate.   (Note the CWA here.)

Given this error checking objective, if a system is given the facts:

  x ppp y .
  y a ccc .

then without also knowing that "ppp schema:domainIncludes ccc", the system may 
not be able to determine whether these statements should be considered Passed or Failed: the 
result may be Indeterminate.  But if the system is also told that

  ppp schema:domainIncludes ccc .

then it can safely categorize these statements as Passed (within the limits of 
this error checking).


Why? [ y a ccc . ] does not follow from this assertion and the x ppp y, so this 
looks like an Indeterminate to me. Even with the CWA applied to ppp, your check 
here is extremely risky. In fact, I could invoke Gricean reasoning to conclude 
that the domain of ppp **almost certainly must** include something outside ccc; 
because if not, why did whoever wrote this use the more cautious 
schema:domainIncludes rather than the simpler and more direct rdfs:domain? 
Indeed, isn't the ubiquity of the OWA in Web reasoning the only justification 
for having a construct like schema:domainIncludes at all? Why else was it 
invented, if not to allow for further information to make the domain larger?

Thus, although schema:domainIncludes does not enable any new entailments under 
the open world assumption (OWA), it *does* enable some useful error checking 
inference under the closed world assumption (CWA), by enabling a shift from 
Indeterminate to Passed or Failed.


I would not want any important decision to rest on such an extremely flaky 
foundation as this.


If anyone is concerned that this use of the CWA violates the spirit of RDF, 
which indeed is based on the OWA (for *very* good reason), please bear in mind 
that almost every application makes the CWA at some point, to do its job.


Um, bullshit. But in any case, even if it were true, the important thing is to 
know when to invoke the CWA. Assuming that you know all the domain, when you 
have been told explicitly that you probably have not been told all of it, is a 
very bad heuristic for invoking the CWA.

Pat


David


------------------------------------------------------------
IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
pha...@ihmc.us       http://www.ihmc.us/users/phayes
