Tom,

Thanks for the feedback:

[TM] > You could create an ontology as it "should" be or you
> can use an ontology which matches the practices and conventions used
> by the Wikipedia editors. The latter is going to be messy in many
> ways, but at least it'll have a large quantity of data to work with.
The way an ontology *should be* is the way it will be most useful to those who intend to use it. That means it should be comprehensible and acceptable to them. As languages go (an ontology serves as a logical language), that also means it will be the sum of the inputs of those who use it, not something imposed by some external authority.

The question I have not been able to resolve in my brief look at the DBpedia site is just how the ontology is anticipated to be used. Is there an application that uses it? The application that uses it will be the ultimate arbiter of how it "should be". I would much appreciate a reference to actual uses in applications, where I can see how it is used and whether additional precision may be useful.

I am aware of how wary people (including myself) are of those who would impose some ontology or terminology on a community. There is a long history of such efforts. The common resistance to using a complex system devised by others (when something simpler seems to serve as well) is one of the reasons that CYC has not been more widely adopted. In general, a big reason for the lack of wide adoption of CYC (and other "upper ontologies") is that people will only make the effort to use another system if they have examples of uses that convince them it is worth the effort; but all significant uses of CYC and SUMO are proprietary, and details are not available to the public. There are also many examples where people *do* make the effort to learn a system devised elsewhere, including linguistic systems, when useful applications can be seen. It is common even among ontologists to say that people will prefer to use their own languages, databases, terminologies, and ontologies, so that no one language, ontology, or database will ever be adopted universally; but we have a fine example of just such adoption of a common language: English.
If you go to an international conference, virtually everyone speaks English and presents in English if they want their contributions to be understood by the largest number of people; the motivation is sufficient for people to make the effort to learn the language. And that is where an ontology can serve any community, or the whole world: in any situation where the creators of knowledge want to share it, in a precise form suitable for automated reasoning, with the whole community, however large or small that community is.

As I understand it, the community intended to be served by DBpedia is the whole world. That is very ambitious, but I feel certain from my own work that it is entirely feasible to create an ontology suitable for that whole world community. It does take more effort than just automatically extracting triples from a data source, structured or unstructured. Such an ontology cannot be imposed from above; it has to grow from the needs and practices of the community that uses it. But it will benefit from the large amount of work already done building other ontologies. Much of the hard work has already been done.

The problem with using extracted data triples *alone* as a representation of knowledge is that, except in carefully controlled systems, they have the same problems as natural language itself: the same term may be used with multiple meanings (ambiguity, or polysemy), or many terms may be used with the same meaning (synonymy). Using OWL is a good step, but OWL is only a simple *grammar* for representing knowledge. Communication requires a common *vocabulary* as well as a common grammar. Triple stores created without prior agreement on terminology may still be useful for some probabilistic reasoning purposes. Automated alignment of data from different sources mostly relies on string matching to identify terms that are likely to have the same meanings in different data stores.
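To make the ambiguity problem concrete, here is a small hypothetical sketch (not DBpedia code; the terms and "stores" are invented for illustration) of why purely string-based alignment can pair up terms whose intended meanings differ:

```python
# Hypothetical example: two local "triple stores" reduced to term -> intended
# meaning. String matching aligns identical labels regardless of meaning.

store_a = {
    "bank": "financial institution",
    "title": "name of a creative work",
}
store_b = {
    "bank": "sloping edge of a river",
    "title": "legal ownership of property",
}

def align_by_string(a, b):
    """Pair terms whose labels match exactly, ignoring intended meaning."""
    return [(term, a[term], b[term]) for term in a if term in b]

for term, sense_a, sense_b in align_by_string(store_a, store_b):
    status = "OK" if sense_a == sense_b else "FALSE MATCH"
    print(f"{term}: {sense_a!r} vs {sense_b!r} -> {status}")
```

Both "bank" and "title" align by label, yet neither pair shares a meaning; a shared defining vocabulary is what lets the alignment be checked against meanings rather than strings.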
Reasoning with such databases can generate inferences that rank results by probabilities, and these can be sent to a human interpreter who makes the final decision as to whether the inferences are meaningful or nonsensical (as in a Google search). The automated alignment methods I have seen (except in very narrowly constrained domains) tend to have no more than 60% accuracy for any one pair of sources. Automated reasoning will have chains of inferences, and any chain more than one inference in length will likely result in a conclusion that is unlikely to be true; the longer the chain, the less likely an accurate result. So if automated inferencing on the data is considered desirable, very high accuracy in the representation is necessary. The good news is that such high accuracy is in fact *practical* (not merely possible), if the proper approach is used.

Although different groups and communities will insist on using their own local terminology, accurate alignment among all groups is still possible if each local community translates its own data into the common language for use by others, who will then be able to use it even if they have no idea who created the information, or for what purpose. Triple stores created by a local group may be precise if the vocabulary is carefully controlled by common agreement. For larger communities, such as the one served by Wikipedia, there is little chance of gaining agreement on a single common terminology **for all terms**. The latter qualification is crucial. What is actually needed is not wide agreement on a massive terminology of hundreds of thousands of terms, but only agreement on a basic **defining vocabulary** of a few thousand terms that is sufficient to describe accurately any specialized concept one would want to define. Learning to use such a terminology (or an associated ontology) will be comparable in effort to developing a working knowledge of a second language.
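The chain-length point above is simple compounding arithmetic. Assuming (for illustration) that each inference or alignment step is independently correct with probability p, a chain of n steps is correct with probability p**n:

```python
# Illustrative arithmetic only: compounding of per-step accuracy over an
# inference chain, using the ~60% per-pair figure mentioned above and
# assuming (simplistically) that steps are independent.

def chain_accuracy(p: float, n: int) -> float:
    """Probability that an n-step chain is correct, at per-step accuracy p."""
    return p ** n

p = 0.60
for n in range(1, 6):
    print(f"chain of {n} step(s): {chain_accuracy(p, n):.3f}")
# 0.600, 0.360, 0.216, 0.130, 0.078
```

Even a two-step chain at 60% per step is already correct barely a third of the time, which is why high accuracy in the representation itself matters so much for multi-step automated reasoning.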
In effect, in any given community that generates data, it is necessary to have at least one person who is "bilingual" in the local terminology and in the common ontology. This is perfectly feasible, if one has the motivation. I have been concerned with this tactic for database interoperability for a number of years. A discussion of the issue is given in a recent paper:

Obrst, Leo; Pat Cassidy. 2011. The Need for Ontologies: Bridging the Barriers of Terminology and Data Structures. Chapter 10 (pp. 99-124) in: Societal Challenges and Geoinformatics, Sinha, A. Krishna, David Arctur, Ian Jackson, and Linda Gundersen, eds. Memoir volume of the Geological Society of America (GSA). Available at: http://micra.com/papers/OntologiesForInteroperability.pdf

If there is any prospect or hope that the formalization of knowledge envisioned by the DBpedia project will ultimately be used for automated reasoning, it is important that effort be made at an early stage to ensure there is a proper foundation for accurate representation and the avoidance of ambiguity. I can help with this task, and will be happy to do so if others in the community are willing to do the kind of careful work required.

If, on the other hand, it is expected that only probabilistic information will be extracted from queries on the DBpedia database, suitable only for inspection by potential human users, then such care in formalization may not be required. But it would still be helpful, and wouldn't add a lot of work to what is being done. The main effort is in carefully specifying the meanings of the relations being used, to avoid ambiguity and duplication.

[TM] > Another way to approach this would be the MCC/CYC approach. It'll
> take billions of dollars and you'll need to wait many decades for them
> to finish, but at the end of it all I'm sure you'd have a perfectly
> consistent knowledge base.
The great advantage of a volunteer community is that it doesn't take a lot of time to get funding, and the expense is mostly borne by the volunteers for their own interests and their own views of what may help the public. No funder can impose a set of requirements. We *can* have a perfectly consistent database, and the effort of getting agreement on the **basic vocabulary** is likely to be a great deal less than is commonly supposed, because that vocabulary is not very large.

The work done on DBpedia thus far appears to me to be a good start. How to proceed from here depends on the ultimate goals. I am very interested in learning how this community views its future.

Pat

Patrick Cassidy
MICRA Inc.
cass...@micra.com
908-561-3416

-----Original Message-----
From: Tom Morris [mailto:tfmor...@gmail.com]
Sent: Tuesday, December 27, 2011 12:24 PM
To: Patrick Cassidy
Cc: dbpedia-discussion@lists.sourceforge.net
Subject: Re: [Dbpedia-discussion] DBpedia ontology

On Mon, Dec 26, 2011 at 7:26 PM, Patrick Cassidy <p...@micra.com> wrote:
> I have looked briefly at the DBpedia ontology and it appears to leave
> a great deal to be desired in terms of what an ontology is best suited
> for: to carefully and precisely define the meanings of terms so that
> they can be automatically reasoned with by a computer, to accomplish
> useful tasks. I will be willing to spend some time to reorganize the
> ontology to make it more logically coherent, if (1) there are any
> others who are interested in making the ontology more sound and (2) if
> there is a process by which that can be done without a very long
> drawn-out debate.
>
> I think that the general notion of formalizing the content of
> Wikipedia is a great idea, but to be useful it has to be done
> carefully.
> It is very easy, even for those with experience, to put
> logically inconsistent assertions into an ontology, and even easier to
> put in elements that are so underspecified that they are ambiguous to
> the point of being essentially useless for automated reasoning. The
> OWL reasoner can catch some things, but it is very limited, and unless
> a first-order reasoner is used one needs to be exceedingly careful
> about how one defines the relations.

You could create an ontology as it "should" be or you can use an ontology which matches the practices and conventions used by the Wikipedia editors. The latter is going to be messy in many ways, but at least it'll have a large quantity of data to work with. Getting any use out of the former would require you convincing all Wikipedians to adhere to your strict conventions, which seems unlikely to me.

Another way to approach this would be the MCC/CYC approach. It'll take billions of dollars and you'll need to wait many decades for them to finish, but at the end of it all I'm sure you'd have a perfectly consistent knowledge base.

Tom

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion