Re: Dataset vocabularies vs. interchange vocabularies (was: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase)

2008-11-28 Thread Kjetil Kjernsmo

On Wednesday 26 November 2008, John Graybeal wrote:
> Do you think the argument is mostly settled, or would you agree that
>   duplicating a massive set of URIs for 'local technical
> simplification' is a bad practice? (In which case, is the question
> just a matter of scale?)

I'm a bit late to the discussion, but I feel that this is a question 
that should be dealt with on a case-by-case basis. If you state that two 
things are owl:sameAs (or make some slightly weaker statement), it is 
important that the two things are in fact the same. 
When publishing larger data sets, it is hard to say with sufficient 
certainty that this is the case. Thus, I feel that the best practice is 
to create new URIs for each thing. Stating that things are the same 
should be left to a separate process. 

One should be aware of the extra complexity this causes: you need an 
extra triple pattern in your SPARQL queries, which can also reduce query 
engine performance.
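
For illustration, a minimal sketch of that extra triple pattern (Python with
rdflib; the ex: URIs and the population property are made-up placeholders,
not real dataset terms):

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")                   # hypothetical local dataset
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.add((EX.Berlin, EX.population, Literal(3400000)))     # data published against the local URI
g.add((EX.Berlin, OWL.sameAs, DBR.Berlin))              # sameness stated as a separate step

# A consumer who only knows the DBpedia URI needs one extra triple
# pattern to bridge the two identifiers:
q = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?pop WHERE {
  ?local owl:sameAs <http://dbpedia.org/resource/Berlin> .
  ?local <http://example.org/population> ?pop .
}
"""
for row in g.query(q):
    print(row[0])    # 3400000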

If you are building applications based on linked data rather than 
publishing large data sets, I feel it is better to reuse URIs rather 
than create your own, if you plan to publish your URIs at some point. 

In some cases, you use only a few concepts and a human is involved in 
every case, so you know what is meant by the concept identified by a 
given URI. In other cases, you use somebody else's URI and say "whatever 
they mean by this concept, I mean too". This covers most of the cases I 
think most users of linked data will meet.  


Kjetil
-- 
Kjetil Kjernsmo
Programmer / Astrophysicist / Ski-orienteer / Orienteer / Mountaineer
[EMAIL PROTECTED]
Homepage: http://www.kjetil.kjernsmo.net/ OpenPGP KeyID: 6A6A0BBC




Re: Dataset vocabularies vs. interchange vocabularies (was: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase)

2008-11-27 Thread Hugh Glaser




On 27/11/2008 13:43, "Georgi Kobilarov" <[EMAIL PROTECTED]> wrote:

> Hi John,
>
>> Do you think the argument is mostly settled, or would you agree that
>> duplicating a massive set of URIs for 'local technical simplification'
>> is a bad practice? (In which case, is the question just a matter of
>> scale?)
>
> In my opinion it's not only about technical simplification. It's a
> question of process.
> When I want to publish a new dataset, only re-using existing URIs would
> mean that I need to find the correct URIs in the first place. That's
> difficult: some concepts in my dataset might not be available in other
> datasets, and some might be available in more than one. Having to
> decide too early restrains me from publishing my data.
And that causes a management problem if you get it wrong, or if the others
change the meaning, however subtly.
>
> By minting my own URIs, I can split the publishing problem into two
> separate tasks: 1. publish data and 2. interlink data. And the
> interlinking task could then even be done by someone else...
Oh yes please!

I think it is not just about the SW having to manage large numbers of URIs
because you minted a new *one*; we need to mint large numbers of URIs for
our own data because, until you can do the complex co-reference work on the
KB, you don't know whether the strings you are working with refer to the
same thing.
E.g. your input text says David Williams is at the University of Newcastle. It
can be some time later before you can confidently assert that this
is the University of Newcastle in Australia. In the meantime you mint a new
URI. Later you decide it is co-referent with another.

We take what I think might be called a hybrid approach.
We do mint all the URIs, but we have a sub-system that re-processes the RDF
so that the ones that have been identified as co-referent can be collapsed
onto a single URI, either one of our own or one from somewhere else.
It leaves the old ones around in a "deprecated" state for any future
reference.
We call this process "canonisation" :-)
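
For readers who have not seen such a sub-system, a toy sketch of the kind of
re-processing described above (Python with rdflib; the ex: URIs and the choice
of canonical identifier are invented for illustration, not the actual
sub-system):

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")

def canonise(g, canonical_of):
    """Rewrite subjects and objects through the canonical map, keeping the
    old URIs around as deprecated aliases of the canonical one."""
    out = Graph()
    for s, p, o in g:
        out.add((canonical_of.get(s, s), p, canonical_of.get(o, o)))
    for old, new in canonical_of.items():
        out.add((old, OWL.sameAs, new))                 # old URI stays dereferenceable
        out.add((old, OWL.deprecated, Literal(True)))   # the "deprecated state"
    return out

# Two URIs minted independently, later found to be co-referent:
g = Graph()
g.add((EX.newcastle_42, EX.locatedIn, EX.Australia))
canonical = {EX.newcastle_42: EX.newcastle_7}           # collapse onto one of our own
print(canonise(g, canonical).serialize(format="turtle"))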

Best
Hugh

>
> Best,
> Georgi
>
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
>
>




RE: Dataset vocabularies vs. interchange vocabularies (was: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase)

2008-11-27 Thread Georgi Kobilarov
Hi John,

> Do you think the argument is mostly settled, or would you agree that
> duplicating a massive set of URIs for 'local technical simplification'
> is a bad practice? (In which case, is the question just a matter of
> scale?)

In my opinion it's not only about technical simplification. It's a
question of process.
When I want to publish a new dataset, only re-using existing URIs would
mean that I need to find the correct URIs in the first place. That's
difficult: some concepts in my dataset might not be available in other
datasets, and some might be available in more than one. Having to
decide too early restrains me from publishing my data. 

By minting my own URIs, I can split the publishing problem into two
separate tasks: 1. publish data and 2. interlink data. And the
interlinking task could then even be done by someone else... 
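
As a concrete sketch of that two-step split (Python with rdflib; the ex:
namespace, the file names and the target DBpedia URI are placeholders chosen
for illustration):

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import OWL, RDFS

EX = Namespace("http://example.org/id/")
DBR = Namespace("http://dbpedia.org/resource/")

# Task 1: publish the data under freshly minted URIs, no external commitments.
data = Graph()
data.add((EX.city42, RDFS.label, Literal("Berlin")))
data.serialize(destination="dataset.ttl", format="turtle")

# Task 2 (later, possibly done by someone else): publish a separate linkset.
links = Graph()
links.add((EX.city42, OWL.sameAs, DBR.Berlin))
links.serialize(destination="linkset.ttl", format="turtle")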

Best,
Georgi

--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of John Graybeal
> Sent: Wednesday, November 26, 2008 10:54 PM
> To: Richard Cyganiak
> Cc: public-lod@w3.org; Semantic Web
> Subject: Re: Dataset vocabularies vs. interchange vocabularies (was:
> Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to
> Freebase)
> 
> 
> On Nov 19, 2008, at 5:34 PM, Richard Cyganiak wrote:
> 
> > Interestingly, this somewhat echoes an old argument often heard in
> > the days of the “URI crisis” a few years ago: ““We must avoid a
> > proliferation of URIs. We must avoid having lots of URIs for the
> > same thing. Re-use other people's identifiers wherever you can.
> > Don't invent your own unless you absolutely have to.””
> >
> > I think that the emergence of linked data has shattered that
> > argument. One of the key practices of linked data is: ““Mint your
> > own URIs when you publish new data. *Then* interlink it with other
> > data by setting sameAs links to existing identifiers.””
> 
> So this sounds like you are saying there is a near-consensus of the
> semantic web community.  Except, the previous thread on "URIs and
> Unique IDs" emphasized the view of a number of people that multiple
> URIs for the same concept was "bad" (technical term), especially if
> they are generated en masse.
> 
> Do you think the argument is mostly settled, or would you agree that
> duplicating a massive set of URIs for 'local technical simplification'
> is a bad practice? (In which case, is the question just a matter of
> scale?)
> 
> John
> 
> --
> John Graybeal   <mailto:[EMAIL PROTECTED]>  -- 831-775-1956
> Monterey Bay Aquarium Research Institute
> Marine Metadata Interoperability Project: http://marinemetadata.org



Re: Dataset vocabularies vs. interchange vocabularies (was: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase)

2008-11-26 Thread John Graybeal


On Nov 19, 2008, at 5:34 PM, Richard Cyganiak wrote:

Interestingly, this somewhat echoes an old argument often heard in  
the days of the “URI crisis” a few years ago: ““We must avoid a  
proliferation of URIs. We must avoid having lots of URIs for the  
same thing. Re-use other people's identifiers wherever you can.  
Don't invent your own unless you absolutely have to.””


I think that the emergence of linked data has shattered that  
argument. One of the key practices of linked data is: ““Mint your  
own URIs when you publish new data. *Then* interlink it with other  
data by setting sameAs links to existing identifiers.””


So this sounds like you are saying there is a near-consensus of the  
semantic web community.  Except, the previous thread on "URIs and  
Unique IDs" emphasized the view of a number of people that multiple  
URIs for the same concept was "bad" (technical term), especially if  
they are generated en masse.


Do you think the argument is mostly settled, or would you agree that  
duplicating a massive set of URIs for 'local technical simplification'  
is a bad practice? (In which case, is the question just a matter of  
scale?)


John

--
John Graybeal     -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org




Dataset vocabularies vs. interchange vocabularies (was: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase)

2008-11-19 Thread Richard Cyganiak


On 17 Nov 2008, at 22:33, Hugh Glaser wrote:
I am a bit uncomfortable with the idea of "you should use a:b from c  
and d:e from f and g:h from i..."
It makes for a fragmented view of my data, and might encourage me to  
use things that do not capture exactly what I mean, as well as  
introducing dependencies with things that might change, but over  
which I have no control.
So far better to use ontologies of type (b) where appropriate, and  
define my own of type (a), which will (hopefully) be nicely  
constructed, and easier to understand as smallish artefacts that can  
be looked at as a whole.
Of course, this means we need to crack the infrastructure that does  
dynamic ontology mapping, etc.

Mind you, unless we have the need, we are less likely to do so.
I also think that the comments about the restrictions being a  
characteristic of the dataset for type (a), but more like comments  
on the world for type (b) are pretty good.


+1 on everything above.

I acknowledge that this is a minority POV at the moment.

The more common POV is: “Re-use classes and properties from
well-established vocabularies wherever you can. Don't invent your own terms  
unless you absolutely have to.”


Interestingly, this somewhat echoes an old argument often heard in the  
days of the “URI crisis” a few years ago: ““We must avoid a  
proliferation of URIs. We must avoid having lots of URIs for the same  
thing. Re-use other people's identifiers wherever you can. Don't  
invent your own unless you absolutely have to.””


I think that the emergence of linked data has shattered that argument.  
One of the key practices of linked data is: ““Mint your own URIs  
when you publish new data. *Then* interlink it with other data by  
setting sameAs links to existing identifiers.””


The key insight is that linking yields many of the benefits of  
identifier re-use, while being much easier to manage due to the looser  
coupling.


That's for instance data. But a similar argument can be made for  
vocabularies: ““Create your own terms when you publish a new  
dataset. *Then* interlink it with existing vocabularies by setting  
subclass and subproperty links.””


I'm not sure if this is *always* appropriate. But I do believe that  
there is nothing wrong with creating a vocabulary that is tailored to  
your dataset, and *not* intended or designed for re-use by anyone  
else. As long as you publish an RDFS/OWL description of your terms,  
and make an effort to include subclass/subproperty links to common  
vocabularies in it.
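
For instance, such a published description might look roughly like this (a
sketch in Python/rdflib; the my: terms are invented, and FOAF and Dublin Core
are merely plausible examples of the common vocabularies to link to):

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

MY   = Namespace("http://example.org/vocab#")      # dataset-specific terms
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
DCT  = Namespace("http://purl.org/dc/terms/")

vocab = Graph()
# Tailored class and property, linked upward to interchange vocabularies:
vocab.add((MY.BlogPost, RDF.type, OWL.Class))
vocab.add((MY.BlogPost, RDFS.subClassOf, FOAF.Document))
vocab.add((MY.publishedBy, RDF.type, OWL.ObjectProperty))
vocab.add((MY.publishedBy, RDFS.subPropertyOf, DCT.publisher))
# A dataset-local range: a statement about *this* dataset, not about the world.
vocab.add((MY.publishedBy, RDFS.range, FOAF.Person))

print(vocab.serialize(format="turtle"))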


Coming back to the point I was trying to make below in the thread:  
Tailored, dataset-specific or site-specific vocabularies are one kind  
of beast; designed-for-reuse interchange vocabularies are another. The  
purpose of the second kind is to serve as common superclasses/ 
superproperties for the first kind, as “linking hubs” so to speak, to  
enable queries or UIs that work across datasets and sites.


I don't see a problem with including tight restrictions such as  
restrictive domain/range statements or cardinality constraints in  
dataset vocabularies, if one finds them helpful for consistency  
checking or dynamic UIs. But in interchange vocabularies, tight  
restrictions hurt reusability, so it's usually better to go very easy  
on the harder RDFS and OWL features.
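
To make the contrast concrete, the kind of tight, dataset-local constraint
meant here could be stated as follows (again a Python/rdflib sketch with
invented my: terms; the axiom belongs in the dataset vocabulary, not in an
interchange one):

from rdflib import Graph, Namespace, BNode, Literal
from rdflib.namespace import RDF, RDFS, OWL, XSD

MY = Namespace("http://example.org/vocab#")

vocab = Graph()
# "Every record in *this* dataset has at most one publisher": a claim about
# the dataset, so it should not leak into a vocabulary meant for re-use.
r = BNode()
vocab.add((r, RDF.type, OWL.Restriction))
vocab.add((r, OWL.onProperty, MY.publishedBy))
vocab.add((r, OWL.maxCardinality, Literal(1, datatype=XSD.nonNegativeInteger)))
vocab.add((MY.Record, RDFS.subClassOf, r))

print(vocab.serialize(format="turtle"))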


Best,
Richard




Hugh

On 17/11/2008 20:09, "Richard Cyganiak" <[EMAIL PROTECTED]> wrote:



John,

Here's an observation from a bystander ...

On 17 Nov 2008, at 17:17, John Goodwin wrote:


This is also a good example of where (IMHO) the domain was perhaps
over specified. For example all sorts of things could have
publishers, and not the ones listed here. I worry that if you reuse
DBpedia "publisher" elsewhere you could get some undesired  
inferences.


But are the DBpedia classes *intended* for re-use elsewhere? Or do
they simply express restrictions that apply *within DBpedia*?

I think that in general it is useful to distinguish between two
different kinds of ontologies:

a) Ontologies that express restrictions that are present in a certain
dataset. They simply express what's there in the data. In this sense,
they are like database schemas: If "Publisher" has a range of
"Person", then it means that the publisher *in this particular
dataset* is always a person. That's not an assertion about the world,
it's an assertion about the dataset. These ontologies are usually not
very re-usable.

b) Ontologies that are intended as a "lingua franca" for data exchange
between different applications. They are designed for broad re-use,
and thus usually do not add many restrictions. In this sense, they are
more like controlled vocabularies of terms. Dublin Core is probably
the prototypical example, and FOAF is another good one. They usually
don't allow as many interesting inferences.

I think that these two kinds of ontologies have very different
requirements. Ontologies that are designed for one of these roles are
quite useless if used for the

AW: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-19 Thread Chris Bizer

Hi Dan and all,

it looks to me as if we are trying to solve a variety of different use cases
with a single solution, and thus run into problems here.

There are three separate use cases that people participating in the
discussion seem to have in mind:

1. Visualization of the data
2. Consistency checking
3. Interlinking ontologies/schemata on the Web as basis for data integration


For visualization, range and domain constraints are somewhat useful (as TimBL
said), but this usefulness is very indirect.
For instance, even simple visualizations will need to put the large number
of DBpedia properties into a proper order and ideally would also support
views on different levels of detail. Both are things where range and domain
don't help much, but which are covered by other technologies like Fresnel
(http://www.w3.org/2005/04/fresnel-info/manual/). So for visualization, I
think it would be more useful if we started publishing Fresnel lenses
for each class in the DBpedia ontology.
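
For readers who have not used Fresnel, a lens for one class might look roughly
like this (a sketch loosely based on the Fresnel manual linked above; the dbo:
property names are assumed examples and have not been checked against the
actual DBpedia ontology):

from rdflib import Graph

lens_ttl = """
@prefix fresnel: <http://www.w3.org/2004/09/fresnel#> .
@prefix dbo:     <http://dbpedia.org/ontology/> .
@prefix :        <http://example.org/lenses#> .

:PersonLens a fresnel:Lens ;
    fresnel:classLensDomain dbo:Person ;
    # which properties to show, and in which order
    fresnel:showProperties ( dbo:birthDate dbo:birthPlace dbo:occupation ) .
"""

g = Graph()
g.parse(data=lens_ttl, format="turtle")
print(g.serialize(format="turtle"))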

As Jens said, the domains and ranges can be used for checking instance data
against the class definitions and thus detect inconsistencies (this usage is
not really covered by the RDFS specification as Paul remarked, but still
many people do this). As Wikipedia contains a lot of inconsistencies and as
we don't want to reduce the amount of extracted information too much, we
decided to publish the loose instance dataset which also contains property
values that might violate the constraints. I say "might" as we only know for
sure that something is a person if the Wikipedia article contains a
person-related template. If it does not, the thing could be a person or not.

Which raises the question: Is it better for DBpedia to keep the constraints
and publish instance data that might violate these constraints or is it
better to loosen the constraints and remove the inconsistencies this way? Or
keep things as they are, knowing that range and domain statements are anyway
hardly used by existing Semantic Web applications that work with data from
the public Web? (Are there any? FalconS?)

For the third use case of interlinking ontologies/schemata on the Web in
order to integrate instance data afterwards, it could be better to remove
the domain and range statements as this prevents inconsistencies when
ontologies/schemata are interlinked. On the other hand it is likely that the
trust layers of Web data integration frameworks will ignore the domain and
range statements anyway and concentrate more on owl:sameAs, subclass and
subproperty. Again, Falcons and Sindice and SWSE teams, do you use domain
and range statements when cleaning up the data that you crawled from the
Web?

I really like Hugh's idea of having a loose schema in general and adding
additional constraints as comments/optional constraints to the schema, so
that applications can decide whether they want to use them or not. But this
is sadly not supported by the RDF standards.

So, I'm still a bit undecided about leaving or removing the ranges and
domains. Maybe leave them, as they are likely not harmful and might be
useful for some use cases?

Cheers

Chris


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of Dan Brickley
> Sent: Wednesday, November 19, 2008 14:09
> To: Pierre-Antoine Champin
> Cc: Paul Gearon; Semantic Web
> Subject: Re: Domain and range are useful Re: DBpedia 3.2 release,
> including DBpedia Ontology and RDF links to Freebase
> 
> 
> Pierre-Antoine Champin wrote:
> > Paul Gearon wrote:
> >> While I'm here, I also noticed Tim Finin referring to "domain and
> range
> >> constraints". Personally, I don't see the word "constraint" as an
> >> appropriate description, since rdfs:domain and rdfs:range are not
> >> constraining in any way.
> >
> > They are constraining the set of interpretations that are models of
> your
> > knowledge base. Namely, you constrain Fido to be a person...
> >
> > But I grant you this is not exactly what most people expect from the
> > term "constraint"... I also had to do the kind of explainations you
> > describe...
> 
> 
> Yes, exactly.
> 
> In earlier (1998ish) versions of RDFS we called them 'constraint
> resources' (with the anticipation of using that concept to flag up new
> constructs from anticipated developments like DAML+OIL and OWL). This
> didn't really work, because anything that had a solid meaning was a
> constraint in this sense, so we removed that wording.
> 
> This is a very interesting discussion, wish I had time this week to
> jump
> in further.
> 
> I do recommend against using RDFS/OWL to express application/dataset
> constraints, while recognising that there's a real need for recording
> them in machine-friendly form. I

Re: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-18 Thread Pierre-Antoine Champin

Ian Davis wrote:
> 
> On Tue, Nov 18, 2008 at 4:02 AM, Tim Berners-Lee <[EMAIL PROTECTED]
> > wrote:
> 
> 
> On 2008-11 -17, at 11:27, John Goodwin wrote:
>> [...]
>> I'd be tempted to generalise or just remove the domain/range
>> restrictions. Any thoughts?
> 
> There are lots of uses for range and domain.
> 
> One is in the user interface -- if you for example link a person
> and a document, the system
> can prompt you for a relationship which will include "is author of"
> and "made" but won't include foaf:knows or is issue of.
> 
> Similarly, when making a friend, one can use autocompletion on labels
> which the current session knows about and simplify it by for example
> removing all documents from a list of candidate foaf:knows friends.
> 
> 
> Both these use cases require some OWL to say that documents aren't
> people. I don't see these scenarios being feasible in the general case
> because you'd need a complete description of the world in OWL, i.e.
> you'd want to know about everything that can't possibly be a person.

This is technically true.
However, from a user interface point of view, it is reasonable to use
the *explicit* statements as a guiding heuristic -- although it should
be possible, with additional steps, to add a foaf:knows between any two
resources, even if one is not explicitly typed as a foaf:Person.

  pa



Re: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-18 Thread Paul Gearon


On Nov 18, 2008, at 1:32 AM, Ian Davis wrote:



On Tue, Nov 18, 2008 at 4:02 AM, Tim Berners-Lee <[EMAIL PROTECTED]> wrote:

On 2008-11 -17, at 11:27, John Goodwin wrote:

[...]
I'd be tempted to generalise or just remove the domain/range
restrictions. Any thoughts?



There are lots of uses for range and domain.

One is in the user interface -- if you for example link a person  
and a document, the system
can prompt you for a relationship which will include "is author of"  
and "made" but won't include foaf:knows or is issue of.


Similarly, when making a friend, one can use autocompletion on labels  
which the current session knows about and simplify it by for example  
removing all documents from a list of candidate foaf:knows friends.


Both these use cases require some OWL to say that documents aren't  
people. I don't see these scenarios being feasible in the general  
case because you'd need a complete description of the world in OWL,  
i.e. you'd want to know about everything that can't possibly be a  
person.


But this is true for OWL in general anyway. If you really need to  
tighten down your type (a la the RDBMS world) then you do need to  
describe things as disjoint (either explicitly, or through other  
mechanisms such as the subclass of a complement).


Using OWL still has a lot of utility, but if you absolutely require  
this level of description, then perhaps a relational database would be  
more appropriate for the application at hand. (The right tool for the  
right job, and all that)  :-)





It is of course also important for checking hand-written files for  
validity.


Again, isn't validity checking something that can only be done with
OWL? RDFS only adds information.


Agreed.

The most common example I see of this is people thinking that if I say  
that ns:name has a domain of ns:Person, then it will be an error to  
give a ns:name to a ns:Dog. Instead, it just means that Fido becomes  
both a ns:Person and a ns:Dog (and I'm sure that you, like me, have  
had to explain why a reasoner has not reported this as an error). This  
appears to be one of the reasons why many applications choose separate  
URIs for predicates when applied to different types (e.g. ns:Person/ 
name and ns:Dog/name), thereby sidestepping many of the issues.
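
A tiny sketch of why the reasoner stays silent here, and of the disjointness
axiom that would actually make it complain (Python/rdflib, with the RDFS
domain rule applied by hand; the ns: names are the hypothetical ones above):

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL

NS = Namespace("http://example.org/ns#")

g = Graph()
g.add((NS.name, RDFS.domain, NS.Person))
g.add((NS.Fido, RDF.type, NS.Dog))
g.add((NS.Fido, NS.name, Literal("Fido")))

# Hand-rolled rdfs2 rule: (?p rdfs:domain ?c), (?s ?p ?o)  =>  (?s rdf:type ?c)
for p, c in list(g.subject_objects(RDFS.domain)):
    for s in list(g.subjects(p, None)):
        g.add((s, RDF.type, c))

print(list(g.objects(NS.Fido, RDF.type)))   # Fido is now both a Dog and a Person

# Only an explicit disjointness axiom turns this into a detectable clash:
g.add((NS.Person, OWL.disjointWith, NS.Dog))
for c1, c2 in g.subject_objects(OWL.disjointWith):
    for s in g.subjects(RDF.type, c1):
        if (s, RDF.type, c2) in g:
            print("inconsistent:", s, "is typed as both", c1, "and", c2)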


While I'm here, I also noticed Tim Finin referring to "domain and  
range constraints". Personally, I don't see the word "constraint" as  
an appropriate description, since rdfs:domain and rdfs:range are not  
constraining in any way.


Regards,
Paul Gearon

Re: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-18 Thread Tim Finin


This is an interesting discussion.  By coincidence, yesterday Tom
Briggs [1] defended his dissertation [2] on 'Constraint Generation and
Reasoning in OWL' which was done with Professor Yun Peng [3].  He
started with an analysis of Swoogle's data that showed that 75% of
published Semantic Web properties have neither domain nor range
constraints and evaluated algorithms for inferring them.  Rather than
focusing on instance data, he looked at what could be learned from how
the properties were used in the TBOX, e.g., for specifying role
restrictions.  He has a paper on this that he has submitted to a
conference and should finish revising his dissertation in the next few
weeks.  Here is the abstract for his defense:

 Constraint Generation and Reasoning in OWL
 Thomas H. Briggs

 The majority of OWL ontologies in the emerging Semantic Web are
 constructed from properties that lack domain and range
 constraints. Constraints in OWL are different from the familiar uses
 in programming languages and databases, and are actually type
 assertions that are made about the individuals which are connected
 by the property. These assertions can add vital information to the
 model because they are assertions of type on the individuals
 involved, and they can also give information on how the defining
 property may be used.

 Three different automated generation techniques are explored in this
 research: disjunction, least-common named subsumer, and
 vivification. Each algorithm is compared for the ability to
 generalize, and the performance impacts with respect to the
 reasoner. A large sample of ontologies from the Swoogle repository
 are used to compare real-world performance of these techniques.

 Finally, using generated facts, a type of default reasoning, may
 conflict with future assertions to the knowledge base. While general
 default reasoning is non-monotonic and undecidable a novel approach
 is introduced to support efficient retraction of the default
 knowledge. Combined, these techniques enable a robust and efficient
 generation of domain and range constraints which will result in
 inference of additional facts and improved performance for a number
 of Semantic Web applications.

[1] http://ebiquity.umbc.edu/person/html/Tom/Briggs/
[2] http://ebiquity.umbc.edu/event/html/id/273/
[3] http://ebiquity.umbc.edu/person/html/Yun/Peng/


--
Tim Finin, Computer Science & Electrical Engineering, Univ of Maryland
Baltimore County, 1000 Hilltop Cir, Baltimore MD 21250. [EMAIL PROTECTED]
http://umbc.edu/~finin 410-455-3522 fax:-3969 http://ebiquity.umbc.edu 



Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-18 Thread Jens Lehmann





Hi Chris,

Chris Bizer wrote:
> Hi Hugh and Richard,
> 
> interesting discussion indeed. 
> 
> I think that the basic idea of the Semantic Web is that you reuse existing
> terms or at least provide mappings from your terms to existing ones.
> 
> As DBpedia is often used as an interlinking hub between different datasets
> on the Web, it should in my opinion clearly have a type b) ontology using
> Richard's classification.
> 
> But what does this mean for WEB ontology languages?
> 
> Looking at the current discussion, I feel reassured that if you want to do
> WEB stuff, you should not move beyond RDFS, even aim lower and only use a
> subset of RDFS (basically only rdf:type, rdfs:subClassOf and
> rdfs:subPropertyOf) plus owl:sameAs. Anything beyond this seems to impose
> too tight restrictions, seems to be too complicated even for people with
> fair Semantic Web knowledge, and seems to break immediately when people
> start to set links between different schemata/ontologies.

I do not fully agree. First of all, let's not forget that we also have
UMBEL and YAGO as two schemata on top of DBpedia data, which do not
impose many restrictions. People are free to use those (in particular
UMBEL is designed to be type b).

Regarding your arguments:

Too tight restrictions: Which ones specifically are too tight? If the
restrictions cause inconsistencies (which they are likely to do at the
moment), then this signals a problem in the DBpedia data. (Which is
one of the purposes of imposing restrictions.)

Too complicated: I don't have the impression that the people writing
here have no idea about the meaning of domain and range. Even if this is
the case, no one forces them to use them.

Breaks when you set links: True, so we should be careful in setting
those links to other schemata.

> Dublin Core and FOAF went down this road. And maybe DBpedia should do the
> same (meaning to remove most range and domain restrictions and only keep the
> class and property hierarchy).
> 
> Can anybody of the ontology folks tell me convincing use cases where the
> current range and domain restrictions are useful? 

I think there are many of those. First of all, they allow checking
consistency in the DBpedia data. Having consistent data makes it possible
to provide nice user interfaces for DBpedia. Before this release, it was
hardly possible to write a user-friendly UI for DBpedia data unless you
restricted yourself to a specific part of the data. One of the other main
problems was/is querying DBpedia. A better structure also helps a lot in
formulating SPARQL queries. We had questions like "How do I query the
properties of buildings?" etc. on the mailing list. Using the domain
restrictions, you can now easily say which properties you should query,
and the range allows you to see what you will get (an integer value, a
string, an instance of a certain class, etc.). This probably helps to
make more sophisticated use of Semantic Web structures than we are
doing now.
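
A sketch of the kind of schema query meant here (Python/rdflib; dbo:Building
is assumed to be the relevant class name, and dbpedia_3.2.owl is just a
placeholder for wherever the ontology file lives):

from rdflib import Graph

g = Graph()
g.parse("dbpedia_3.2.owl", format="xml")   # placeholder: load the DBpedia ontology

q = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT ?property ?range WHERE {
  dbo:Building rdfs:subClassOf* ?class .
  ?property rdfs:domain ?class .
  OPTIONAL { ?property rdfs:range ?range }
}
"""
for row in g.query(q):
    print(row[0], "->", row[1])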

> (Validation does not count as WEB ontology languages are not designed for
> validation and XML schema should be used instead if tight validation is
> required).

As a consequence, OWL should never be used for consistency checking?

> If not, I would opt for removing the restrictions.

What is the added value in removing the restrictions?

Kind regards,

Jens


-- 
Dipl. Inf. Jens Lehmann
Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc





RE: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-18 Thread John Goodwin


> Regarding your arguments:
> 
> Too tight restrictions: Which ones specifically are too 
> tight? If the restrictions cause inconsistencies (which they 
> are likely to do at the moment), then this signals a 
> problem in the DBpedia data. (Which is one of the purposes of 
> imposing restrictions.)

I've noticed that properties like "father" have a domain of "British
Royal or Monarch", and I wonder if this is too tight. Would you not save
yourself headaches in the future by relaxing that restriction to Person?
For example, if you want to add "father" information for US presidents,
will you then have to go back and edit your OWL ontology to include US
presidents in the domain of "father"? 

Furthermore, I understand disjunctions can be expensive when reasoning
(not sure if that would be the case in the DBpedia ontology, as it
doesn't use that much extra OWL). 


> I think there are many of those. First of all, they allow 
> checking consistency in the DBpedia data. Having consistent 
> data allows to provide nice user interfaces for DBpedia. 

I'm still not sure how domain and range will help check consistency.
Don't you need OWL disjoints and other information to find
inconsistencies, unless of course you check all the inferred types for
the instances? 

> As a consequence, OWL should never be used for consistency checking?

You can use it for checking satisfiability of classes and consistency of
ontologies if you add enough information, but otherwise ontologies will
generally just add more information that leads to extra entailments.

John




RE: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-18 Thread John Goodwin


Ian Davis wrote:

> Again, isn't validity checking something that can only be done with
> OWL? RDFS only adds information.

I think strictly speaking both OWL and RDFS only add information, but
with OWL you can at least check that the information is logically
consistent, and use that for validation in some sense. I think to use
domain and range to get validation you really need OWL disjoints as well.
I guess in theory you could check the inferred types for instances based
on the range/domain restrictions, but that could be pretty tedious for a
large ABox!?

John







Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-18 Thread Azamat


''Most people don't care about structure, they care about content.''

I am sorry... but that's just nonsense. There is no content (elements) 
without structure (context, elements' relationships, consolidative meaning), 
just as there is no structure without content. If your application has no 
structure (unifying context), it is just an alphabet soup of data.

Azamat

- Original Message - 
From: "Georgi Kobilarov" <[EMAIL PROTECTED]>

To: "Azamat" <[EMAIL PROTECTED]>; "SW-forum" <[EMAIL PROTECTED]>
Cc: ; <[EMAIL PROTECTED]>
Sent: Tuesday, November 18, 2008 1:37 AM
Subject: RE: DBpedia 3.2 release, including DBpedia Ontology and RDF links 
to Freebase




Ontology is designed to put all things in their natural places, not to
make mess of the real world;


Most people don't care about structure, they care about content.

DBpedia makes Wikipedia's implicit structure explicit in order to make
its content more accessible and (re)usable.

That's it.


--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Azamat
Sent: Monday, November 17, 2008 8:38 PM
To: 'SW-forum'
Cc: public-lod@w3.org; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF
links to Freebase


Monday, November 17, 2008 2:11 PM, Chris Bizer wrote:
'We are happy to announce the release of DBpedia version 3.2. ... More
information about the ontology is found at:
http://wiki.dbpedia.org/Ontology'

While opening, we see the following types of Resource, seemingly Entity or
Thing:

Resource (Person, Ethnic group, Organization, Infrastructure, Planet, Work,
Event, Means of Transportation, Anatomic structure, Olympic record,
Language, Chemical compound, Species, Weapon, Protein, Disease, Supreme
Court of the US, Grape, Website, Music Genre, Currency, Beverage, Place).

I am of opinion to support the developers even when they misdirect. But this
'classification' meant to be used for 'wikipedia's infobox-to-ontology
mappings' is a complete disorder, having a chance for the URL
http://wiki.dbpedia.org/Mess.
Ontology is designed to put all things in their natural places, not to make
mess of the real world; if you deal with chemical compound and protein, it
requests an arrangement like as protein < macromolecule < organic compound <
chemical compound < matter, substance < physical entity < entity. The same
with other things, however hard, rocky and trying it may be.

This test and trial proves again that any web ontology language projects,
programming applications or semantic systems, are foredoomed without
fundamental ontological schema.

azamat abdoullaev

- Original Message -
From: "Chris Bizer" <[EMAIL PROTECTED]>
To: ; "'Semantic Web'" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Monday, November 17, 2008 2:11 PM
Subject: ANN: DBpedia 3.2 release, including DBpedia Ontology and RDF
links
to Freebase



Hi all,

we are happy to announce the release of DBpedia version 3.2.

The new knowledge base has been extracted from the October 2008 Wikipedia
dumps. Compared to the last release, the new knowledge base provides three
major improvements:


1. DBpedia Ontology

DBpedia now features a shallow, cross-domain ontology, which has been
manually created based on the most commonly used infoboxes within Wikipedia.
The ontology currently covers over 170 classes which form a subsumption
hierarchy and have 940 properties. The ontology is instantiated by a new
infobox data extraction method which is based on hand-generated mappings of
Wikipedia infoboxes to the DBpedia ontology. The mappings define
fine-granular rules on how to parse infobox values. The mappings also adjust
weaknesses in the Wikipedia infobox system, like having different infoboxes
for the same class (currently 350 Wikipedia templates are mapped to 170
ontology classes), using different property names for the same property
(currently 2350 Wikipedia template properties are mapped to 940 ontology
properties), and not having clearly defined datatypes for property values.
Therefore, the instance data within the infobox ontology is much cleaner and
better structured than the infobox data within the DBpedia infobox dataset
that is generated using the old infobox extraction code. The DBpedia
ontology currently contains about 882.000 instances.

More information about the ontology is found at:
http://wiki.dbpedia.org/Ontology


2. RDF Links to Freebase

Freebase is an open-license database which provides data about millions of
things from various domains. Freebase has recently released a Linked Data
interface to their content. As there is a big overlap between DBpedia and
Freeb

Re: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Ian Davis
On Tue, Nov 18, 2008 at 4:02 AM, Tim Berners-Lee <[EMAIL PROTECTED]> wrote:

>
> On 2008-11 -17, at 11:27, John Goodwin wrote:
>
> [...]
> I'd be tempted to generalise or just remove the domain/range
> restrictions. Any thoughts?
>
>
> There are lots of uses for range and domain.
> One is in the user interface -- if you for example link a person and a
> document, the system
> can prompt you for a relationship which will include "is author of" and
> "made" but won't include foaf:knows or is issue of.
>
> Similarly, when making a friend, one can use autocompletion on labels which
> the current session knows about and simplify it by for example removing all
> documents from a list of candidate foaf:knows friends.
>

Both these use cases require some OWL to say that documents aren't people. I
don't see these scenarios being feasible in the general case because you'd
need a complete description of the world in OWL, i.e. you'd want to know
about everything that can't possibly be a person.



>
> It is of course also important for checking hand-written files for
> validity.
>

Again, isn't validity checking something that can only be done with OWL?
RDFS only adds information.



>
> Tim BL
>

Ian


Re: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Peter Ansell
2008/11/18 Tim Berners-Lee <[EMAIL PROTECTED]>

>
> On 2008-11 -17, at 11:27, John Goodwin wrote:
>
> [...]
> I'd be tempted to generalise or just remove the domain/range
> restrictions. Any thoughts?
>
>
> There are lots of uses for range and domain.
> One is in the user interface -- if you for example link a person and a
> document, the system
> can prompt you for a relationship which will include "is author of" and
> "made" but won't include foaf:knows or is issue of.
>
> Similarly, when making a friend, one can use autocompletion on labels which
> the current session knows about and simplify it by for example removing all
> documents from a list of candidate foaf:knows friends.
>
> It is of course also important for checking hand-written files for
> validity.
>
> Tim BL
>

I think there are uses, as you demonstrate, but checking (effectively)
hand-written Wikipedia extracts for validity is practically impossible. It
may happen that, if you assume the DBpedia range and domain are correct, you
might not have access to something you would otherwise have correctly been
able to see in a given situation. Maybe the best idea is to leave the range
and domain in DBpedia as naive possibilities, and people who are worried that
they are not going to be correct enough for the level of risk in their
application can ignore them for the DBpedia data set.

Cheers,

Peter


Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Tim Berners-Lee


On 2008-11 -17, at 11:27, John Goodwin wrote:

[...]
I'd be tempted to generalise or just remove the domain/range
restrictions. Any thoughts?



There are lots of uses for range and domain.

One is in the user interface -- if you for example link a person and  
a document, the system
can prompt you for a relationship which will include "is author of"  
and "made" but won't include foaf:knows or is issue of.


Similarly, when making a friend, one can use autocompletion on labels  
which the current session knows about and simplify it by for example  
removing all documents from a list of candidate foaf:knows friends.


It is of course also important for checking hand-written files for  
validity.


Tim BL

RE: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Georgi Kobilarov

> Ontology is designed to put all things in their natural places, not to
> make mess of the real world;

Most people don't care about structure, they care about content.

DBpedia makes Wikipedia's implicit structure explicit in order to make
its content more accessible and (re)usable.

That's it.


--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Azamat
> Sent: Monday, November 17, 2008 8:38 PM
> To: 'SW-forum'
> Cc: public-lod@w3.org; [EMAIL PROTECTED];
> [EMAIL PROTECTED]
> Subject: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF
> links to Freebase
> 
> 
> Monday, November 17, 2008 2:11 PM, Chris Bizer wrote:
> 'We are happy to announce the release of DBpedia version 3.2. ... More
> information about the ontology is found at:
> http://wiki.dbpedia.org/Ontology'
> 
> While opening, we see the following types of Resource, seemingly Entity or
> Thing:
> 
> Resource (Person, Ethnic group, Organization, Infrastructure, Planet,
> Work,
> Event, Means of Transportation, Anatomic structure, Olympic record,
> Language, Chemical compound, Species, Weapon, Protein, Disease, Supreme
> Court of the US, Grape, Website, Music Genre, Currency, Beverage,
> Place).
> 
> I am of opinion to support the developers even when they misdirect. But this
> 'classification' meant to be used for 'wikipedia's infobox-to-ontology
> mappings' is a complete disorder, having a chance for the URL
> http://wiki.dbpedia.org/Mess.
> Ontology is designed to put all things in their natural places, not to
> make
> mess of the real world; if you deal with chemical compound and protein,
> it
> requests an arrangement like as protein < macromolecule < organic
> compound <
> chemical compound < matter, substance < physical entity < entity. The
> same
> with other things, however hard, rocky and trying it may be.
> 
> This test and trial proves again that any web ontology language
> projects,
> programming applications or semantic systems, are foredoomed without
> fundamental ontological schema.
> 
> azamat abdoullaev
> 
> - Original Message -
> From: "Chris Bizer" <[EMAIL PROTECTED]>
> To: ; "'Semantic Web'" <[EMAIL PROTECTED]>;
> <[EMAIL PROTECTED]>;
> <[EMAIL PROTECTED]>
> Sent: Monday, November 17, 2008 2:11 PM
> Subject: ANN: DBpedia 3.2 release, including DBpedia Ontology and RDF
> links
> to Freebase
> 
> 
> 
> Hi all,
> 
> we are happy to announce the release of DBpedia version 3.2.
> 
> The new knowledge base has been extracted from the October 2008
> Wikipedia
> dumps. Compared to the last release, the new knowledge base provides
> three
> major improvements:
> 
> 
> 1. DBpedia Ontology
> 
> DBpedia now features a shallow, cross-domain ontology, which has been
> manually created based on the most commonly used infoboxes within
> Wikipedia.
> The ontology currently covers over 170 classes which form a subsumption
> hierarchy and have 940 properties. The ontology is instantiated by a
> new
> infobox data extraction method which is based on hand-generated
> mappings of
> Wikipedia infoboxes to the DBpedia ontology. The mappings define
> fine-granular rules on how to parse infobox values. The mappings also
> adjust
> weaknesses in the Wikipedia infobox system, like having different
> infoboxes
> for the same class (currently 350 Wikipedia templates are mapped to 170
> ontology classes), using different property names for the same property
> (currently 2350 Wikipedia template properties are mapped to 940
> ontology
> properties), and not having clearly defined datatypes for property
> values.
> Therefore, the instance data within the infobox ontology is much
> cleaner and
> better structured than the infobox data within the DBpedia infobox
> dataset
> that is generated using the old infobox extraction code. The DBpedia
> ontology currently contains about 882.000 instances.
> 
> More information about the ontology is found at:
> http://wiki.dbpedia.org/Ontology
> 
> 
> 2. RDF Links to Freebase
> 
> Freebase is an open-license database which provides data about millions
> of things from various domains. Freebase has recently released a Linked
> Data
> interface to their content. As there is a big overlap between DBpedia
> and
> Freebase, we have added 2.4 million RDF links to DBpedia pointing at
> the
> corresponding things in Freebase. These links can be used to smush and
> fuse
> data about a thing from DBpedia and Fre

RE: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Chris Wallace

I second Hugh and Richard's point.

I think the job the DBpedia people are doing in trying to corral Wikipedia into 
order is an outstanding contribution. And it's obviously hard. Uniting the 
various forms of birth date and birth place for example really increases the 
value of the dataset.  And there's more to be done of course. I use the 
foaf:depiction property  in the DBpedia category-based picture browser I've 
been writing but many such images are not in line with the meaning of 
foaf:depiction.  Rather than being depictions of the subject itself, for a 
place they may be maps of the location or the  coat of arms; for an artist it 
might be an image made by the person rather than of the person. It would be 
great (at least for my picture book!) if these different meanings could be 
separated using contextual data into distinct properties. I think it is far more 
important to reach for accuracy in the data than to worry about alignment to 
pre-existing ontologies developed for other purposes.

Chris Wallace

http://www.cems.uwe.ac.uk/xmlwiki/RDF/classbrowse.xq?resource=Bristol


-Original Message-
From: [EMAIL PROTECTED] on behalf of Hugh Glaser
Sent: Mon 17/11/2008 10:33 PM
To: Richard Cyganiak
Cc: public-lod@w3.org; Semantic Web; [EMAIL PROTECTED]
Subject: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links   to 
Freebase
 

Very nicely put, Richard.
We are opening up the discussion here of when to define one's own and when to 
(re-)use from elsewhere.
I am a bit uncomfortable with the idea of "you should use a:b from c and d:e 
from f and g:h from i..."
It makes for a fragmented view of my data, and might encourage me to use things 
that do not capture exactly what I mean, as well as introducing dependencies 
with things that might change, but over which I have no control.
So far better to use ontologies of type (b) where appropriate, and define my 
own of type (a), which will (hopefully) be nicely constructed, and easier to 
understand as smallish artefacts that can be looked at as a whole.
Of course, this means we need to crack the infrastructure that does dynamic 
ontology mapping, etc.
Mind you, unless we have the need, we are less likely to do so.
I also think that the comments about the restrictions being a characteristic of 
the dataset for type (a), but more like comments on the world for type (b) are 
pretty good.
Hugh

On 17/11/2008 20:09, "Richard Cyganiak" <[EMAIL PROTECTED]> wrote:



John,

Here's an observation from a bystander ...

On 17 Nov 2008, at 17:17, John Goodwin wrote:

> This is also a good example of where (IMHO) the domain was perhaps
> over specified. For example all sorts of things could have
> publishers, and not the ones listed here. I worry that if you reuse
> DBpedia "publisher" elsewhere you could get some undesired inferences.

But are the DBpedia classes *intended* for re-use elsewhere? Or do
they simply express restrictions that apply *within DBpedia*?

I think that in general it is useful to distinguish between two
different kinds of ontologies:

a) Ontologies that express restrictions that are present in a certain
dataset. They simply express what's there in the data. In this sense,
they are like database schemas: If "Publisher" has a range of
"Person", then it means that the publisher *in this particular
dataset* is always a person. That's not an assertion about the world,
it's an assertion about the dataset. These ontologies are usually not
very re-usable.

b) Ontologies that are intended as a "lingua franca" for data exchange
between different applications. They are designed for broad re-use,
and thus usually do not add many restrictions. In this sense, they are
more like controlled vocabularies of terms. Dublin Core is probably
the prototypical example, and FOAF is another good one. They usually
don't allow as many interesting inferences.

I think that these two kinds of ontologies have very different
requirements. Ontologies that are designed for one of these roles are
quite useless if used for the other job. Ontologies that have not been
designed for either of these two roles usually fail at both.

Returning to DBpedia, my impression is that the DBpedia ontology is
intended mostly for the first role. Maybe it should be understood more
as a schema for the DBpedia dataset, and not so much as a re-usable
set of terms for use outside of the Wikipedia context. (I might be
wrong, I was not involved in its creation.)

Richard






Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Hugh Glaser

Very nicely put, Richard.
We are opening up the discussion here of when to define one's own and when to 
(re-)use from elsewhere.
I am a bit uncomfortable with the idea of "you should use a:b from c and d:e 
from f and g:h from i..."
It makes for a fragmented view of my data, and might encourage me to use things 
that do not capture exactly what I mean, as well as introducing dependencies 
with things that might change, but over which I have no control.
So far better to use ontologies of type (b) where appropriate, and define my 
own of type (a), which will (hopefully) be nicely constructed, and easier to 
understand as smallish artefacts that can be looked at as a whole.
Of course, this means we need to crack the infrastructure that does dynamic 
ontology mapping, etc.
Mind you, unless we have the need, we are less likely to do so.
I also think that the comments about the restrictions being a characteristic of 
the dataset for type (a), but more like comments on the world for type (b) are 
pretty good.
Hugh

On 17/11/2008 20:09, "Richard Cyganiak" <[EMAIL PROTECTED]> wrote:



John,

Here's an observation from a bystander ...

On 17 Nov 2008, at 17:17, John Goodwin wrote:

> This is also a good example of where (IMHO) the domain was perhaps
> over specified. For example all sorts of things could have
> publishers, and not the ones listed here. I worry that if you reuse
> DBpedia "publisher" elsewhere you could get some undesired inferences.

But are the DBpedia classes *intended* for re-use elsewhere? Or do
they simply express restrictions that apply *within DBpedia*?

I think that in general it is useful to distinguish between two
different kinds of ontologies:

a) Ontologies that express restrictions that are present in a certain
dataset. They simply express what's there in the data. In this sense,
they are like database schemas: If "Publisher" has a range of
"Person", then it means that the publisher *in this particular
dataset* is always a person. That's not an assertion about the world,
it's an assertion about the dataset. These ontologies are usually not
very re-usable.

b) Ontologies that are intended as a "lingua franca" for data exchange
between different applications. They are designed for broad re-use,
and thus usually do not add many restrictions. In this sense, they are
more like controlled vocabularies of terms. Dublin Core is probably
the prototypical example, and FOAF is another good one. They usually
don't allow as many interesting inferences.

I think that these two kinds of ontologies have very different
requirements. Ontologies that are designed for one of these roles are
quite useless if used for the other job. Ontologies that have not been
designed for either of these two roles usually fail at both.

Returning to DBpedia, my impression is that the DBpedia ontology is
intended mostly for the first role. Maybe it should be understood more
as a schema for the DBpedia dataset, and not so much as a re-usable
set of terms for use outside of the Wikipedia context. (I might be
wrong, I was not involved in its creation.)

Richard




Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Frank Manola


On Nov 17, 2008, at 2:46 PM, Dan Brickley wrote:



Azamat wrote:

Monday, November 17, 2008 2:11 PM, Chris Bizer wrote:
'We are happy to announce the release of DBpedia version 3.2. ...  
More information about the ontology is found at: http://wiki.dbpedia.org/Ontology'
While opening, we see the following types of Resource, seemingly  
Entity or Thing:
Resource (Person, Ethnic group, Organization, Infrastructure,  
Planet, Work, Event, Means of Transportation, Anatomic structure,  
Olympic record, Language, Chemical compound, Species, Weapon,  
Protein, Disease, Supreme Court of the US, Grape, Website, Music  
Genre, Currency, Beverage, Place).
I am of opinion to support the developers even when they misdirect.  
But this 'classification' meant to be used for 'wikipedia's infobox- 
to-ontology mappings' is a complete disorder, having a chance for  
the URL http://wiki.dbpedia.org/Mess.
Ontology is designed to put all things in their natural places, not  
to make mess of the real world; if you deal with chemical compound  
and protein, it requests an arrangement like as protein <  
macromolecule < organic compound < chemical compound < matter,  
substance < physical entity < entity. The same with other things,  
however hard, rocky and trying it may be.
This test and trial proves again that any web ontology language  
projects, programming applications or semantic systems, are  
foredoomed without fundamental ontological schema.


Is Wikipedia foredoomed also?

Dan


It may very well be, if your ontological commitment is that all things  
have "natural places", and the real world is not actually a mess.   
However, at least for the kind of ontology being discussed here, it  
seems to me that the ontology may not be so much *making* a mess of  
the real world as reflecting it.


--Frank






azamat abdoullaev








Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Kingsley Idehen


Juan Sequeda wrote:

Has anybody considered reusing the DBpedia ontology?

Juan Sequeda, Ph.D Student

Research Assistant
Dept. of Computer Sciences
The University of Texas at Austin
http://www.cs.utexas.edu/~jsequeda 
[EMAIL PROTECTED] 

http://www.juansequeda.com/

Semantic Web in Austin: http://juansequeda.blogspot.com/


On Mon, Nov 17, 2008 at 2:09 PM, Richard Cyganiak <[EMAIL PROTECTED] 
> wrote:



John,

Here's an observation from a bystander ...

On 17 Nov 2008, at 17:17, John Goodwin wrote:


This is also a good example of where (IMHO) the domain was
perhaps over specified. For example all sorts of things could
have publishers, and not the ones listed here. I worry that if
you reuse DBpedia "publisher" elsewhere you could get some
undesired inferences.


But are the DBpedia classes *intended* for re-use elsewhere? Or do
they simply express restrictions that apply *within DBpedia*?

I think that in general it is useful to distinguish between two
different kinds of ontologies:

a) Ontologies that express restrictions that are present in a
certain dataset. They simply express what's there in the data. In
this sense, they are like database schemas: If "Publisher" has a
range of "Person", then it means that the publisher *in this
particular dataset* is always a person. That's not an assertion
about the world, it's an assertion about the dataset. These
ontologies are usually not very re-usable.

b) Ontologies that are intended as a "lingua franca" for data
exchange between different applications. They are designed for
broad re-use, and thus usually do not add many restrictions. In
this sense, they are more like controlled vocabularies of terms.
Dublin Core is probably the prototypical example, and FOAF is
another good one. They usually don't allow as many interesting
inferences.

I think that these two kinds of ontologies have very different
requirements. Ontologies that are designed for one of these roles
are quite useless if used for the other job. Ontologies that have
not been designed for either of these two roles usually fail at both.


Richard,



Returning to DBpedia, my impression is that the DBpedia ontology
is intended mostly for the first role. Maybe it should be
understood more as a schema for the DBpedia dataset, and not so
much as a re-usable set of terms for use outside of the Wikipedia
context. (I might be wrong, I was not involved in its creation.)

In a nutshell, YES!  This is much, much clearer and less problematic than 
the generic "ontology" moniker.


DBpedia colleagues: I think we should qualify what currently exists as a 
Schema or Data Dictionary for the DBpedia data set (or Data Space) :-)


Kingsley



Richard





--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com








Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Juan Sequeda
Has anybody considered reusing the DBpedia ontology?

Juan Sequeda, Ph.D Student

Research Assistant
Dept. of Computer Sciences
The University of Texas at Austin
http://www.cs.utexas.edu/~jsequeda
[EMAIL PROTECTED]

http://www.juansequeda.com/

Semantic Web in Austin: http://juansequeda.blogspot.com/


On Mon, Nov 17, 2008 at 2:09 PM, Richard Cyganiak <[EMAIL PROTECTED]>wrote:

>
> John,
>
> Here's an observation from a bystander ...
>
> On 17 Nov 2008, at 17:17, John Goodwin wrote:
> 
>
>> This is also a good example of where (IMHO) the domain was perhaps over
>> specified. For example all sorts of things could have publishers, and not
>> the ones listed here. I worry that if you reuse DBpedia "publisher"
>> elsewhere you could get some undesired inferences.
>>
>
> But are the DBpedia classes *intended* for re-use elsewhere? Or do they
> simply express restrictions that apply *within DBpedia*?
>
> I think that in general it is useful to distinguish between two different
> kinds of ontologies:
>
> a) Ontologies that express restrictions that are present in a certain
> dataset. They simply express what's there in the data. In this sense, they
> are like database schemas: If "Publisher" has a range of "Person", then it
> means that the publisher *in this particular dataset* is always a person.
> That's not an assertion about the world, it's an assertion about the
> dataset. These ontologies are usually not very re-usable.
>
> b) Ontologies that are intended as a "lingua franca" for data exchange
> between different applications. They are designed for broad re-use, and thus
> usually do not add many restrictions. In this sense, they are more like
> controlled vocabularies of terms. Dublin Core is probably the prototypical
> example, and FOAF is another good one. They usually don't allow as many
> interesting inferences.
>
> I think that these two kinds of ontologies have very different
> requirements. Ontologies that are designed for one of these roles are quite
> useless if used for the other job. Ontologies that have not been designed
> for either of these two roles usually fail at both.
>
> Returning to DBpedia, my impression is that the DBpedia ontology is
> intended mostly for the first role. Maybe it should be understood more as a
> schema for the DBpedia dataset, and not so much as a re-usable set of terms
> for use outside of the Wikipedia context. (I might be wrong, I was not
> involved in its creation.)
>
> Richard
>


Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Richard Cyganiak


John,

Here's an observation from a bystander ...

On 17 Nov 2008, at 17:17, John Goodwin wrote:

This is also a good example of where (IMHO) the domain was perhaps  
over specified. For example all sorts of things could have  
publishers, and not the ones listed here. I worry that if you reuse  
DBpedia "publisher" elsewhere you could get some undesired inferences.


But are the DBpedia classes *intended* for re-use elsewhere? Or do  
they simply express restrictions that apply *within DBpedia*?


I think that in general it is useful to distinguish between two  
different kinds of ontologies:


a) Ontologies that express restrictions that are present in a certain  
dataset. They simply express what's there in the data. In this sense,  
they are like database schemas: If “Publisher” has a range of  
“Person”, then it means that the publisher *in this particular  
dataset* is always a person. That's not an assertion about the world,  
it's an assertion about the dataset. These ontologies are usually not  
very re-usable.


b) Ontologies that are intended as a “lingua franca” for data exchange  
between different applications. They are designed for broad re-use,  
and thus usually do not add many restrictions. In this sense, they are  
more like controlled vocabularies of terms. Dublin Core is probably  
the prototypical example, and FOAF is another good one. They usually  
don't allow as many interesting inferences.


I think that these two kinds of ontologies have very different  
requirements. Ontologies that are designed for one of these roles are  
quite useless if used for the other job. Ontologies that have not been  
designed for either of these two roles usually fail at both.
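
To make the contrast concrete, here is a rough sketch in RDF/XML  
(namespace declarations omitted; the example.org URIs under (a) are  
invented for illustration, and (b) reuses Dublin Core's publisher  
element, which deliberately declares no domain or range):

  <!-- (a) Dataset-schema style: an assertion about *this* dataset,
       saying that publisher values here happen to always be persons. -->
  <owl:ObjectProperty rdf:about="http://example.org/schema/publisher">
    <rdfs:range rdf:resource="http://example.org/schema/Person"/>
  </owl:ObjectProperty>

  <!-- (b) Interchange-vocabulary style: a broadly reusable term with
       no domain or range restrictions attached. -->
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/publisher">
    <rdfs:label>Publisher</rdfs:label>
  </rdf:Property>
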


Returning to DBpedia, my impression is that the DBpedia ontology is  
intended mostly for the first role. Maybe it should be understood more  
as a schema for the DBpedia dataset, and not so much as a re-usable  
set of terms for use outside of the Wikipedia context. (I might be  
wrong, I was not involved in its creation.)


Richard


Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Dan Brickley


Azamat wrote:


Monday, November 17, 2008 2:11 PM, Chris Bizer wrote:
'We are happy to announce the release of DBpedia version 3.2. ... More 
information about the ontology is found at: 
http://wiki.dbpedia.org/Ontology'


On opening it, we see the following types of Resource, seemingly Entity 
or Thing:


Resource (Person, Ethnic group, Organization, Infrastructure, Planet, 
Work, Event, Means of Transportation, Anatomic structure, Olympic 
record, Language, Chemical compound, Species, Weapon, Protein, Disease, 
Supreme Court of the US, Grape, Website, Music Genre, Currency, 
Beverage, Place).


I am of the opinion that we should support the developers even when they 
misdirect. But this 'classification', meant to be used for 'Wikipedia's 
infobox-to-ontology mappings', is a complete disorder, and stands a fair 
chance of ending up at the URL http://wiki.dbpedia.org/Mess.
An ontology is designed to put all things in their natural places, not to 
make a mess of the real world; if you deal with chemical compounds and 
proteins, that calls for an arrangement such as protein < macromolecule < 
organic compound < chemical compound < matter, substance < physical 
entity < entity. The same goes for other things, however hard, rocky and 
trying it may be.


This test and trial proves again that any web ontology language 
project, programming application or semantic system is foredoomed 
without a fundamental ontological schema.


Is Wikipedia foredoomed also?

Dan


azamat abdoullaev





Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Azamat


Monday, November 17, 2008 2:11 PM, Chris Bizer wrote:
'We are happy to announce the release of DBpedia version 3.2. ... More 
information about the ontology is found at: 
http://wiki.dbpedia.org/Ontology'


On opening it, we see the following types of Resource, seemingly Entity or 
Thing:


Resource (Person, Ethnic group, Organization, Infrastructure, Planet, Work, 
Event, Means of Transportation, Anatomic structure, Olympic record, 
Language, Chemical compound, Species, Weapon, Protein, Disease, Supreme 
Court of the US, Grape, Website, Music Genre, Currency, Beverage, Place).


I am of the opinion that we should support the developers even when they 
misdirect. But this 'classification', meant to be used for 'Wikipedia's 
infobox-to-ontology mappings', is a complete disorder, and stands a fair 
chance of ending up at the URL http://wiki.dbpedia.org/Mess.
An ontology is designed to put all things in their natural places, not to make 
a mess of the real world; if you deal with chemical compounds and proteins, 
that calls for an arrangement such as protein < macromolecule < organic 
compound < chemical compound < matter, substance < physical entity < entity. 
The same goes for other things, however hard, rocky and trying it may be.


This test and trial proves again that any web ontology language project, 
programming application or semantic system is foredoomed without a 
fundamental ontological schema.


azamat abdoullaev

- Original Message - 
From: "Chris Bizer" <[EMAIL PROTECTED]>
To: ; "'Semantic Web'" <[EMAIL PROTECTED]>; 
<[EMAIL PROTECTED]>; 
<[EMAIL PROTECTED]>

Sent: Monday, November 17, 2008 2:11 PM
Subject: ANN: DBpedia 3.2 release, including DBpedia Ontology and RDF links 
to Freebase




Hi all,

we are happy to announce the release of DBpedia version 3.2.

The new knowledge base has been extracted from the October 2008 Wikipedia
dumps. Compared to the last release, the new knowledge base provides three
major improvements:


1. DBpedia Ontology

DBpedia now features a shallow, cross-domain ontology, which has been
manually created based on the most commonly used infoboxes within Wikipedia.
The ontology currently covers over 170 classes which form a subsumption
hierarchy and have 940 properties. The ontology is instantiated by a new
infobox data extraction method which is based on hand-generated mappings of
Wikipedia infoboxes to the DBpedia ontology. The mappings define
fine-grained rules on how to parse infobox values. The mappings also
compensate for weaknesses in the Wikipedia infobox system, like having
different infoboxes for the same class (currently 350 Wikipedia templates are
mapped to 170 ontology classes), using different property names for the same
property (currently 2350 Wikipedia template properties are mapped to 940
ontology properties), and not having clearly defined datatypes for property
values. As a result, the instance data within the infobox ontology is much
cleaner and better structured than the infobox data within the DBpedia
infobox dataset that is generated using the old infobox extraction code. The
DBpedia ontology currently contains about 882,000 instances.
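
To give a rough idea of the shape of this instance data, a DBpedia resource
might be described along the following lines (the property name
dbpedia-owl:birthDate and the prefix binding of dbpedia-owl: to
http://dbpedia.org/ontology/ are illustrative assumptions, not quotes from
the released schema):

  <!-- hypothetical sketch of one ontology-based instance description -->
  <rdf:Description rdf:about="http://dbpedia.org/resource/Oliver_Stone">
    <rdf:type rdf:resource="http://dbpedia.org/ontology/Person"/>
    <dbpedia-owl:birthDate
        rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1946-09-15</dbpedia-owl:birthDate>
  </rdf:Description>
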

More information about the ontology is found at:
http://wiki.dbpedia.org/Ontology


2. RDF Links to Freebase

Freebase is an open-license database which provides data about millions of
things from various domains. Freebase has recently released a Linked Data
interface to their content. As there is a big overlap between DBpedia and
Freebase, we have added 2.4 million RDF links to DBpedia pointing at the
corresponding things in Freebase. These links can be used to smush and fuse
data about a thing from DBpedia and Freebase.
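
Each of these links is essentially an owl:sameAs statement of the following
shape (the Freebase URI below is a placeholder, not an actual identifier):

  <rdf:Description rdf:about="http://dbpedia.org/resource/Berlin">
    <!-- placeholder Freebase URI; the real links use the identifiers
         exposed by Freebase's Linked Data interface -->
    <owl:sameAs rdf:resource="http://rdf.freebase.com/ns/example.berlin"/>
  </rdf:Description>
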

For more information about the Freebase links see:
http://blog.dbpedia.org/2008/11/15/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/


3. Cleaner Abstracts

In the old DBpedia dataset, the abstracts for different languages sometimes
contained Wikipedia markup and other stray characters. For the
3.2 release, we have improved DBpedia's abstract extraction code, which
results in much cleaner abstracts that can safely be displayed in user
interfaces.


The new DBpedia release can be downloaded from:

http://wiki.dbpedia.org/Downloads32

and is also available via the DBpedia SPARQL endpoint at

http://dbpedia.org/sparql

and via DBpedia's Linked Data interface. Example URIs:

http://dbpedia.org/resource/Berlin
http://dbpedia.org/page/Oliver_Stone

More information about DBpedia in general is found at:

http://wiki.dbpedia.org/About


Lots of thanks to everybody who contributed to the DBpedia 3.2 release!

Especially:

1. Georgi Kobilarov (Freie Universität Berlin) who designed and implemented
the new infobox extraction framework.
2. Anja Jentsch (Freie Universität Berlin) who contributed to implementing
the new extraction framework and wrote the infobox to ontology class
mappings.
3. Paul Kreis (Freie Universität Berlin) who improved the datatype
extraction code.
4. Andreas Schultz (Freie Universität Berlin) for gene

Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Jens Lehmann

John Goodwin wrote:
> 
>>
>> The semantics of range mean that you have essentially asserted that  
>> the range of publisher is [Person and Company]. If you want the  
>> union, you'll have to explicitly use the unionOf constructor here.
> 
> Thanks Sean, yup that's the one. There were a few other cases of that 
> elsewhere as well. 

The range issue is (hopefully) fixed now.

Kind regards,

Jens



-- 
Dipl. Inf. Jens Lehmann
Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc



Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Jens Lehmann


Hello,

John Goodwin wrote:
> 
>> John's comment relates to (at least) the axioms on "publisher":
>> 
>> The semantics of range mean that you have essentially asserted that
>>  the range of publisher is [Person and Company]. If you want the 
>> union, you'll have to explicitly use the unionOf constructor here.
> 
> Thanks Sean, yup that's the one. There were a few other cases of that
> elsewhere as well. 

That's indeed a bug, which we need to fix (should have come out as union
instead). You made it kind of hard for me to understand your first post
correctly. ;-)

> This is also a good example of where (IMHO) the
> domain was perhaps over specified. For example all sorts of things
> could have publishers, and not the ones listed here. I worry that if
> you reuse DBpedia "publisher" elsewhere you could get some undesired
> inferences.

In the future, there will be a user interface for specifying
domains/ranges. (Georgi is working on it.) We hope that the quality of
the schema will increase over time.

Kind regards,

Jens

-- 
Dipl. Inf. Jens Lehmann
Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc



RE: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread John Goodwin



> John's comment relates to (at least) the axioms on "publisher":
>
> [[
> <owl:ObjectProperty rdf:about="http://dbpedia.org/ontology/publisher">
>   <rdfs:label>publisher</rdfs:label>
>   ...
>   <rdfs:range rdf:resource="http://dbpedia.org/ontology/Person"/>
>   <rdfs:range rdf:resource="http://dbpedia.org/ontology/Company"/>
> </owl:ObjectProperty>
> ]]
>
> The semantics of range mean that you have essentially asserted that  
> the range of publisher is [Person and Company]. If you want the  
> union, you'll have to explicitly use the unionOf constructor here.

Thanks Sean, yup that's the one. There were a few other cases of that elsewhere 
as well. This is also a good example of where (IMHO) the domain was perhaps 
over specified. For example all sorts of things could have publishers, and not 
the ones listed here. I worry that if you reuse DBpedia "publisher" elsewhere 
you could get some undesired inferences.
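
For instance (with made-up example.org resources, and dbpedia-owl: assumed
bound to http://dbpedia.org/ontology/), asserting the first description
below against the current range axioms would let a reasoner add the second,
regardless of what AcmePress actually is:

  <!-- asserted data that reuses the DBpedia publisher property -->
  <rdf:Description rdf:about="http://example.org/data/someJournal">
    <dbpedia-owl:publisher rdf:resource="http://example.org/data/AcmePress"/>
  </rdf:Description>

  <!-- entailed by the two range axioms, whether or not it makes sense -->
  <rdf:Description rdf:about="http://example.org/data/AcmePress">
    <rdf:type rdf:resource="http://dbpedia.org/ontology/Person"/>
    <rdf:type rdf:resource="http://dbpedia.org/ontology/Company"/>
  </rdf:Description>
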

cheers,

John
.






Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Sean Bechhofer



On 17 Nov 2008, at 17:00, Jens Lehmann wrote:




Hello John,

John Goodwin wrote:


Thanks Chris and team for all your hard work getting this done. I do,
however, have a few comments regarding the OWL ontology. I think in
general the use of domain and range is perhaps a bit "dubious" in that
for many things I think it is overly specified. I can imagine anyone
re-using the DBpedia properties getting some unexpected inferences from
the domain and range restrictions. Also the range restrictions seem to be
done as an OWL intersection, so if, for example, something has a
publisher x then x will be inferred to be both a Company and a Person,
which is probably not what you want. Personally, in all but a few cases,
I'd be tempted to generalise or just remove the domain/range
restrictions. Any thoughts?


We specified the domains and ranges as disjunctions of classes (not
intersection). See the W3C specification of owl:unionOf [1].


John's comment relates to (at least) the axioms on "publisher":

[[
<owl:ObjectProperty rdf:about="http://dbpedia.org/ontology/publisher">
  <rdfs:label>publisher</rdfs:label>
  ...
  <rdfs:range rdf:resource="http://dbpedia.org/ontology/Person"/>
  <rdfs:range rdf:resource="http://dbpedia.org/ontology/Company"/>
</owl:ObjectProperty>
]]

The semantics of range mean that you have essentially asserted that  
the range of publisher is [Person and Company]. If you want the  
union, you'll have to explicitly use the unionOf constructor here.
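
For comparison, a union range would look roughly like this (namespace  
declarations omitted):

  <owl:ObjectProperty rdf:about="http://dbpedia.org/ontology/publisher">
    <rdfs:range>
      <owl:Class>
        <!-- owl:unionOf takes an RDF collection of class descriptions -->
        <owl:unionOf rdf:parseType="Collection">
          <owl:Class rdf:about="http://dbpedia.org/ontology/Person"/>
          <owl:Class rdf:about="http://dbpedia.org/ontology/Company"/>
        </owl:unionOf>
      </owl:Class>
    </rdfs:range>
  </owl:ObjectProperty>
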


Cheers,

Sean

--
Sean Bechhofer
School of Computer Science
University of Manchester
[EMAIL PROTECTED]
http://www.cs.manchester.ac.uk/people/bechhofer






RE: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread John Goodwin

Hi Jens,

> We specified the domains and ranges as disjunctions of classes (not
> intersection). See the W3C specification of owl:unionOf [1].

The version I downloaded from http://wiki.dbpedia.org/Downloads32 had all the 
range restrictions as owl:intersectionOf. Or rather properties like "publisher" 
had lists of classes as the range, and OWL editors/reasoners will interpret 
that as intersection and not union.

> The domain and range axioms help to structure DBpedia and clarify the
> meaning of certain properties. While there is room for improvement, it
> is not an option to remove all of them.

I think some of them are definitely worth keeping, especially where related 
to people and places, but I thought a few were a bit over-specified. 

Anyway keep up the good work...:)

John


.






Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Jens Lehmann


Hello John,

John Goodwin wrote:
> 
> Thanks Chris and team for all your hard work getting this done. I do,
> however, have a few comments regarding the OWL ontology. I think in
> general the use of domain and range is perhaps a bit "dubious" in that
> for many things I think it is overly specified. I can imagine anyone
> re-using the DBpedia properties getting some unexpected inferences from
> the domain and range restrictions. Also the range restrictions seem to be
> done as an OWL intersection so if, for example, something has a
> publisher x then x will be inferred to be both a Company and a Person
> which is probably not what you want. Personally, in all but a few cases,
> I'd be tempted to generalise or just remove the domain/range
> restrictions. Any thoughts?

We specified the domains and ranges as disjunctions of classes (not
intersection). See the W3C specification of owl:unionOf [1].

The domain and range axioms help to structure DBpedia and clarify the
meaning of certain properties. While there is room for improvement, it
is not an option to remove all of them.

Currently, there are two versions of the infobox extraction: a loose one
and a strict one. In the strict one, it is guaranteed that the data
complies with the ranges specified in the ontology schema. Currently, only
the loose (probably inconsistent) one is provided.
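
As a rough illustration of the difference (the resources and values below
are invented for the example, and dbpedia-owl: is assumed bound to
http://dbpedia.org/ontology/): the loose extraction may keep a raw infobox
value even where the schema declares a resource-valued range, while the
strict extraction only emits triples that conform to it:

  <!-- loose extraction: keeps the raw infobox value as a plain literal -->
  <rdf:Description rdf:about="http://dbpedia.org/resource/Some_Book">
    <dbpedia-owl:publisher>Acme Press (New York)</dbpedia-owl:publisher>
  </rdf:Description>

  <!-- strict extraction: emits the triple only when the value resolves
       to a resource of the declared range -->
  <rdf:Description rdf:about="http://dbpedia.org/resource/Some_Book">
    <dbpedia-owl:publisher rdf:resource="http://dbpedia.org/resource/Acme_Press"/>
  </rdf:Description>
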

Kind regards,

Jens

[1] http://www.w3.org/TR/owl-guide/#owl_unionOf


-- 
Dipl. Inf. Jens Lehmann
Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc



Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread Kingsley Idehen


John Goodwin wrote:
  

Have fun with the new DBpedia knowledge base!

Cheers

Chris



Thanks Chris and team for all your hard work getting this done. I do,
however, have a few comments regarding the OWL ontology. I think in
general the use of domain and range is perhaps a bit "dubious" in that
for many things I think it is overly specified. I can imagine anyone
re-using the DBpedia properties getting some unexpected inferences from
the domain and range restrictions. Also the range restrictions seem to be
done as an OWL intersection so if, for example, something has a
publisher x then x will be inferred to be both a Company and a Person
which is probably not what you want. Personally, in all but a few cases,
I'd be tempted to generalise or just remove the domain/range
restrictions. Any thoughts?

john
.

  

John,

Yes, these issues exist with the DBpedia Ontology, but bear in mind that 
this is also why UMBEL and DBpedia have been linked :-)


Eventually (as per my other mail posts), the DBpedia Ontology will need 
to be linked with UMBEL (*which isn't the case right now*) so that it 
can benefit from the work done in the UMBEL project. That said, UMBEL is 
already a loosely bound ontology for DBpedia, so you can optionally 
leverage it for inference purposes right now. All in all, I am hoping 
that a lot of the inherent flexibility that the Internet-hosted 
distributed database we know as the Web accords us starts to come 
through with much more clarity following DBpedia 3.2.


Kingsley




  



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com








RE: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

2008-11-17 Thread John Goodwin


> 
> Have fun with the new DBpedia knowledge base!
> 
> Cheers
> 
> Chris

Thanks Chris and team for all your hard work getting this done. I do,
however, have a few comments regarding the OWL ontology. I think in
general the use of domain and range is perhaps a bit "dubious" in that
for many things I think it is overly specified. I can imagine anyone
re-using the DBpedia properties getting some unexpected inferences from
the domain and range restrictions. Also the range restrictions seem to be
done as an OWL intersection so if, for example, something has a
publisher x then x will be inferred to be both a Company and a Person
which is probably not what you want. Personally, in all but a few cases,
I'd be tempted to generalise or just remove the domain/range
restrictions. Any thoughts?

john
.

