The recent discussion about URIs has made me change my perspective on the whole 
issue. It seems that it is getting much more complicated than it needs to be. 
Here is my new suggestion:

1.) Every agent/user on the Semantic Web connects to one or more RDF query 
endpoints. There would probably be few HTTP addresses for these endpoints, on 
the order of a few dozen at most. Such a small collection of addresses would 
be easily manageable and probably quite stable.

2.) The agents conduct distributed queries over all endpoints, using SPARQL or 
the simpler FETCH or SPO query protocols [1].

3.) URIs that represent entities in RDF/OWL are completely separated from URLs 
that can be resolved via HTTP. While the URIs that represent entities have 
well-defined semantics, the resolvable URLs are just that: URLs that yield some 
binary data through an HTTP GET request. We should not ascribe any meaning or 
identity criteria to such a URL -- it is simply a string that we can type into 
our web browser to get something back.

4.) The URI of an entity can be connected to a URL through a single property, 
let's call it 'get-at'. A typical use of this property would be

<some-picture> <get-at> <some-URL>

The only connection of <some-URL> to the rest of the RDF graph is the <get-at> 
property; it is not part of any other statement. All descriptions pertaining to 
the digital resource (e.g. metadata about the size of the picture) are made 
with <some-picture>.
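
Points 1.) and 2.) can be sketched roughly as follows. This is only an 
illustration under simplifying assumptions: each endpoint is modelled as an 
in-memory set of triples instead of a real SPARQL service reached over HTTP, 
and the Endpoint class, the distributed_match function, the endpoint addresses 
and the "size" statement are all made up for the example.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Endpoint:
    """Hypothetical stand-in for an RDF query endpoint (no network involved)."""

    def __init__(self, address: str, triples: Iterable[Triple]):
        self.address = address
        self.triples = set(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """SPO-style pattern query; None acts as a wildcard."""
        return {t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}


def distributed_match(endpoints: Iterable[Endpoint],
                      s: Optional[str] = None, p: Optional[str] = None,
                      o: Optional[str] = None) -> Set[Triple]:
    """Ask every known endpoint and merge the answers into one set."""
    results: Set[Triple] = set()
    for endpoint in endpoints:
        results |= endpoint.match(s, p, o)
    return results


# Assumed example data: two endpoints, each holding part of the description.
endpoints = [
    Endpoint("http://endpoint-a.example/query",
             [("some-picture", "get-at", "http://files.example/pic.jpg")]),
    Endpoint("http://endpoint-b.example/query",
             [("some-picture", "size", "120 KB")]),
]

# Everything any endpoint knows about <some-picture>.
about_picture = distributed_match(endpoints, s="some-picture")
```

The agent only has to know the short list of endpoint addresses; where a 
particular statement is stored does not matter to the query.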


Ultimately, we would have three fundamentally different kinds of URIs and 
statements: a URI for the thing itself, a URI for a digital representation of 
the thing (e.g. a picture), and a URL that simply gives us an HTTP address for 
the digital representation. Here is an example:

------------
Statement about a thing:

<Eiffel-Tower> <rdf:type> <Tower>


Statements about a digital resource dealing with a thing:

<Eiffel-Tower> <depicted-in> <Eiffel-Tower-Picture>
<Eiffel-Tower-Picture> <dimensions> "800x600 pixels"


Statement about the URL of a digital resource dealing with a thing:

<Eiffel-Tower-Picture> <get-at> 
<http://www.sbac.edu/~tpl/clipart/Photos/Eiffel%20Tower.jpg>
------------

Note that the actual URIs of <Eiffel-Tower> or <Eiffel-Tower-Picture> might 
also use the http URI scheme -- no one would ever try to resolve them, though. 
Only URIs that are the object of a <subject> <get-at> <object> triple would 
ever be resolved via HTTP.
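
To make the rule concrete, here is a small sketch using the example triples 
from above. It assumes the graph is held as a plain Python set of string 
tuples; the function names resolvable_urls and violates_isolation are 
invented for this illustration and do not belong to any RDF library.

```python
GET_AT = "get-at"

# The example statements, written as (subject, predicate, object) tuples.
graph = {
    ("Eiffel-Tower", "rdf:type", "Tower"),
    ("Eiffel-Tower", "depicted-in", "Eiffel-Tower-Picture"),
    ("Eiffel-Tower-Picture", "dimensions", "800x600 pixels"),
    ("Eiffel-Tower-Picture", GET_AT,
     "http://www.sbac.edu/~tpl/clipart/Photos/Eiffel%20Tower.jpg"),
}


def resolvable_urls(triples):
    """Return only the objects of get-at triples; nothing else is dereferenced."""
    return {o for (_s, p, o) in triples if p == GET_AT}


def violates_isolation(triples):
    """Return URLs that also occur outside the object position of get-at."""
    urls = resolvable_urls(triples)
    return {u for (s, p, o) in triples for u in urls
            if u == s or (u == o and p != GET_AT)}


urls = resolvable_urls(graph)     # the single JPEG address
problems = violates_isolation(graph)  # empty for the example graph
```

An agent would pass only the members of urls to its HTTP client. The second 
function is one possible way to check that a URL is not part of any other 
statement.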

If the URLs break or are subject to change, we would need to update the RDF 
graph describing the URLs. This can be done by any of the administrators of the 
RDF query endpoints. In this scenario, the RDF query endpoints and the 
distributed SPARQL queries make any other resolution system unnecessary. 
No need for GETting every resource we encounter, no need for content 
negotiation, no need for downloading huge RDF files just to get a few 
statements out of them, no need to pay attention to the URIs you use when 
creating a simple ontology, no need to be puzzled about the semantics of simple 
web resources and URLs, no need to express versioning information in the URI... 
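
Fixing a broken URL then amounts to swapping one get-at statement. A minimal 
sketch, again assuming the graph is a plain set of string tuples; the 
replace_location helper and the mirror address are hypothetical and only serve 
the example.

```python
GET_AT = "get-at"


def replace_location(triples, resource, old_url, new_url):
    """Return a new triple set in which one get-at statement points to new_url."""
    old = (resource, GET_AT, old_url)
    if old not in triples:
        raise KeyError(f"no such get-at statement: {old}")
    return (set(triples) - {old}) | {(resource, GET_AT, new_url)}


OLD_URL = "http://www.sbac.edu/~tpl/clipart/Photos/Eiffel%20Tower.jpg"
NEW_URL = "http://mirror.example/Eiffel%20Tower.jpg"  # assumed new address

graph = {
    ("Eiffel-Tower", "depicted-in", "Eiffel-Tower-Picture"),
    ("Eiffel-Tower-Picture", "dimensions", "800x600 pixels"),
    ("Eiffel-Tower-Picture", GET_AT, OLD_URL),
}

updated = replace_location(graph, "Eiffel-Tower-Picture", OLD_URL, NEW_URL)
```

All statements about <Eiffel-Tower-Picture> itself stay untouched, because 
none of them mention the URL.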

Everything stays inside RDF and already standardized RDF technologies. Of 
course, distributed queries would still need some optimization, but that is a 
problem we would have to deal with anyway.

Kind regards,
Matthias Samwald


[1] http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/tutorial/netapi.html
