Re: Best Practice for Renaming OWL Vocabulary Elements

Kingsley Idehen Thu, 19 May 2011 11:25:11 -0700

On 5/19/11 1:57 PM, Martin Hepp wrote:

Hi Kingsley:
I basically agree with everything you are saying, and I see many areas where it 
is not feasible, or in the long run not scalable, to use human-readable 
identifiers.


Sure, "horses for courses".

As for the education aspect of it, I am not totally convinced: while it 
improves the understanding of the mechanics of LOD, it also raises the 
cognitive entry barrier for practitioners who do not want to be educated 
but simply want to get a certain job done. Making a new technique more 
complicated, more costly, or less fun to use will reduce the number of users.

I am trying to say that by understanding the underlying concepts we end up with the following:


1. Clearer messaging

2. Dexterous messaging -- tell the same story in different ways to a plethora of audience profiles

3. Inclusive messaging -- people that grok the concepts from other realms just get on board and add their expertise and experiences to the pot.

All: I was not in general advocating human-readable URIs.

I know you aren't. Your point is that for your Horse you've taken a particular Course. That's how it should be done. Also remember, as far as I am concerned, <http://productontology.org> is a great example of a Linked Data pattern that's easy for folks outside the LOD and SemWeb communities to comprehend. I can easily discern:


1. HTTP URIs used as Entity Names
2. HTTP URLs that serve as Data Source Names (Addresses).

I was just saying that they are often the best compromise for small Web 
vocabularies that aim at real-world adoption (human-readable URIs and short, 
intuitive local parts for CURIEs), and I am deeply convinced they are the best 
choice for GoodRelations.


100% fine, bar my little quibble about labels :-)

Kingsley

Best

Martin

On May 19, 2011, at 6:54 PM, Kingsley Idehen wrote:

On 5/19/11 4:39 AM, Martin Hepp wrote:

Hi all:

GoodRelations uses English keywords as concept identifiers, just as most programming languages 
(except for machine code) and most markup languages (e.g. "h1" in HTML, 
"property" in RDFa) do. The main reason is that as soon as the raw 
identifiers need to be handled by humans, even only sometimes, ergonomic considerations 
favor human-readable identifiers.

By the way, the question whether keys/identifiers in computer systems should be 
human-readable is a freshman topic in Information Systems,
and the textbook knowledge is that

- cryptic keys are usually shorter and it is easier to avoid collisions (in the 
sense of accidental duplicate assignment)
- human readable keys are much more productive for humans to handle.

The textbook recommendation in Information Systems is:
1. Use cryptic keys when the key is used ONLY by machines.
2. Use human-readable keys when the key is, at least sometimes, used by HUMANS and 
machines.
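A minimal Python sketch of the two key styles in that recommendation. The choices are assumptions for illustration only: the cryptic key is a truncated SHA-1 digest, the human-readable key is a CamelCase form of a label, and the helper names `opaque_key` and `readable_key` are hypothetical, not part of GoodRelations or any library.

```python
import hashlib
import re


def opaque_key(record: str, length: int = 8) -> str:
    """Machine-only key: the first `length` hex digits of a SHA-1 digest.

    Accidental duplicates are unlikely without any central coordination,
    but the key tells a human nothing about what it identifies.
    """
    return hashlib.sha1(record.encode("utf-8")).hexdigest()[:length].upper()


def readable_key(label: str, taken: set) -> str:
    """Human-facing key: the label's words joined in CamelCase.

    Collisions are likelier, so they are resolved explicitly here by
    appending a counter (BusinessEntity, BusinessEntity2, ...).
    """
    words = re.findall(r"[A-Za-z0-9]+", label)
    base = "".join(w[:1].upper() + w[1:] for w in words)
    candidate, n = base, 2
    while candidate in taken:
        candidate = f"{base}{n}"
        n += 1
    taken.add(candidate)
    return candidate


taken = set()
print(readable_key("business entity", taken))  # BusinessEntity
print(readable_key("business entity", taken))  # BusinessEntity2
print(opaque_key("business entity"))           # eight uppercase hex digits
```

The trade-off shows up directly: the opaque key needs no collision rule but means nothing when written on a bus, while the readable key needs an explicit policy for duplicates.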

I see the point in using cryptic identifiers for conceptual elements in very 
large vocabularies that will never be handled manually.
But for broadly used Web vocabularies, cryptic identifiers are as inadequate as 
replacing all HTML element keys with hexadecimal codes.

By the way: keep in mind that "you can write URIs on a bus". If 
human-readability of URIs were irrelevant, why would businesses pay a lot of money for short or 
catchy domain names?

Yes, but to whom are you speaking when you use the term "URI"? It is this usage that continues 
to shape what I describe as the "elephant in the room" problem re. Linked Data 
adoption. Basically, Linked Data adoption can be a smooth and coherently continuous 
experience for Web users and developers.

I see Web users and developers as falling into the following 
profiles:

1. Browser users -- open and bookmark pages primarily via URLs

2. Web 1.0 developers -- work with HTTP as the data access mechanism and HTML as the 
data representation format, via URLs

3. Web 2.0 developers -- ditto, with the addition of RESTful (client-server) interaction 
patterns and XML and/or JSON as data 
representation formats.

To be frank - and sorry for offending people in here: if the opinions 
dominating this discussion are representative of the community, then the 
Semantic Web is bound to fail, because you seem to completely ignore

1. adoption issues, in particular ergonomics and cognitive aspects,
2. the economics of diffusion, and
3. the typical development environment of Web developers outside of research labs 
(in terms of skills and tools).

Now, what's still missing from the larger conversation (and what I am seeking to 
smoke out) is this:

URLs should be hackable and human-comprehensible, as you already outline very 
clearly. At the same time, URIs can be synthetic and opaque in the pure sense. 
The issue boils down to a warped Linked Data narrative that prefers a 
pattern that skews the fundamental computer science in play re. data access by 
reference.

Using DBpedia's Entity Names and Representation Addresses as an example, the 
browser-driven sequence that drives the experience is as follows:

1. http://dbpedia.org/resource/Paris --- DBpedia Name for the Entity: Paris

2. http://dbpedia.org/page/Paris -- HTML resource that delivers a human-oriented 
description of Paris (the Attribute=Value graph constructed around the 
Subject: http://dbpedia.org/resource/Paris is sorta human discernible)

3. http://dbpedia.org/data/Paris.n3 -- N3 based machine oriented description of 
Paris

4. http://dbpedia.org/data/Paris.rdf -- RDF/XML ditto

5. http://dbpedia.org/data/Paris.ntriples (will be .nt soon) -- N-Triples ditto.


Items 2-5 cover the Address (URL) aspect/function. Item 1 covers 
the generic dereferenceable Name aspect/function.
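To make the Name/Address split concrete, here is a simplified Python model of the browser-driven sequence above: dereferencing the Name answers with a 303 redirect to an Address chosen from the Accept header. The media-type table is an assumption reconstructed from the five URLs listed (the N-Triples type in particular is a guess), and `dereference` is a hypothetical helper, not DBpedia's server code.

```python
# Simplified model of the DBpedia pattern listed above: a Name under
# /resource/ is answered with 303 See Other pointing at an Address.
NAME_PREFIX = "http://dbpedia.org/resource/"

# Media type -> (path segment, suffix). Reconstructed from the URLs above;
# "text/plain" for N-Triples is an assumption.
REPRESENTATIONS = {
    "text/html": ("page", ""),
    "text/n3": ("data", ".n3"),
    "application/rdf+xml": ("data", ".rdf"),
    "text/plain": ("data", ".ntriples"),
}


def dereference(name_uri: str, accept: str) -> tuple:
    """Return the (status, Location) pair a client would get for a Name."""
    if not name_uri.startswith(NAME_PREFIX):
        raise ValueError("not a DBpedia entity Name: " + name_uri)
    local_name = name_uri[len(NAME_PREFIX):]
    # Only the first media type is considered; real servers weigh q-values.
    media_type = accept.split(",")[0].split(";")[0].strip()
    segment, suffix = REPRESENTATIONS.get(media_type, REPRESENTATIONS["text/html"])
    return 303, f"http://dbpedia.org/{segment}/{local_name}{suffix}"


print(dereference("http://dbpedia.org/resource/Paris", "text/html"))
# (303, 'http://dbpedia.org/page/Paris')
print(dereference("http://dbpedia.org/resource/Paris", "application/rdf+xml"))
# (303, 'http://dbpedia.org/data/Paris.rdf')
```

Whatever the client sends, the Location it lands on differs from the Name it typed, which is why a browser's address bar ends up showing /page/Paris.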

The problem with the pattern above is that it uses convenience to obscure the 
underlying data access by reference concept (de-reference/indirection + 
address-of) that's in play.

Most basic example (looking through end-user and Web 1.0 and 2.0 developer 
lenses): I enter: http://dbpedia.org/resource/Paris into the *Address bar* of 
any browser and right before my eyes I see:

1. http://dbpedia.org/page/Paris -- in the address bar

2. HTML document describing 'Paris'.

Immediate quandary: what do I bookmark? Remember, users bookmark resource 
addresses (URLs).

Now imagine an alternative sequence:

1. http://dbpedia.org/page/Paris -- user knows this is a page (document) about 
Paris and also knows it can be bookmarked

2. http://dbpedia.org/resource/Paris --- confined to @href, which is associated with the 
relevant portion of the "About: Paris" text

3. http://dbpedia.org/data/Paris.n3 -- discovered via footer links (humans), <link/> (Web 
1.0 or 2.0 developers), or "Link:" response headers (Web 2.0 and 3.0 developers)

4. http://dbpedia.org/data/Paris.rdf -- ditto

5. http://dbpedia.org/data/Paris.ntriples (will be .nt soon) -- ditto.

The bookmark confusion is resolved, and the Entity Name / Representation 
Address ambiguity (inherent to HTTP URI usage) is slightly reduced.
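The developer-side discovery step in item 3 can be sketched in Python as pulling alternate representation URLs out of a "Link:" response header. The header value is made up for illustration (not captured from dbpedia.org), and the regex parser is a deliberate simplification of the Web Linking grammar (RFC 5988 at the time, since superseded by RFC 8288): it does not handle quoted parameter values that contain commas or semicolons.

```python
import re

# One link-value: <target> followed by zero or more ;param=value pairs.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# One ;param=value pair, with either a quoted or a bare value.
PARAM = re.compile(r';\s*([^=;\s]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value: str) -> list:
    """Split a Link header value into dicts holding href plus parameters."""
    links = []
    for m in LINK_VALUE.finditer(value):
        link = {"href": m.group(1)}
        for p in PARAM.finditer(m.group(2)):
            quoted, bare = p.group(2), p.group(3)
            link[p.group(1).lower()] = quoted if quoted is not None else bare
        links.append(link)
    return links


def alternates(links: list) -> dict:
    """Map media type -> URL for every rel="alternate" link that has a type."""
    return {link["type"]: link["href"] for link in links
            if "alternate" in link.get("rel", "").split() and "type" in link}


# Illustrative header value (hypothetical, not from a live server).
HEADER = ('<http://dbpedia.org/data/Paris.n3>; rel="alternate"; type="text/n3", '
          '<http://dbpedia.org/data/Paris.rdf>; rel="alternate"; '
          'type="application/rdf+xml"')

for media_type, url in alternates(parse_link_header(HEADER)).items():
    print(media_type, url)
```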

Now here's the bigger problem. What happens when my Entity Names have to be 
synthetically generated (old school identifier style) but I want human-oriented 
resource (document) addresses with human interaction in mind? Can I 
introspectively discern an Entity Name from its Representation accessed via an 
Address? The answer is yes, but it means you really have to 
accept that the DBpedia Linked Data URI/URL pattern isn't the gold standard; 
it's just a style that works for the DBpedia use case.

Here is another sequence, this time leveraging the .well-known/host-meta (Web 
Linking) pattern:

1. http://dbpedia.org/.well-known/host-meta -- a Web 2.0 developer 
step for discovering the URI template patterns of a given data space; 
unimportant to end-users

2. http://dbpedia.org/describe/?uri=http://dbpedia.org/resource/Paris -- 
end-user oriented URL that's bookmark friendly and hack-able

3. http://dbpedia.org/resource/Paris --- confined to @href, which is associated with the 
relevant portion of the "About: Paris" text

4. http://dbpedia.org/data/Paris.n3 -- discovered via footer links (humans), <link/> (Web 
1.0 or 2.0 developers), or "Link:" response headers (Web 2.0 and 3.0 developers)

5. http://dbpedia.org/data/Paris.rdf -- ditto

6. http://dbpedia.org/data/Paris.ntriples (will be .nt soon) -- ditto.
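A rough Python sketch of the host-meta step: read URI templates from a host-meta XRD document, then expand {uri} with an Entity Name to get the bookmark-friendly describe URL. The XRD content, including the `lrdd` relation, is an assumption (the real dbpedia.org file may use different relations), and the Name is percent-encoded before substitution, following RFC 6415's template processing as I understand it; the URLs in the list above show it unencoded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

XRD = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"

# Hypothetical host-meta document; the actual dbpedia.org file may differ.
HOST_META = """<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <Link rel="lrdd" template="http://dbpedia.org/describe/?uri={uri}"/>
</XRD>"""


def uri_templates(host_meta_xml: str) -> dict:
    """Return rel -> template for every templated <Link> in an XRD document."""
    root = ET.fromstring(host_meta_xml)
    return {link.get("rel"): link.get("template")
            for link in root.iter(XRD + "Link")
            if link.get("template")}


def expand(template: str, entity_name: str) -> str:
    """Fill the {uri} variable with the percent-encoded Entity Name."""
    return template.replace("{uri}", quote(entity_name, safe=""))


template = uri_templates(HOST_META)["lrdd"]
print(expand(template, "http://dbpedia.org/resource/Paris"))
# http://dbpedia.org/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FParis
```

Feeding the same functions the host-meta of another data space, such as lod.openlinksw.com, would yield that space's describe URL for the same Name, which is the data mobility point made below.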

Comments:
We have URLs as the focus, the bookmark pattern is preserved, and we gain data mobility, i.e., 
DBpedia data objects can reside in multiple data spaces without any DNS 
heuristics, which delivers on the Linked Data goal of data access across Linked 
Data Spaces in a way that's also decoupled from DNS.

Here is a sequence that showcases my point:

1. http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http%3A%2F%2Fdbpedia.org%2F.well-known%2Fhost-meta&useragentheader=&acceptheader= -- discover templates offered by the dbpedia.org data space

2. http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http%3A%2F%2Flod.openlinksw.com%2F.well-known%2Fhost-meta&useragentheader=&acceptheader= -- ditto for LOD cloud cache data space

3. http://dbpedia.org/describe/?uri=http://dbpedia.org/resource/Paris -- 
description of 'Paris' from DBpedia

4. http://lod.openlinksw.com/describe/?uri=http://dbpedia.org/resource/Paris -- 
ditto from LOD cloud cache.


There are two functions (not one) in play when dealing with Hyperdata Links or 
Hyperlinks used as a mechanism for Whole Data Representation re. the AWWW-based 
Linked Data meme. If there are two functions in play, and one is already widely 
used by the target users (i.e., the Address function - URL), why do we 
instinctively speak about the de-reference (indirection) based Name function 
irrespective of the target audience?

If what I am pushing against is so wrong, then tell me why this old, time-tested 
pattern is so difficult for broader audiences to comprehend once AWWW-based 
Linked Data fronts the narrative?

Here is my final example, this time using Facebook Data Object URL: 
http://graph.facebook.com/kidehen .

Note I get the following data from the Facebook URL:

{
   "id": "605980750",
   "name": "Kingsley Uyi Idehen",
   "first_name": "Kingsley",
   "middle_name": "Uyi",
   "last_name": "Idehen",
   "link": "https://www.facebook.com/kidehen",
   "username": "kidehen",
   "gender": "male",
   "locale": "en_US"
}


Note that "id" is a literal and so is its value. This chunk of data (resource) 
doesn't have a Name; it just has an access Address. Of course, this isn't the case within 
Facebook's internal setup etc.

If I tweak the "id" value and replace it with a Link, I've added some Web scale 
introspection that also delivers a Name to this erstwhile anonymous data object.

{
   "id": http://www.facebook.com/kidehen#this,
   "name": "Kingsley Uyi Idehen",
   "first_name": "Kingsley",
   "middle_name": "Uyi",
   "last_name": "Idehen",
   "link": "https://www.facebook.com/kidehen",
   "username": "kidehen",
   "gender": "male",
   "locale": "en_US"
}

Repeat what I've stated above for each Attribute=Value pair and you soon have a 
Linked Data graph, i.e., a data representation using Links, delivered via a 
JSON-based Graph.
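A small Python sketch of "repeat for each Attribute=Value pair": the literal "id" gives way to the Name http://www.facebook.com/kidehen#this as subject, each remaining pair becomes one N-Triples statement, and URL-valued attributes become links. The `VOCAB` namespace is a hypothetical placeholder (the Graph API JSON carries none), and this is not a description of how the Sponger performs the mapping.

```python
import json

# Attribute=Value pairs from the Facebook example above.
PROFILE = {
    "id": "605980750",
    "name": "Kingsley Uyi Idehen",
    "first_name": "Kingsley",
    "middle_name": "Uyi",
    "last_name": "Idehen",
    "link": "https://www.facebook.com/kidehen",
    "username": "kidehen",
    "gender": "male",
    "locale": "en_US",
}

# Hypothetical vocabulary namespace for the attribute names.
VOCAB = "http://example.org/fb#"


def json_to_ntriples(doc: dict, subject: str) -> list:
    """Emit one N-Triples line per Attribute=Value pair.

    The literal "id" is dropped in favour of `subject`, a dereferenceable
    Name; values that look like http(s) URLs become links, others literals.
    """
    lines = []
    for key, value in doc.items():
        if key == "id":
            continue
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            # json.dumps quoting matches N-Triples for plain ASCII text like
            # this; it is not a complete N-Triples literal serializer.
            obj = json.dumps(str(value))
        lines.append(f"<{subject}> <{VOCAB}{key}> {obj} .")
    return lines


for line in json_to_ntriples(PROFILE, "http://www.facebook.com/kidehen#this"):
    print(line)
```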

Finished product (courtesy of our Linked Data middleware aka Sponger):

1. http://linkeddata.uriburner.com/about/html/http/graph.facebook.com/kidehen 
-- HTML based Data Container (Data Source) Description Doc

2. 
http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/entity/http/graph.facebook.com/kidehen
  -- HTML based Profile Doc

3. 
http://linkeddata.uriburner.com/about/id/entity/http/graph.facebook.com/kidehen 
-- Proxy Linked Data URI for Profile Page Subject (me in this case)

4. 
http://linkeddata.uriburner.com/describe/?uri=http://linkeddata.uriburner.com/about/id/entity/http/graph.facebook.com/kidehen
 -- alternative Entity Description page using the pattern from the example above.
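The four URLs above follow a regular shape, so a Python sketch can rebuild them from the source URL. The rule (the source's "scheme://" becomes a "scheme/" path segment under /about/...) is inferred from this single example and is an assumption, not documented Sponger behavior; how it treats https or query strings is not covered.

```python
# Base of the Sponger instance used in the examples above.
SPONGER = "http://linkeddata.uriburner.com"


def sponger_uris(source_url: str) -> dict:
    """Rebuild the four URL shapes listed above from a source data URL."""
    scheme, sep, rest = source_url.partition("://")
    if not sep:
        raise ValueError("expected an absolute URL such as http://host/path")
    path_form = f"{scheme}/{rest}"                     # http/graph.facebook.com/kidehen
    entity = f"{SPONGER}/about/id/entity/{path_form}"  # proxy Name for the subject
    return {
        "container_doc": f"{SPONGER}/about/html/{path_form}",
        "profile_doc": f"{SPONGER}/about/html/{entity}",
        "entity": entity,
        "describe": f"{SPONGER}/describe/?uri={entity}",
    }


for role, url in sponger_uris("http://graph.facebook.com/kidehen").items():
    print(role, url)
```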

Conclusion: it isn't about synthetic vs natural identifiers. It's about accentuating the 
mechanics of Linked Data with clarity so that we can introduce this concept to a broader 
community of people, including those that have long mastered the fundamental mechanics of 
"data access by reference" in other realms of applied computer science.

We shouldn't attempt to carve out an Island from the continent sized continuum 
of computer science.

Loose use of "URI" as the driver of the Linked Data narrative hasn't cut it to date, and 
that won't change anytime soon going forward :-)


Kingsley

Best wishes

Martin

PS: I assume you are not proposing to use cryptic class names in Java 
programming, hexadecimal parameter names in REST interfaces, numeric e-mail 
addresses, hash values as Twitter user identifiers or Skype account identifiers, 
or to replace top-level domains with two-digit hex numbers.

Outlook into the simply wonderful future that you are proposing:

- My e-mail address, as suggested by the SW community: AE0FD5678F@AAEE101F.0F

- HTML, revisited by the Semantic Web Community
<F7E3>
<03ED>Page</03ED>
<0709>Paragraph</0709>
<03EE>
        <77FF>
                <87ED>blabla<87ED>
                <87ED>blabla<87ED>
                <87ED>blabla<87ED>
        </77FF>
</03EE>
</F7E3>

- Python, revisited by the Semantic Web Community
class EE77E303(BB1233.AAB012):
        """Returns an RDF/XML dump of the ontology classes"""
        def 0936123(self, A34D=True):
                """ Dump page handler - returns one single RDF/XML representation of the ontology """
                if 'Accept' in self.B345.B7934:
                        ABCD = self.B345.B7934['Accept']
                else:
                        ABCD = ""
                self.B345.B7934['Content-Type'] = "application/rdf+xml"
                self.B345.B7934['Access-Control-Allow-Origin'] = "*" # CORS
                if A34D:                

- Python, revisited by the Semantic Web Community in collaboration with the 
Pedantic Web Movement (enforcing the consistent implementation of a bad idea ;-)

001A EE77E303(BB1233.AAB012):
        """Returns an RDF/XML dump of the ontology classes"""
        B123 0936123(0000, A34D=FFFF):
                """ Dump page handler - returns one single RDF/XML representation of the ontology """
                AA 'Accept' in 0000.B345.B7934:
                        ABCD = 0000.B345.B7934['Accept']
                BB:
                        ABCD = ""
                0000.B345.B7934['Content-Type'] = "application/rdf+xml"
                0000.B345.B7934['Access-Control-Allow-Origin'] = "*" # CORS
                AA A34D:                
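For contrast, here is a guess at the parody's first lines written with human-readable names, in Python. Every name (`OntologyDumpHandler`, `get`, `request`, `response`) is a hypothetical reconstruction, the two stand-in classes replace whatever framework objects and base class the original used, the `A34D` flag is dropped, and the branch truncated after `if A34D:` is left out rather than invented.

```python
class Request:
    """Stand-in for a web framework's request object (hypothetical)."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})


class Response:
    """Stand-in for a web framework's response object (hypothetical)."""

    def __init__(self):
        self.headers = {}


class OntologyDumpHandler:
    """Returns an RDF/XML dump of the ontology classes"""

    def __init__(self, request):
        self.request = request
        self.response = Response()

    def get(self):
        """Dump page handler - returns one single RDF/XML representation of the ontology"""
        accept = self.request.headers.get("Accept", "")
        self.response.headers["Content-Type"] = "application/rdf+xml"
        self.response.headers["Access-Control-Allow-Origin"] = "*"  # CORS
        # The parody is cut off after this point, so the sketch stops here.
        return accept


handler = OntologyDumpHandler(Request({"Accept": "application/rdf+xml"}))
print(handler.get(), handler.response.headers)
```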


PPS: I wrote some kilobytes of Z80 programs in pure machine code without having 
an assembler at hand back in the 1980s. I know what I am talking about ;-)

On May 18, 2011, at 10:14 PM, Michael F Uschold wrote:

See below.

On Wed, May 18, 2011 at 12:55 PM, glenn mcdonald <gl...@furia.com> wrote:
I agree wholeheartedly that URIs should be pure identifiers, with no embedded 
semantics or assumptions of readability. And I agree with Kingsley that there's 
an elephant in the room. I might even agree with Kingsley about what the 
elephant is.

But to say it from my point of view: machines need to think in ids, people need to think 
in names. The RDF/SPARQL "stack", such as it is, has not internalized the 
implications of this duality, and thus isn't really prepared to support both audiences 
properly.

Very well put, Glenn!

Almost all the canonical examples of RDF and SPARQL avoid this issue by using 
toy use-cases with semi-human-readable URIs, and/or with literals where there 
ought to be nodes. If you try to do a non-trivial dataset the right way, you'll 
immediately find that writing the RDF or the SPARQL by hand is basically 
intractable. If you try to produce a human-intelligible user-interface to such 
data, you'll find yourself clinging to rdfs:label for dear life, and then 
falling, falling, falling...

In fact, there's almost nothing more telling than the fact that rdfs:label is 
rdfS! This is in some ways the most fundamental aspect of human/computer 
data-interaction, and RDF itself has essentially nothing to say about it.
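A toy Python sketch of the ids/names duality described above: statements use opaque ids (hypothetical example.org URIs) and a human-facing view substitutes rdfs:label values, falling back to the raw id when a node has no label. Triples are plain tuples rather than a real RDF library, so literals and nodes are not distinguished.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ID = "http://example.org/id/"  # hypothetical opaque-id namespace

# Machines see only ids; people need the labels attached to them.
TRIPLES = [
    (ID + "E7F3", RDFS_LABEL, "Paris"),
    (ID + "A019", RDFS_LABEL, "France"),
    (ID + "C44B", RDFS_LABEL, "capital of"),
    (ID + "E7F3", ID + "C44B", ID + "A019"),
]


def labels(triples):
    """Map each labelled node id to its rdfs:label (last one wins)."""
    return {s: o for s, p, o in triples if p == RDFS_LABEL}


def render(triples):
    """Human view: every non-label statement with ids replaced by names."""
    names = labels(triples)
    return [" ".join(names.get(term, term) for term in (s, p, o))
            for s, p, o in triples if p != RDFS_LABEL]


def ids_for(name, triples):
    """Reverse lookup: a name may match zero, one, or several ids."""
    return [s for s, p, o in triples if p == RDFS_LABEL and o == name]


print(render(TRIPLES))             # ['Paris capital of France']
print(ids_for("France", TRIPLES))  # ['http://example.org/id/A019']
```

The reverse direction is where it gets hard: `ids_for` may return several ids for one name, so a person writing a query by name still has to choose one.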



--
Michael Uschold, PhD
    Senior Ontology Consultant, Semantic Arts
    LinkedIn: http://tr.im/limfu
    Skype, Twitter: UscholdM


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen



