I notice lines in the dbpedia dumps that look like
<http://dbpedia.org/resource/Boston%2C_MA>
<http://dbpedia.org/property/redirect>
<http://dbpedia.org/resource/Boston> .
Note the URL-encoded comma (%2C is ",").
Anyhow, if I go to
http://dbpedia.org/page/Boston%2C_MA
I see two redirects [one of which unescapes the comma] and
ultimately end up at
http://dbpedia.org/page/Boston
If I go to Wikipedia
http://wikipedia.org/page/Boston%2C_MA
I get redirected to
http://wikipedia.org/page/Boston,_MA
which, oddly, displays the same content as "Boston" [rather than
301 redirecting...]
When I do
curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/Boston.xml
I see stuff like
<rdf:Description
rdf:about="http://dbpedia.org/resource/Harvey_Mason%2C_Jr."><dbpedia-owl:birthPlace
xmlns:dbpedia-owl="http://dbpedia.org/ontology/"
rdf:resource="http://dbpedia.org/resource/Boston"/></rdf:Description>
Now if I run the SPARQL query
select ?Predicate where {<http://dbpedia.org/resource/Harvey_Mason,_Jr.>
?Predicate <http://dbpedia.org/resource/Boston> }
I get nothing, but if I run
select ?Predicate where
{<http://dbpedia.org/resource/Harvey_Mason%2C_Jr.> ?Predicate
<http://dbpedia.org/resource/Boston> }
I get
http://dbpedia.org/ontology/birthPlace
So it looks like the %-encoded URI is the "real URI" in dbpedia.
Obviously I ought to keep it around in case I want to run a SPARQL query
now and then. dbpedia also %-encodes the Wikipedia URLs it refers to:
<http://en.wikipedia.org/wiki/Harvey_Mason%2C_Jr.>
<http://xmlns.com/foaf/0.1/primaryTopic>
<http://dbpedia.org/resource/Harvey_Mason%2C_Jr.> .
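To make the mismatch concrete, here is a minimal Python sketch. It assumes that Python's urllib.parse.quote yields the same spelling as dbpedia for this one title; whether it matches dbpedia's encoder in general is exactly what I don't know. The helper name resource_uri is hypothetical.

```python
from urllib.parse import quote

RESOURCE_PREFIX = "http://dbpedia.org/resource/"

def resource_uri(title: str) -> str:
    """Hypothetical helper: build a dbpedia-style URI from a page title.

    Assumption: quote() with its default safe characters behaves like
    dbpedia's encoder for this title. This is not confirmed in general.
    """
    return RESOURCE_PREFIX + quote(title.replace(" ", "_"))

# The spelling a person would type, with a literal comma.
raw = RESOURCE_PREFIX + "Harvey_Mason,_Jr."
# The spelling that actually appears in the dump.
encoded = resource_uri("Harvey_Mason,_Jr.")

print(encoded)         # http://dbpedia.org/resource/Harvey_Mason%2C_Jr.
print(raw == encoded)  # False: as plain strings these are two different URIs
```

Since a triple store matches URIs as plain strings, only the second spelling finds the birthPlace triple.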
------
I took a look at some standards docs and found:
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference
I see that we encode UTF-8 text as octets, and if the octets aren't
US-ASCII characters, we'd %-encode them. However, the spec also says
that
*"Note:* Because of the risk of confusion between RDF URI references
that would be equivalent if derefenced, the use of %-escaped characters
in RDF URI references is strongly discouraged. "
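As I read that rule, it can be sketched in Python as follows. This is a simplification and an assumption on my part: it escapes only octets outside US-ASCII, whereas the spec speaks of "permitted" US-ASCII characters, so a few ASCII characters (a space, for example) would need escaping too. The function name percent_encode_utf8 is hypothetical.

```python
def percent_encode_utf8(text: str) -> str:
    """Encode text as UTF-8 and %-escape every octet outside US-ASCII.

    Simplified reading of the rule: ASCII octets are passed through
    untouched, which is not quite what the spec requires.
    """
    out = []
    for octet in text.encode("utf-8"):
        if octet < 0x80:
            out.append(chr(octet))               # US-ASCII: keep as-is
        else:
            out.append("%{:02X}".format(octet))  # non-ASCII octet: escape
    return "".join(out)

print(percent_encode_utf8("Zürich"))      # Z%C3%BCrich
print(percent_encode_utf8("Boston,_MA"))  # Boston,_MA
```

Under this reading the comma stays literal, so the rule by itself does not explain the %2C in the dumps.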
------
Now the problem I've got with the Ookaboo API is that I know people are
going to punch in
http://wikipedia.org/page/Boston,_MA
and I need to turn this into the right dbpedia URL. My plan for dealing
with this is to
(i) store the exact URI I get out of dbpedia,
(ii) always give people the exact URI out of dbpedia (if I publish RDFa
or JSON data),
(iii) give the same URI for wikipedia that dbpedia gives (in HTML,
RDFa, etc.)
(iv) if I get a query, apply the same canonicalization rules that
dbpedia uses...
Which raises the question of what exactly those rules are. What are they?
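For step (iv), this is roughly what I have in mind, sketched in Python. Everything here is an assumption until someone confirms the real rules: I guess that the title is the last path segment, that spaces become underscores, and that urllib.parse.quote escapes the same characters dbpedia does. The name to_dbpedia_uri is hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def to_dbpedia_uri(user_url: str) -> str:
    """Hypothetical canonicalizer: user-supplied page URL -> dbpedia URI.

    Assumed rules (not confirmed by dbpedia):
      1. the title is the last path segment of the URL,
      2. any %-escapes the user typed are decoded first,
      3. spaces become underscores,
      4. the result is re-escaped with quote()'s default rules.
    Titles that themselves contain "/" are not handled by step 1.
    """
    path = urlsplit(user_url).path
    title = path.rsplit("/", 1)[-1]
    title = unquote(title).replace(" ", "_")
    return DBPEDIA_RESOURCE + quote(title)

# Both spellings a user might type end up as the same stored URI.
print(to_dbpedia_uri("http://wikipedia.org/page/Boston,_MA"))
print(to_dbpedia_uri("http://wikipedia.org/page/Boston%2C_MA"))
```

Decoding before re-encoding means an already-escaped input is not escaped twice. The result would still have to be looked up against the stored redirect triples, since Boston%2C_MA itself redirects to Boston.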
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion