I notice lines in the dbpedia dumps that look like

<http://dbpedia.org/resource/Boston%2C_MA> <http://dbpedia.org/property/redirect> <http://dbpedia.org/resource/Boston> .

Note the URL-encoded %2C, which decodes to ",".
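Decoding the escape confirms that %2C is just the comma. Below is a quick sketch with Python's standard urllib.parse. The empty `safe` argument is chosen only to reproduce the dump's spelling. It is not a claim about how DBpedia actually builds its URIs.

```python
from urllib.parse import quote, unquote

encoded = "Boston%2C_MA"

# %2C is the percent-encoding of octet 0x2C, the ASCII comma.
decoded = unquote(encoded)
print(decoded)  # Boston,_MA

# Re-encoding with an empty "safe" set escapes the comma again.
# "_" is an unreserved character, so quote() leaves it alone.
print(quote(decoded, safe=""))  # Boston%2C_MA
```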

Anyhow, if I go to

http://dbpedia.org/page/Boston%2C_MA

I see two redirects [one of which unescapes the comma] and ultimately end up at

http://dbpedia.org/page/Boston

If I go to Wikipedia

http://wikipedia.org/page/Boston%2C_MA

I get redirected to

http://wikipedia.org/page/Boston,_MA

which, oddly, displays the same content as "Boston" [rather than 301 redirecting...]

When I do

 curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/Boston.xml

I see stuff like

<rdf:Description rdf:about="http://dbpedia.org/resource/Harvey_Mason%2C_Jr."><dbpedia-owl:birthPlace xmlns:dbpedia-owl="http://dbpedia.org/ontology/" rdf:resource="http://dbpedia.org/resource/Boston"/></rdf:Description>

Now, if I run the SPARQL query

select ?Predicate where {<http://dbpedia.org/resource/Harvey_Mason,_Jr.> ?Predicate <http://dbpedia.org/resource/Boston> }

I get nothing, but if I run

select ?Predicate where {<http://dbpedia.org/resource/Harvey_Mason%2C_Jr.> ?Predicate <http://dbpedia.org/resource/Boston> }

I get

http://dbpedia.org/ontology/birthPlace
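This is consistent with RDF treating two URI references as equal only when their strings are identical. Even the percent-encoding normalization allowed by RFC 3986 (sections 2.3 and 6.2.2.2) would not merge these two, because it only decodes escapes of unreserved characters, and the comma is a reserved sub-delimiter. The sketch below is a hand-written normalizer for illustration. It is not a SPARQL engine's actual matching code.

```python
import re

# RFC 3986 unreserved characters: escapes of these may be decoded freely.
# The comma is a reserved "sub-delim", so %2C and "," stay distinct.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def normalize_percent_encoding(uri):
    """Decode %XX only for unreserved characters; uppercase the other escapes."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)

plain = "http://dbpedia.org/resource/Harvey_Mason,_Jr."
escaped = "http://dbpedia.org/resource/Harvey_Mason%2C_Jr."

# Exact string comparison, as the two SPARQL queries behaved.
print(plain == escaped)  # False
# RFC 3986 normalization does not make them equal either.
print(normalize_percent_encoding(plain) == normalize_percent_encoding(escaped))  # False
# By contrast, an escaped unreserved character is decoded.
print(normalize_percent_encoding("http://example.org/%7euser"))  # http://example.org/~user
```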

So it looks like the %-encoded URI is the "real URI" in dbpedia, and I obviously need to keep it around in case I want to run a SPARQL query now and then. dbpedia also encodes Wikipedia URLs this way:

<http://en.wikipedia.org/wiki/Harvey_Mason%2C_Jr.> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/Harvey_Mason%2C_Jr.> .
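Because the foaf:primaryTopic triple uses the same encoded title on both sides, the mapping from an English Wikipedia URL to a DBpedia resource looks like a plain prefix swap that keeps the title byte for byte. The helper below is hypothetical. It assumes that the input is already spelled the way DBpedia spells it (with %2C, not a literal comma) and that only the two prefixes seen in the triple are involved.

```python
# Prefixes taken from the foaf:primaryTopic triple in the dump.
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

def dbpedia_resource_for(wikipedia_url):
    """Hypothetical helper: swap the prefix, keep the encoded title exactly."""
    if not wikipedia_url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError("not an English Wikipedia article URL: " + wikipedia_url)
    return DBPEDIA_PREFIX + wikipedia_url[len(WIKIPEDIA_PREFIX):]

print(dbpedia_resource_for("http://en.wikipedia.org/wiki/Harvey_Mason%2C_Jr."))
# http://dbpedia.org/resource/Harvey_Mason%2C_Jr.
```

A literal comma in the input would pass through unchanged and produce a URI that the SPARQL endpoint does not match. That is exactly the user-input problem that the canonicalization step has to solve.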

------

I took a look at some standards docs and found:

http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference

I see that the characters of an RDF URI reference are encoded as UTF-8 octets, and any octets that don't correspond to permitted US-ASCII characters are %-encoded. However, the spec also says that

"Note: Because of the risk of confusion between RDF URI references that would be equivalent if dereferenced, the use of %-escaped characters in RDF URI references is strongly discouraged."
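The conversion that the RDF Concepts document describes can be sketched in a few lines. The sketch assumes that the ASCII characters to escape are the controls, space and <>"{}|\^` (the RFC 2396 "excluded" set minus #, % and the square brackets). That set is my reading of the spec, so treat it as an assumption. Note that under this rule a comma passes through unchanged, so the spec's conversion alone does not account for the %2C in DBpedia's identifiers.

```python
# Assumption: ASCII characters (besides controls, space and DEL, handled by
# the range checks below) that must be %-escaped when turning an RDF URI
# reference into a URI. Everything else in ASCII is kept.
EXCLUDED_ASCII = set('<>"{}|\\^`')

def rdf_uri_ref_to_uri(uri_ref):
    """Encode as UTF-8, then %-escape octets that are not permitted ASCII characters."""
    out = []
    for octet in uri_ref.encode("utf-8"):
        char = chr(octet)
        # < 0x21 covers controls and space; >= 0x7F covers DEL and non-ASCII octets.
        if octet < 0x21 or octet >= 0x7F or char in EXCLUDED_ASCII:
            out.append("%{:02X}".format(octet))
        else:
            out.append(char)
    return "".join(out)

print(rdf_uri_ref_to_uri("http://dbpedia.org/resource/São_Paulo"))
# http://dbpedia.org/resource/S%C3%A3o_Paulo
print(rdf_uri_ref_to_uri("http://dbpedia.org/resource/Harvey_Mason,_Jr."))
# http://dbpedia.org/resource/Harvey_Mason,_Jr.
```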

------

Now the problem I've got with the Ookaboo API is that I know people are going to punch in

http://wikipedia.org/page/Boston,_MA

and I need to turn this into the right dbpedia URL. My plan for dealing with this is to

(i) store the exact URI I get out of dbpedia,
(ii) always give people the exact URI out of dbpedia (if I publish RDFa or JSON data),
(iii) give the same URI for wikipedia that dbpedia gives (in HTML, RDFa, etc.), and
(iv) if I get a query, apply the same canonicalization rules that dbpedia uses.

That raises the question of exactly what those rules are. What are they?
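Until those rules are known, step (iv) can only be approximated from the behavior shown above. The sketch below takes whatever title a user types, decodes it, turns spaces into underscores, and re-escapes it so that a comma becomes %2C. It then follows the redirect taken from the dump. Everything here is hypothetical: `ASSUMED_SAFE` only reproduces the comma cases in this message, and how dbpedia treats parentheses, apostrophes or non-ASCII titles is not verified. The two small tables stand in for URIs stored exactly as they come out of the dump (step i).

```python
from urllib.parse import quote, unquote, urlsplit

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

# HYPOTHETICAL: characters left unescaped when re-encoding a title. An empty
# set reproduces the %2C cases in this thread; other characters are unverified.
ASSUMED_SAFE = ""

# Toy stand-ins for data loaded verbatim from the dumps (step i).
REDIRECTS = {
    "http://dbpedia.org/resource/Boston%2C_MA": "http://dbpedia.org/resource/Boston",
}
KNOWN_RESOURCES = {"http://dbpedia.org/resource/Boston"}

def guess_dbpedia_uri(user_url):
    """Turn a user-supplied Wikipedia URL into a candidate dbpedia resource URI."""
    path = urlsplit(user_url).path
    # "Boston%2C_MA" and "Boston,_MA" both decode to "Boston,_MA".
    title = unquote(path.rsplit("/", 1)[-1]).replace(" ", "_")
    return DBPEDIA_PREFIX + quote(title, safe=ASSUMED_SAFE)

def resolve(user_url):
    """Return the stored dbpedia URI for user_url, following one redirect, or None."""
    candidate = guess_dbpedia_uri(user_url)
    candidate = REDIRECTS.get(candidate, candidate)
    return candidate if candidate in KNOWN_RESOURCES else None

print(resolve("http://wikipedia.org/page/Boston,_MA"))
# http://dbpedia.org/resource/Boston
```

Keeping the lookup tables keyed by the exact dump strings means that the guessing logic can be replaced later, once the real rules are known, without touching the stored data.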


_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
