Christopher,

I don't have strong views about the details of URI encoding. (In my view, both Wikipedia and DBpedia should use simple numeric identifiers for each concept, rather than these stupid and mutable made-up-from-the-page-title ones. But that's maybe a separate thread.)

However, I think I need to point out that the reason I started this thread is that URLs containing '&' lead to broken (non-well-formed) RDF/XML. So I think that '%26' is mandatory, whatever happens to other characters.
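
To make the failure concrete, here is a minimal Python sketch (not
DBpedia's actual serializer). It drops a URI verbatim into an rdf:about
attribute and asks the standard-library XML parser whether the result is
well-formed. The helper names rdf_description and is_well_formed are
made up for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_description(uri_text: str) -> str:
    """Wrap uri_text, verbatim and unescaped, in an rdf:about attribute."""
    return (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}">'
        f'<rdf:Description rdf:about="{uri_text}"/>'
        "</rdf:RDF>"
    )


def is_well_formed(xml_text: str) -> bool:
    """Return True if the XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "http://dbpedia.org/resource/&_(EP)"
percent_encoded = raw.replace("&", "%26")

# A bare '&' starts an entity reference, so the parser rejects it.
print(is_well_formed(rdf_description(raw)))              # False
# '%26' contains no XML-special characters, so it parses.
print(is_well_formed(rdf_description(percent_encoded)))  # True
```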

Richard

On 06/03/2012 00:14, Jona Christopher Sahnwaldt wrote:
Dear all,

I just checked a few specs to figure out what would be the best policy
for DBpedia regarding URI encoding.

In summary, I think DBpedia should encode as few characters as
possible, e.g. use '&', not '%26'.

The URI spec [1] has a lot of special cases, but in the end it's quite
clear that in our case we do not HAVE to encode most special
characters like '&'. See 3.3 Path Component.
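
As a rough illustration of "encode as few characters as possible", the
Python sketch below contrasts a minimal encoder with an aggressive one.
PATH_SAFE lists the non-alphanumeric characters that RFC 2396 allows
unescaped in a path segment; the function names, and the choice to leave
"/" unescaped, are assumptions made for this example and not DBpedia's
actual code.

```python
from urllib.parse import quote

# Non-alphanumeric characters RFC 2396 permits unescaped in a path
# segment: the "mark" characters plus : @ & = + $ ,
PATH_SAFE = "-_.!~*'():@&=+$,"


def encode_title_minimal(title: str) -> str:
    """Escape only what a path segment cannot carry (assumes '/' is kept)."""
    return quote(title, safe=PATH_SAFE + "/")


def encode_title_aggressive(title: str) -> str:
    """Escape everything except letters, digits and '_', '.', '-', '~'."""
    return quote(title, safe="")


for title in ("&_(EP)", "OS/2"):
    print(title, encode_title_minimal(title), encode_title_aggressive(title))
```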

More importantly, the RDF spec includes the following note [2]:

Because of the risk of confusion between RDF URI references that would
be equivalent if dereferenced, the use of %-escaped characters in RDF
URI references is strongly discouraged.

Could hardly be clearer...


A related, but different issue is how Wikipedia and Virtuoso dereference URIs.

Wikipedia is very lenient: "&_(EP)" [3] is equivalent to
"%26_%28EP%29" [4]. Even "OS%2F2" [5] is treated as equivalent to
"OS/2" [6]. (Not sure which of these behaviors is or isn't violating
the URI spec.)

Virtuoso on dbpedia.org is very strict: it only returns data for
"OS/2" [7] and "&_%28EP%29" [8], but empty pages for all other
encoding variants.
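
The difference between the two behaviours can be sketched in Python as
an exact string match versus a match after percent-decoding. This is
only a model of what is observed above, not how Wikipedia or Virtuoso
are actually implemented; STORED_PATHS, strict_lookup and lenient_lookup
are hypothetical names.

```python
from urllib.parse import unquote

# The two resource paths that dbpedia.org answers, per the message above.
STORED_PATHS = {"OS/2", "&_%28EP%29"}


def strict_lookup(path: str) -> bool:
    """Character-by-character match: only the stored spelling is found."""
    return path in STORED_PATHS


def lenient_lookup(path: str) -> bool:
    """Match after percent-decoding both sides."""
    decoded = {unquote(p) for p in STORED_PATHS}
    return unquote(path) in decoded


for variant in ("OS/2", "OS%2F2", "&_(EP)", "&_%28EP%29", "%26_%28EP%29"):
    print(variant, strict_lookup(variant), lenient_lookup(variant))
```

Every variant passes the lenient check, while only the two stored
spellings pass the strict one.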


Christopher

[1] http://www.ietf.org/rfc/rfc2396.txt
[2] http://www.w3.org/TR/rdf-concepts/#dfn-URI-reference
[3] http://en.wikipedia.org/wiki/&_(EP)
[4] http://en.wikipedia.org/wiki/%26_%28EP%29
[5] http://en.wikipedia.org/wiki/OS%2F2
[6] http://en.wikipedia.org/wiki/OS/2
[7] http://dbpedia.org/resource/OS/2
[8] http://dbpedia.org/resource/&_%28EP%29


On Tue, Feb 21, 2012 at 15:04, Jimmy O'Regan <jore...@gmail.com> wrote:
On 21 February 2012 13:47, Richard Light <rich...@light.demon.co.uk> wrote:
Jimmy,

No, I'm not confused.  :-)

Fair enough.

I just thought that if the "&" were URLencoded it wouldn't need to be XML
escaped, because as you say it would then read "%26", and so wouldn't cause
problems to the XML parser.  And I thought URLencoding should happen here.
To quote a random Web source [1]:

That the URL isn't XML escaped in RDF/XML is clearly and unambiguously
a bug; that it isn't URL escaped is more a matter for discussion, but
the general consensus will probably be 'do what Wikipedia do', which
is to not escape ampersands.
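
A small Python sketch of the XML-escaping side of this, assuming the
RDF/XML is produced with an XML library rather than by string
concatenation: the serializer writes the ampersand as "&amp;" and a
parser hands back the original URI, so the URI itself never has to
change. The example uses the standard-library ElementTree module and is
not a description of DBpedia's actual export code.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)

uri = "http://dbpedia.org/resource/&_(EP)"

# Build <rdf:RDF><rdf:Description rdf:about="..."/></rdf:RDF> via the API,
# so attribute escaping is left to the library.
root = ET.Element(f"{{{RDF_NS}}}RDF")
ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri})

serialized = ET.tostring(root, encoding="unicode")
print(serialized)  # the attribute value contains '&amp;_(EP)'

# Parsing the serialized text gives back the unescaped URI.
round_tripped = ET.fromstring(serialized)[0].get(f"{{{RDF_NS}}}about")
print(round_tripped == uri)  # True
```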

--
<Sefam>  Are any of the mentors around?
<jimregan>  yes, they're the ones trolling you

------------------------------------------------------------------------------
Keep Your Developer Skills Current with LearnDevNow!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-d2d
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


--
*Richard Light*
