Dear all,

I just checked a few specs to figure out what would be the best policy
for DBpedia regarding URI encoding.

In summary, I think DBpedia should encode as few characters as
possible, e.g. use '&', not '%26'.

The URI spec [1] has a lot of special cases, but in the end it's quite
clear that in our case we do not HAVE to encode most special
characters like '&'. See section 3.3, "Path Component".
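
As a rough sketch of what "encode as few characters as possible" could
look like in Python: the safe set below is my reading of the pchar rule
in section 3.3 of [1], plus '/' so that titles like "OS/2" keep their
slash. The names PATH_SAFE and minimal_encode are made up for this
example and are not part of the DBpedia code.

```python
from urllib.parse import quote

# Characters that RFC 2396 section 3.3 allows unescaped in a path segment:
# the "mark" characters plus ":@&=+$,". Letters and digits are always
# left alone by quote(). "/" is added so "OS/2" keeps its slash
# (an assumption about the desired policy, not a rule from the spec).
PATH_SAFE = "-_.!~*'()" + ":@&=+$," + "/"

def minimal_encode(title: str) -> str:
    """Percent-encode only the characters the path grammar does not allow."""
    return quote(title, safe=PATH_SAFE)

print(minimal_encode("&_(EP)"))  # '&' and the parentheses stay literal
print(minimal_encode("OS/2"))    # the slash stays literal
print(minimal_encode("100%"))    # a bare '%' still has to become '%25'
```

Non-ASCII characters are still percent-encoded as UTF-8 by quote(), so
this only covers the ASCII special characters discussed here.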

More importantly, the RDF spec includes the following note [2]:

Because of the risk of confusion between RDF URI references that would
be equivalent if dereferenced, the use of %-escaped characters in RDF
URI references is strongly discouraged.

Could hardly be clearer...
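
To make the risk concrete: the same RDF spec says two RDF URI
references are equal only if they match character by character. A
small Python illustration, using the DBpedia namespace with the two
spellings of the "&_(EP)" title (the variable names are just for this
example):

```python
from urllib.parse import unquote

plain = "http://dbpedia.org/resource/&_(EP)"
escaped = "http://dbpedia.org/resource/%26_%28EP%29"

# RDF compares URI references as plain strings, so these are two
# different resources as far as an RDF store is concerned.
rdf_equal = plain == escaped

# Only after decoding the %-escapes do they turn out to be the same name.
same_after_decoding = unquote(plain) == unquote(escaped)

print(rdf_equal)            # False
print(same_after_decoding)  # True
```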


A related but different issue is how Wikipedia and Virtuoso dereference URIs.

Wikipedia is very lenient: "&_(EP)" [3] is equivalent to
"%26_%28EP%29" [4]. Even "OS%2F2" [5] is treated as equivalent to
"OS/2" [6]. (Not sure which of these behaviors is or isn't violating
the URI spec.)

Virtuoso on dbpedia.org is very strict: it only returns data for
"OS/2" [7] and "&_%28EP%29" [8], but empty pages for all other
encoding variants.
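
One way to get Wikipedia-style leniency on top of a strict store would
be to decode every variant and re-encode it in the single form the
store uses. The sketch below is only an illustration: STORE is a
stand-in dictionary, and the canonical form ('&' and '/' literal,
parentheses escaped) is inferred from nothing more than the two
working URIs [7] and [8].

```python
from urllib.parse import quote, unquote

# Hypothetical strict store: a lookup succeeds only on an exact key match.
STORE = {
    "OS/2": "data for OS/2",
    "&_%28EP%29": "data for &_(EP)",
}

# Assumed canonical form, guessed from the two URIs that return data:
# "&" and "/" stay literal, everything else quote() considers unsafe
# (including parentheses) is escaped.
CANONICAL_SAFE = "/&"

def canonical(name: str) -> str:
    """Decode all %-escapes, then re-encode in the one canonical form."""
    return quote(unquote(name), safe=CANONICAL_SAFE)

def lenient_lookup(name: str):
    """Look up any encoding variant of a resource name."""
    return STORE.get(canonical(name))

print(STORE.get("OS%2F2"))             # None: the strict lookup misses
print(lenient_lookup("OS%2F2"))        # data for OS/2
print(lenient_lookup("%26_%28EP%29"))  # data for &_(EP)
```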


Christopher

[1] http://www.ietf.org/rfc/rfc2396.txt
[2] http://www.w3.org/TR/rdf-concepts/#dfn-URI-reference
[3] http://en.wikipedia.org/wiki/&_(EP)
[4] http://en.wikipedia.org/wiki/%26_%28EP%29
[5] http://en.wikipedia.org/wiki/OS%2F2
[6] http://en.wikipedia.org/wiki/OS/2
[7] http://dbpedia.org/resource/OS/2
[8] http://dbpedia.org/resource/&_%28EP%29


On Tue, Feb 21, 2012 at 15:04, Jimmy O'Regan <jore...@gmail.com> wrote:
> On 21 February 2012 13:47, Richard Light <rich...@light.demon.co.uk> wrote:
>> Jimmy,
>>
>> No, I'm not confused.  :-)
>>
>
> Fair enough.
>
>> I just thought that if the "&" were URLencoded it wouldn't need to be XML
>> escaped, because as you say it would then read "%26", and so wouldn't cause
>> problems to the XML parser.  And I thought URLencoding should happen here.
>> To quote a random Web source [1]:
>
> That the URL isn't XML escaped in RDF/XML is clearly and unambiguously
> a bug; that it isn't URL escaped is more a matter for discussion, but
> the general consensus will probably be 'do what Wikipedia do', which
> is to not escape ampersands.
>
> --
> <Sefam> Are any of the mentors around?
> <jimregan> yes, they're the ones trolling you
>
> ------------------------------------------------------------------------------
> Keep Your Developer Skills Current with LearnDevNow!
> The most comprehensive online learning library for Microsoft developers
> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
> Metro Style Apps, more. Free future releases when you subscribe now!
> http://p.sf.net/sfu/learndevnow-d2d
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
