Hi Yves,

http://dbpedia.org/URIencoding is outdated. For DBpedia 3.8, we moved
a bit closer to Wikipedia encoding and no longer escape round brackets
(parentheses).

It's all a bit confusing right now because
http://dbpedia.org/resource/... already displays the new DBpedia 3.8
data, which is not yet available for download, but will be very soon.

The main datasets don't escape brackets anymore, but some link datasets
still do. We should probably fix the link datasets.

I'll update http://dbpedia.org/URIencoding as soon as I can.
Meanwhile, the escaping rules can be found in this Scala code:

http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/7897a0cb8ffb/core/src/main/scala/org/dbpedia/extraction/util/WikiUtil.scala#l32

private val iriReplacements = StringUtils.replacements('%', "\"#%<>?[\\]^`{|}")

This means that only the following characters are URI-encoded
("percent-encoded"). The first character is a double quote.

"#%<>?[\]^`{|}

As usual, space is replaced by underscore. The
http://dbpedia.org/resource/... URIs are URIs, not IRIs, so
additionally we URI-encode all non-ASCII characters.
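
To make the rules above concrete, here is a minimal sketch in Scala. It is
not the code from WikiUtil.scala: the object name DbpediaUriSketch and the
helper encode are made up for illustration, and the use of UTF-8 bytes and
uppercase hex digits is an assumption on my part. It only covers the three
rules described in this mail (space to underscore, the listed characters,
and non-ASCII characters).

```scala
import java.nio.charset.StandardCharsets

// Illustrative only: a hypothetical helper, not part of the extraction framework.
object DbpediaUriSketch {
  // The characters listed above; the first one is a double quote.
  private val escaped: Set[Char] = "\"#%<>?[\\]^`{|}".toSet

  def encode(title: String): String = {
    val sb = new StringBuilder
    // Replace spaces first, then walk the UTF-8 bytes (UTF-8 is an assumption).
    for (b <- title.replace(' ', '_').getBytes(StandardCharsets.UTF_8)) {
      val code = b & 0xff
      // Bytes above 127 always belong to a non-ASCII character.
      if (code > 127 || escaped.contains(code.toChar))
        sb.append("%%%02X".format(code)) // uppercase hex is an assumption
      else
        sb.append(code.toChar)
    }
    sb.toString
  }
}
```

With this sketch, DbpediaUriSketch.encode("Mercury (planet)") gives
Mercury_(planet), so round brackets stay as they are, while a title
containing [ or # has those characters percent-encoded.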

So much for now,
Christopher

On Tue, Jul 24, 2012 at 10:50 AM, Yves Raimond <yves.raim...@gmail.com> wrote:
>>> Sorry to be a bit pushy, but that dump actually has URIs that are
>>> formatted correctly according to the URI encoding guidelines, with
>>> brackets escaped, which is my main point.
>>>
>>> Just to sum up:
>>>   * URI encoding guidelines say brackets should be %-escaped
>>>   * Yago dump has them %-escaped
>>>   * DBpedia dump doesn't have them %-escaped
>>>
>>> Which I hope explains why I find all that very confusing!
>>
>>
>> Conclusion, the dump at:
>> http://downloads.dbpedia.org/3.7/links/yago_links.nt.bz2
>> which is based on Yago contains incorrect mappings, right?
>
> Well, my problem is more that the current DBpedia dump doesn't seem to
> apply the URI encoding rules at http://dbpedia.org/URIencoding, which
> makes it basically impossible to predict what Wikipedia URI
> corresponds to what DBpedia URI. This Yago dump escapes brackets,
> exactly as described in those encoding rules.
>
> Best,
> y
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
