Hi Magnus,

I think Dimitris already answered your main questions. Just wanted to
add a few details with my language-lawyer hat on.

On 30 July 2013 13:43, Magnus Knuth <magnus.kn...@hpi.uni-potsdam.de> wrote:
>
> Am 30.07.2013 um 01:36 schrieb Jona Christopher Sahnwaldt:
>
>> On 29 July 2013 07:56, Magnus Knuth <magnus.kn...@hpi.uni-potsdam.de> wrote:
>>> Hi Jona and all,
>>>
>>> Am 26.07.2013 um 23:29 schrieb Jona Christopher Sahnwaldt:
>>>
>>>> Hi Magnus & all,
>>>>
>>>> first a minor correction -
>>>> http://dbpedia.org/resource/Don_Payne_%28writer%29 and
>>>> http://dbpedia.org/resource/Don_Payne_(writer) both are valid URIs and
>>>> IRIs. URIs don't allow non-ASCII characters, IRIs do, that's the only
>>>> difference. [1][2]
>>>
>>> Sure, just wanted to name the different URI representations somehow.
>>> However, both named URIs are different but refer to the very same entity.
>>>
>>>> That being said, I'm not very familiar with DBpedia live and can only
>>>> speculate about this behavior. Maybe it has to do with the changes in
>>>> the URI encoding we implemented for the DBpedia 3.8 release. See
>>>> http://wiki.dbpedia.org/URIencoding
>>>>
>>>> Maybe the URIs using percent-encoded parentheses were produced before
>>>> the code changes for 3.8? I don't know.
>>>
>>> That might be a reason, but in the contemporary dumps, the old
>>> percent-encoded non-ASCII symbols are still contained. Apparently, at
>>> least all subject URIs are percent-encoded, while at least some objects
>>> are referred to by IRIs without percent-encoded symbols, which is inconsistent.
>>
>> What do you mean by "contemporary dumps"? Can you post a link to an
>> example file that contains inconsistent IRIs/URIs?
>>
>
> E.g. http://live.dbpedia.org/liveupdates/2013/07/30/10/000446.added.nt.gz 
> contains:
>
> <http://dbpedia.org/resource/All_Bout_U_%282Pac_Song%29> 
> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> 
> <http://en.wikipedia.org/wiki/All_Bout_U_(2Pac_Song)> .
> <http://en.wikipedia.org/wiki/All_Bout_U_%282Pac_Song%29> 
> <http://xmlns.com/foaf/0.1/primaryTopic> 
> <http://dbpedia.org/resource/All_Bout_U_(2Pac_Song)> .
>
> Moreover, some URIs contain Unicode escapes (\uXXXX), which yield yet
> another version of the URI. In the same folder, 000238.added.nt.gz contains:
>
> <http://en.wikipedia.org/wiki/Andr%C3%A9s_Romero_%28Chilean_footballer%29> 
> <http://xmlns.com/foaf/0.1/primaryTopic> 
> <http://dbpedia.org/resource/Andr\u00E9s_Romero_(Chilean_footballer)> .
> <http://dbpedia.org/resource/Andr%C3%A9s_Romero_%28Chilean_footballer%29> 
> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> 
> <http://en.wikipedia.org/wiki/Andr\u00E9s_Romero_(Chilean_footballer)> .
>

I see. I thought you meant there were such errors in the dumps at
http://wiki.dbpedia.org/Downloads38 .

> According to http://wiki.dbpedia.org/URIencoding the URI should be:
>
> http://dbpedia.org/resource/Andr%C3%A9s_Romero_(Chilean_footballer)

That's right. By the way, I'd love to move to IRIs for DBpedia English
as well. Others think we should stick with URIs for backwards
compatibility.

>
> while for Wikipedia any combination works:
>
> http://en.wikipedia.org/wiki/Andr[ é | %C3%A9 | %E9 ]s_Romero_[ ( | %28 
> ]Chilean_footballer[ ) | %29 ]

That's because Wikipedia URIs are used as URLs: they are sent in HTTP
requests, and the server (roughly speaking) treats most percent-encoded
and unencoded spellings of a title as equivalent. DBpedia URIs, on the
other hand, are RDF "URI references", and RDF identifies a resource by
comparing the URI strings character by character, so every little
difference matters. That's one of the problems of using URIs as
identifiers.
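
To make the string-equality point concrete, here is a rough Python sketch.
It is not what the extraction framework or RecodeUris.scala does; the
function names and the set of characters left unencoded are my own
assumptions, picked only to reproduce the one example from this thread.
It is also lossy for titles that contain a literal "%" or other reserved
characters, so treat it as an illustration rather than a conversion tool.

```python
import re
from urllib.parse import quote, unquote

# N-Triples escapes: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_ntriples_escapes(s: str) -> str:
    """Replace \\uXXXX / \\UXXXXXXXX escapes with the characters they denote."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), s)


def to_iri_form(s: str) -> str:
    """IRI spelling: no escapes, no percent-encoding (hypothetical helper)."""
    return unquote(decode_ntriples_escapes(s))


def to_uri_form(s: str) -> str:
    """URI spelling: non-ASCII percent-encoded as UTF-8, parentheses kept.

    The safe set ":/()" is an assumption for this example only; it is not
    the full rule set described on the URIencoding wiki page.
    """
    return quote(to_iri_form(s), safe=":/()")


if __name__ == "__main__":
    a = "http://dbpedia.org/resource/Andr%C3%A9s_Romero_%28Chilean_footballer%29"
    b = r"http://dbpedia.org/resource/Andr\u00E9s_Romero_(Chilean_footballer)"
    c = "http://dbpedia.org/resource/Andr%C3%A9s_Romero_(Chilean_footballer)"

    # As plain strings (which is how RDF compares them) all three differ.
    print(len({a, b, c}))
    # After normalization they collapse to a single spelling.
    print({to_uri_form(x) for x in (a, b, c)})
```

The three input strings are the spellings quoted above; an RDF store sees
three different resources until something like this is applied consistently
to every subject, predicate and object.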

JC

>
> Anyway, since we are using IRIs, i.e. the .ttl files from
> http://wiki.dbpedia.org/Downloads38, and would like to have a compatible
> DBpedia Live, we need to tweak these dumps and the integrator somehow.
>
>>>
>>> Still: Can someone give a hint how to easily transform the N3-dumps to 
>>> IRIs, preferably by command line instruction?
>>
>> This may help:
>>
>> https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/scala/org/dbpedia/extraction/scripts/RecodeUris.scala
>
> Thanks, I will check that.
>
>>>
>>> Regards
>>> Magnus
>>>
>>>> Regards,
>>>> JC
>>>>
>>>> [1] https://tools.ietf.org/html/rfc3986#section-2.2
>>>> [2] https://tools.ietf.org/html/rfc3987#section-2.1
>>>>
>>>> On 26 July 2013 18:33, Magnus Knuth <magnus.kn...@hpi.uni-potsdam.de> 
>>>> wrote:
>>>>> Hello Mohammed and DBpedia list,
>>>>>
>>>>> I am trying to use the DBpedia Live integration, but there seem to be
>>>>> some issues with the dumps. Some URIs are URL-encoded, while others are
>>>>> IRIs. That makes DBpedia Live hard to use.
>>>>>
>>>>> Same holds for the DBpediaLive Sparql endpoint 
>>>>> [http://live.dbpedia.org/sparql]
>>>>>
>>>>> SELECT * WHERE { ?s ?p 
>>>>> <http://dbpedia.org/resource/Don_Payne_%28writer%29>}
>>>>>
>>>>> and
>>>>>
>>>>> SELECT * WHERE { ?s ?p <http://dbpedia.org/resource/Don_Payne_(writer)>}
>>>>>
>>>>> deliver different results.
>>>>>
>>>>> Is that issue known and relevant, and are there any efforts to move to
>>>>> IRIs someday?
>>>>> Can someone give a hint on how to easily transform the dumps to IRIs in
>>>>> the meantime?
>>>>>
>>>>> Best regards and a sunny weekend
>>>>> Magnus
>>>>>
>

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
