Hi Dimitris,

They haven't fixed it yet; I'm using the latest Open Source version
compiled from source and the problem is still there. It is a simple
encoding issue that can be worked around in Java with a small hack:
output = new String(input.getBytes("ISO-8859-1"), "UTF8");
I don't have much knowledge about the different character sets, but I
think the UTF-8 bytes of the IRIs are being treated as ISO-8859-1
instead of UTF-8 somewhere along the way. This doesn't happen with the
literal values, since the non-ASCII characters there are escaped as
ASCII.
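
To make the workaround concrete, here is a minimal, self-contained
sketch. The class and method names (IriEncodingFix, garble, repair) are
made up for illustration, and it assumes the garbling really is UTF-8
bytes decoded as ISO-8859-1, which is my guess and not something
confirmed inside Virtuoso. It uses StandardCharsets, so it needs Java 7
or later:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustrating the workaround.
final class IriEncodingFix {

    // Reproduces the suspected bug: the UTF-8 bytes of a string are
    // wrongly decoded as ISO-8859-1, so each non-ASCII character
    // turns into two or more Latin-1 characters.
    static String garble(String correct) {
        byte[] utf8 = correct.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The workaround: map the garbled characters back to the original
    // bytes (ISO-8859-1 is a 1:1 byte mapping), then decode as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00f6 is the o-umlaut; escaped so the source file encoding
        // does not matter.
        String iri = "http://de.dbpedia.org/resource/K\u00f6nigin-Luise-Stiftung";
        String garbled = garble(iri);
        System.out.println(garbled);
        System.out.println(repair(garbled).equals(iri)); // prints "true"
    }
}
```

Note that repair() should only be applied to strings that are known to
be garbled this way; running it on a correctly decoded IRI would damage
any non-ASCII characters in it.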

About the validity: the problem I've noticed is with the property names.
Certain special characters such as brackets have to be filtered out for
the output to be valid XML. I had this issue with the German DBpedia,
and as you can see, right now all resources produce valid XML and can be
queried through SPARQL clients. If, however, there is a deeper problem
with IRIs in RDF/XML that I'm unaware of, we should discuss it and push
for another default serialization format for SPARQL.
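
As a rough illustration of the property-name problem: RDF/XML writes
each property as an XML element, so the part of the property URI after
the namespace has to be a legal XML name, and characters like
parentheses are not allowed there. The sketch below is only an
approximation of that rule, not the full XML NCName grammar, and the
class name, method names and example URIs are all hypothetical:

```java
// Hypothetical helper, only an approximation of the XML NCName rule.
final class XmlLocalNames {

    // Takes everything after the last '/' or '#' as the local name.
    // Real serializers may split the URI differently.
    static String localName(String uri) {
        int cut = Math.max(uri.lastIndexOf('/'), uri.lastIndexOf('#'));
        return uri.substring(cut + 1);
    }

    // Simplified check: first character a letter or '_', the rest
    // letters, digits, '_', '-' or '.'. The real NCName production
    // uses slightly different character ranges.
    static boolean isPlausibleNCName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = Character.isLetterOrDigit(c)
                    || c == '_' || c == '-' || c == '.';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up property URIs, not taken from the real dataset.
        String good = "http://example.org/property/name";
        String bad = "http://example.org/property/area(km2)";
        System.out.println(isPlausibleNCName(localName(good))); // prints "true"
        System.out.println(isPlausibleNCName(localName(bad)));  // prints "false"
    }
}
```

A property that fails a check like this would have to be filtered or
renamed before it can appear in RDF/XML, which is the kind of filtering
I meant above.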


Regards,
Alexandru

On 10/18/2011 09:29 AM, Dimitris Kontokostas wrote:
> Hi Alexandru,
>
> This is a known issue and we reported it to Virtuoso ~9 months ago.
> Unfortunately we use Debian packages for our installation, which
> are usually a little behind the latest releases, so we can't say
> whether it is fixed.
>
> But IRIs cannot be 100% serialized in RDF/XML.
> So even if Virtuoso fixes the encoding, the RDF might still be invalid.
>
> Regards,
> Dimitris
>
> On Mon, Oct 17, 2011 at 6:42 PM, Alexandru Todor<[email protected]>  
> wrote:
>> Hi,
>>
>> I received a mail a few weeks ago from some users of the German
>> DBpedia who were reporting that they weren't getting any results
>> when querying the endpoint for URIs that contained German umlauts
>> (or any other non-ASCII UTF-8 characters). I reported the issue to
>> the Jena mailing list and they fixed it, but in the process we also
>> discovered a bug with Virtuoso.
>>
>> There is a problem with the IRI encoding in the DBpedia
>> Internationalization VAD. Namely, when querying the SPARQL endpoint, the
>> encoding of the IRIs in RDF/XML is garbled. The issue can be found in
>> both the Greek and German endpoints.
>>
>> For example, in the first XML lines of
>> http://de.dbpedia.org/data/Berlin-Dahlem.rdf you will notice things like
>> http://de.dbpedia.org/resource/KÃ¶nigin-Luise-Stiftung instead of
>> http://de.dbpedia.org/resource/Königin-Luise-Stiftung, or
>> http://de.dbpedia.org/resource/Gernot_Michael_MÃ¼ller instead of
>> http://de.dbpedia.org/resource/Gernot_Michael_Müller. You will notice
>> similar issues if you look at this resource from the Greek DBpedia:
>> http://el.dbpedia.org/data/Αλέξανδρος_ο_Μέγας.rdf .
>>
>> The problem is that when querying the Internationalization endpoints,
>> not only with Jena but with any other SPARQL client, the user is going
>> to get garbled IRIs if they contain non-ASCII UTF-8 characters.
>>
>>
>> Kind Regards,
>> Alexandru Todor
>>
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2d-oct
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
>

