On 30/09/11 17:08, Alexandru Todor wrote:
Hi,

Seems to be a Virtuoso issue with the RDF/XML serializer. Both the Greek
and German endpoints seem to have the garbled data in the XML files and
this issue only arises when choosing RDF/XML as output. Thanks for the
tip, I'll report the issue to the Virtuoso devs.

Could you also report that

1/ asking for N-triples does not return N-triples. It returns something Turtle-ish.

2/ The SPARQL XML results have the same encoding problem as RDF/XML.

These have somewhat slowed down the bug hunting.

There's still the problem with QueryExecutionFactory.sparqlService
returning no results.

Yes - I found it (in turning queries into strings). I need to do some careful testing to make sure the fix does not break something elsewhere that incorrectly depends on the effect.

        Andy




Kind Regards,
Alexandru

On 09/30/2011 05:33 PM, Andy Seaborne wrote:
On 30/09/11 16:17, Alexandru Todor wrote:
Hi,

I maintain the German language DBpedia endpoint, and have gotten some
mails from users complaining that they don't get any results from the
endpoint when they query for resources like:

http://de.dbpedia.org/resource/München

This message and your message are ISO-8859-1

ü is 0xFC in ISO-8859-1, which is the same as its Unicode code point
(U+00FC), and 0xC3 0xBC in UTF-8.
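
The byte values above can be checked with a few lines of Java. This is only a minimal sketch using the standard library; the class name UmlautBytes and the helper hex are made up for illustration and are not part of Jena or of the code under discussion.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class UmlautBytes {
    /** Formats bytes as space-separated upper-case hex, e.g. "C3 BC". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Written as an escape so the result does not depend on the
        // encoding of this source file.
        String u = "\u00FC"; // the letter u with diaeresis

        // Unicode code point: prints U+00FC
        System.out.println("U+" + String.format("%04X", u.codePointAt(0)));
        // One byte in ISO-8859-1: prints FC
        System.out.println(hex(u.getBytes(StandardCharsets.ISO_8859_1)));
        // Two bytes in UTF-8: prints C3 BC
        System.out.println(hex(u.getBytes(StandardCharsets.UTF_8)));
    }
}
```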

I tried http://de.dbpedia.org/resource/München in my browser and got
redirected to

http://de.dbpedia.org/data/M%C3%BCnchen.xml

which returns:

RDF/XML in UTF-8 but it contains e.g. line 3:

rdf:resource="http://de.dbpedia.org/resource/München"

in Firefox. That looks corrupt to me.

This is the code they sent me:

String queryString = "SELECT ?o WHERE "
        + "{ <http://de.dbpedia.org/resource/München> "
        + "<http://purl.org/dc/terms/subject> ?o }";
Query query = QueryFactory.create(queryString);
QueryExecution qexec =
        QueryExecutionFactory.sparqlService("http://de.dbpedia.org/sparql", query);
try {
    // Print every solution returned by the remote endpoint.
    ResultSet results = qexec.execSelect();
    while (results.hasNext()) {
        QuerySolution s = results.nextSolution();
        System.out.println(s.toString());
    }
} finally {
    // Always release the HTTP connection.
    qexec.close();
}

I tried the code and it works for any IRI that contains no non-ASCII
characters (so only for plain URIs), but when the IRI contains non-ASCII
characters it returns no results. I've tried a couple of variations; each
returns no results but also doesn't throw any kind of exception. It's just
as if the data wasn't there.

Then I tried an alternative method and used QueryEngineHTTP to execute
the query, and it worked. However, QueryEngineHTTP messes up the UTF-8
encoding, so for example in the returned results you get MÃ¼nchen
instead of München. My guess is that QueryEngineHTTP encodes the SPARQL
results in ISO-8859-1 instead of UTF-8, so decoding the strings as
ISO-8859-1 and re-encoding them as UTF-8 fixed this.
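
One way to read that workaround, sketched with the standard library only: if UTF-8 bytes were at some point decoded as ISO-8859-1, the original bytes can be recovered by encoding the garbled string back to ISO-8859-1 and then decoding them as UTF-8. The class and method names below (MojibakeRepair, garble, repair) are hypothetical, and this is an assumption about what the fix looked like, not the code that was actually used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class MojibakeRepair {
    /** Simulates the fault: UTF-8 bytes wrongly decoded as ISO-8859-1. */
    static String garble(String correct) {
        byte[] utf8 = correct.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Reverses the fault: recover the raw bytes, then decode them as UTF-8. */
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "M\u00FCnchen";   // the correctly decoded city name
        String garbled = garble(correct);  // "M\u00C3\u00BCnchen", the two-character mojibake form
        System.out.println(garbled.length());                 // 8 chars instead of 7
        System.out.println(repair(garbled).equals(correct));  // true
    }
}
```

This round trip is lossless only because ISO-8859-1 maps every byte value to exactly one character. It is a workaround on the client side and does not address where the wrong charset is applied.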

The code seems to do:

URLEncoder.encode(s, "UTF-8")

but at that point it's still working in strings. Something lower level
(Sun networking) does the string-to-bytes conversion.
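
For reference, a small sketch of what that call produces. URLEncoder.encode returns a String containing only ASCII characters, with each UTF-8 byte of a non-ASCII character written as a %XX escape, so the later string-to-bytes step should not be able to change it. The wrapper class QueryParamEncoding and the method encodeUtf8 are hypothetical names for illustration, not Jena code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
final class QueryParamEncoding {
    /** Percent-encodes s for use as a URL query parameter value, using UTF-8. */
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints M%C3%BCnchen
        System.out.println(encodeUtf8("M\u00FCnchen"));

        // A whole query string is encoded the same way; reserved characters
        // such as '<', '/' and '?' are escaped as well.
        String query = "SELECT ?o WHERE { <http://de.dbpedia.org/resource/M\u00FCnchen> "
                + "<http://purl.org/dc/terms/subject> ?o }";
        System.out.println(encodeUtf8(query));
    }
}
```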

Andy


Kind Regards,
Alexandru Todor

Research Associate
AG Corporate Semantic Web
Freie Universität Berlin







