Richard Cyganiak wrote:
> Marvin, Kingsley,
>
> On 20 Aug 2008, at 00:16, Georgi Kobilarov wrote:
>> yes, it's a bug in our dataset.
>
> Actually, no. It's a bug in Virtuoso's SPARQL+XML result format
> serializer.
>
> Ampersands are allowed in URIs, so the Yago URIs are perfectly fine
> according to all the specs. (We *might* still want to %-encode the
> ampersand in those URIs, but just for consistency with our other URIs,
> not because the specs require it. That's a separate question.)
>
> The problem is: When a "&" character is included in content inside an
> XML file, it has to be written as "&amp;". Virtuoso does not do this,
> hence the breakage.
>
> (This is a silly bug. The need to encode reserved characters (& and ")
> is just about the first thing a developer learns about XML. I hope
> OpenLink fixes this soon. Kingsley?)
>
> Richard
>
>
>> In particular in the Yago dataset, which
>> has been contributed externally and wasn't created with the DBpedia
>> framework (but hey, we've got many similar bugs in datasets created by
>> our framework ;))
>>
>> Yago URIs have not been url-encoded. So as a workaround, you can
>> url_encode all URIs starting with http://dbpedia.org/class/yago/ in the
>> yago_en.nt file before loading it into your Jena model. That should do
>> it.
>>
>> And we'll fix that bug for the future.
>>
>> Best,
>> Georgi
>>
>> --
>> Georgi Kobilarov
>> Freie Universität Berlin
>> www.georgikobilarov.com
>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:dbpedia-[EMAIL PROTECTED] On Behalf Of Marvin Lugair
>>> Sent: Wednesday, August 20, 2008 12:57 AM
>>> To: [email protected]
>>> Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI
>>> breaking Jena code
>>>
>>>
>>> Hi,
>>>
>>> The following SPARQL query:
>>> select distinct ?Concept where {[] a ?Concept}
>>>
>>> is the default query at the DBpedia endpoint http://dbpedia.org/sparql
>>> It returns several URIs, including the following one (notice the
>>> ampersand):
>>>
>>> http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
>>>
>>> So DBpedia is returning URIs containing an ampersand. This is causing
>>> an exception in the Jena parser.
>>>
>>> How do I fix this? None of Jena's methods work: I can't transform
>>> the result set into a model or even print it with the result formatter.
>>> If I iterate over it, I can print the results one by one until I get to
>>> the malformed URI. How do I check in my code for malformed URIs?
>>>
>>>
>>> Any ideas?
>>> Thanks!
>>> Marv
>>> -------------
>>>
>>> The code below works until I get a URI with an ampersand.
>>> The exception is coming from results.nextSolution().
>>> Other Jena methods to convert the retrieved result set to a model
>>> directly or format it produce the same exception (I assume they have
>>> a similar iterator inside).
>>>
>>>
>>> QueryExecution qexec =
>>>     QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql",
>>>         "select distinct ?Concept where {[] a ?Concept}");
>>>
>>> try {
>>>     ResultSet results = qexec.execSelect();
>>>     for ( ; results.hasNext() ; )
>>>     {
>>>         QuerySolution soln = results.nextSolution() ;
>>>         String x = soln.get("Concept").toString();
>>>         System.out.print(x +"\n");
>>>     }
>>> }
>>>
>>> finally {
>>>     System.out.println("closing!");
>>>     qexec.close() ;
>>> }
>>>
>>>
>>> This will result in the following error:
>>>
>>>
>>> [com.ctc.wstx.exc.WstxLazyException]
>>> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<'
>>> (code 60); expected a semi-colon after the reference for entity
>>> 'MelindaGatesFoundationPeople'
>>> at [row,col {unknown-source}]: [2609,96]
>>>   at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
>>>   at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
>>>   at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
>>>   at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
>>>   at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
>>>   at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
>>>   at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
>>>
>>>
>>> I also posted this on the Jena group but some seem to suggest it is a
>>> dbpedia issue: http://tech.groups.yahoo.com/group/jena-dev/message/36210
>>>
>>>
>>> -------------------------------------------------------------------------
>>> This SF.Net email is sponsored by the Moblin Your Move
>>> Developer's challenge
>>> Build the coolest Linux based applications with Moblin SDK & win great
>>> prizes
>>> Grand prize is a trip for two to an Open Source event anywhere in the
>>> world
>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

All,
Fixed. Please verify.

--
Regards,

Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
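
For anyone hitting the same exception against an endpoint that has not been fixed yet, the two ideas discussed in this thread (escaping "&" as "&amp;" in XML content, and percent-encoding the ampersand in the Yago URIs) can be sketched in plain Java as below. This is only a rough illustration under stated assumptions: the class and method names (AmpersandFix, escapeBareAmpersands, percentEncodeAmpersands) are made up for this example and are not part of Jena or Virtuoso, and how you obtain the raw response text before it reaches the XML parser is left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jena or Virtuoso.
final class AmpersandFix {

    // Matches an "&" that does NOT start one of the five predefined XML
    // entities or a numeric character reference. Any other "&name;" form
    // is treated as a bare ampersand too (an assumption of this sketch).
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Rewrites bare ampersands in raw XML text as "&amp;" so that an XML
    // parser can read the text. Already-escaped ampersands are left alone.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Percent-encodes every ampersand in a URI string, as in the
    // url-encoding workaround suggested for the Yago URIs.
    static String percentEncodeAmpersands(String uri) {
        return uri.replace("&", "%26");
    }

    public static void main(String[] args) {
        String uri = "http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople";
        // A result fragment as a serializer that forgets to escape would emit it.
        String brokenXml = "<uri>" + uri + "</uri>";
        System.out.println(escapeBareAmpersands(brokenXml));
        System.out.println(percentEncodeAmpersands(uri));
    }
}
```

Note that the percent-encoded form is a different string from the original URI, so if you go that route you would need to apply it consistently to everything you load.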
