Kingsley, from looking at the query result it seems like the issue is fixed. Thanks!
Confirmation from someone who uses Jena to access the SPARQL endpoint would be nice.

Richard

On 20 Aug 2008, at 19:30, Kingsley Idehen wrote:
> Richard Cyganiak wrote:
>> Marvin, Kingsley,
>>
>> On 20 Aug 2008, at 00:16, Georgi Kobilarov wrote:
>>> yes, it's a bug in our dataset.
>>
>> Actually, no. It's a bug in Virtuoso's SPARQL+XML result format
>> serializer.
>>
>> Ampersands are allowed in URIs, so the Yago URIs are perfectly fine
>> according to all the specs. (We *might* still want to %-encode the
>> ampersand in those URIs, but just for consistency with our other
>> URIs, not because the specs require it. That's a separate question.)
>>
>> The problem is: When a "&" character is included in content inside
>> an XML file, it has to be written as "&amp;". Virtuoso does not do
>> this, hence the breakage.
>>
>> (This is a silly bug. The need to encode reserved characters (& and
>> ") is just about the first thing a developer learns about XML. I
>> hope OpenLink fixes this soon. Kingsley?)
>>
>> Richard
>>
>>> In particular in the Yago dataset, which
>>> has been contributed externally and wasn't created with the DBpedia
>>> framework (but hey, we've got many similar bugs in datasets created by
>>> our framework ;))
>>>
>>> Yago URIs have not been url-encoded. So as a workaround, you can
>>> url_encode all URIs starting with http://dbpedia.org/class/yago/ in the
>>> yago_en.nt file before loading it into your Jena model. That should do
>>> it.
>>>
>>> And we'll fix that bug for the future.
>>>
>>> Best,
>>> Georgi
>>>
>>> --
>>> Georgi Kobilarov
>>> Freie Universität Berlin
>>> www.georgikobilarov.com
>>>
>>>> -----Original Message-----
>>>> From: [EMAIL PROTECTED] [mailto:dbpedia-[EMAIL PROTECTED] On Behalf Of Marvin Lugair
>>>> Sent: Wednesday, August 20, 2008 12:57 AM
>>>> To: [email protected]
>>>> Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI breaking Jena code
>>>>
>>>> Hi,
>>>>
>>>> The following sparql query:
>>>> select distinct ?Concept where {[] a ?Concept}
>>>>
>>>> Is the default query at the dbpedia endpoint http://dbpedia.org/sparql
>>>> It returns several URI's including the following one (notice the and
>>>> sign):
>>>>
>>>> http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
>>>>
>>>> So DBPedia is returning URI's containing an ampersand. This is causing
>>>> an exception in the Jena parser.
>>>>
>>>> How do I fix this? None of Jena's methods will work, I can't transform
>>>> the resultset into a model or even print it with the resultformatter.
>>>> If I iterate over it, I can print the results one by one till I get to
>>>> the malformed URI. How do I check in my code for malformed URI's?
>>>>
>>>> Any ideas?
>>>> Thanks!
>>>> Marv
>>>> -------------
>>>>
>>>> The code below works till I get a URI with an ampersand.
>>>> The exception is coming from results.nextSolution().
>>>> Other Jena methods to convert the retrieved resultset to a model
>>>> directly or format it produce the same exception (I assume they have
>>>> a similar iterator inside)
>>>>
>>>> QueryExecution qexec =
>>>>     QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql",
>>>>         "select distinct ?Concept where {[] a ?Concept}");
>>>>
>>>> try {
>>>>     ResultSet results = qexec.execSelect();
>>>>     for ( ; results.hasNext() ; )
>>>>     {
>>>>         QuerySolution soln = results.nextSolution() ;
>>>>         String x = soln.get("Concept").toString();
>>>>         System.out.print(x +"\n");
>>>>     }
>>>> }
>>>>
>>>> finally {
>>>>     System.out.println("closing!");
>>>>     qexec.close() ;
>>>> }
>>>>
>>>> This will result in the following error:
>>>>
>>>> [com.ctc.wstx.exc.WstxLazyException]
>>>> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<'
>>>> (code 60); expected a semi-colon after the reference for entity
>>>> 'MelindaGatesFoundationPeople'
>>>> at [row,col {unknown-source}]: [2609,96]
>>>> at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
>>>> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
>>>> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
>>>> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
>>>> at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
>>>> at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
>>>> at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
>>>>
>>>> I also posted this on the Jena group but some seem to suggest it is a
>>>> dbpedia issue: http://tech.groups.yahoo.com/group/jena-dev/message/36210
>>>>
>>>> -------------------------------------------------------------------------
>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>>>> Grand prize is a trip for two to an Open Source event anywhere in the world
>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>> _______________________________________________
>>>> Dbpedia-discussion mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
> All,
>
> Fixed.
>
> Please verify.
>
> --
> Regards,
>
> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
> President & CEO OpenLink Software     Web: http://www.openlinksw.com
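
For anyone reading this thread later: the escaping step Richard describes (writing "&" as "&amp;" when a URI is placed inside XML content) can be sketched roughly as below. This is only an illustration under my own assumptions, not Virtuoso's or Jena's code. The names XmlTextEscaper and escape are made up, and a real serializer would normally let its XML library do this.

```java
// Hypothetical helper, for illustration only (not part of Virtuoso or Jena).
final class XmlTextEscaper {
    private XmlTextEscaper() {}

    /** Replaces the characters that are reserved in XML text and attribute values. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;   // the case that broke the result set
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String uri = "http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople";
        // prints <uri>http://dbpedia.org/class/yago/Bill&amp;MelindaGatesFoundationPeople</uri>
        System.out.println("<uri>" + escape(uri) + "</uri>");
    }
}
```

With the ampersand written as "&amp;", an XML parser reads the URI back unchanged, so the URI itself does not need to be altered.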

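
Georgi's workaround (encoding the Yago URIs in yago_en.nt before loading the file into a Jena model) might look something like the sketch below. It only matters if you load the dump yourself, and it makes some assumptions: it percent-encodes just the ampersand rather than doing full URL encoding, and it expects URIs in angle brackets as in N-Triples. YagoUriFixer and encodeAmpersands are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class YagoUriFixer {
    // Matches <http://dbpedia.org/class/yago/...> and captures prefix and local part.
    private static final Pattern YAGO_URI =
        Pattern.compile("<(http://dbpedia\\.org/class/yago/)([^>]*)>");

    private YagoUriFixer() {}

    /** Percent-encodes "&" inside Yago class URIs on one N-Triples line; other URIs are left alone. */
    static String encodeAmpersands(String ntLine) {
        Matcher m = YAGO_URI.matcher(ntLine);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String local = m.group(2).replace("&", "%26");
            String fixed = "<" + m.group(1) + local + ">";
            m.appendReplacement(sb, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Each line of the file would be passed through encodeAmpersands before being handed to the parser. Note that this changes the URIs, so they no longer match the ones the public endpoint returns.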