Seaborne, Andy wrote:
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:dbpedia-
>> [EMAIL PROTECTED] On Behalf Of Georgi Kobilarov
>> Sent: 20 August 2008 00:17
>> To: [EMAIL PROTECTED]; [email protected]
>> Subject: Re: [Dbpedia-discussion] Ampersand in dbpedia returned URI
>> breakingJena code
>>
>> Marvin,
>>
>> yes, it's a bug in our dataset. In particular in the Yago dataset, which
>> has been contributed externally and wasn't created with the DBpedia
>> framework (but hey, we've got many similar bugs in datasets created by
>> our framework ;))
>>
>> Yago URIs have not been url-encoded. So as a workaround, you can
>> url_encode all URIs starting with http://dbpedia.org/class/yago/ in the
>> yago_en.nt file before loading it into your Jena model. That should do
>> it.
>>
>> And we'll fix that bug for the future.
>>
>> Best,
>> Georgi
>>
>
>
> Fixing the dataset will be a great help. This is the second report I have
> received recently but both are actually related to the XML, not RDF.
>
> There are two things here: use of the & in the URI (from the data as you say)
> but also the DBpedia endpoint is emitting illegal XML. It's the second that
> is the cause of the exception.
>
> For the query:
>
> select distinct ?Concept where {[] a ?Concept}
>
> The OP got:
>
> <result>
> <binding
> name="Concept"><uri>http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople</uri></binding>
> </result>
>
> which uses a bare & in XML where it should be &amp; so the XML is bad at the
> entity level. That's what is breaking the StAX parser in the stacktrace, not
> the bad URI. It didn't get as far as knowing it was a URI!
> I'd guess that a legal use of & in a URL will also cause problems.
>
> The SPARQL endpoint needs fixing as well. I thought this had been fixed - is
> it just a case of upgrading or is it still broken?
>

Andy et al,
We'll look into this.

Yrjänä

> Andy
>
>
>
>> --
>> Georgi Kobilarov
>> Freie Universität Berlin
>> www.georgikobilarov.com
>>
>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:dbpedia-
>>> [EMAIL PROTECTED] On Behalf Of Marvin Lugair
>>> Sent: Wednesday, August 20, 2008 12:57 AM
>>> To: [email protected]
>>> Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI
>>> breakingJena code
>>>
>>>
>>> Hi,
>>>
>>> The following SPARQL query:
>>> select distinct ?Concept where {[] a ?Concept}
>>>
>>> is the default query at the dbpedia endpoint http://dbpedia.org/sparql
>>> It returns several URIs, including the following one (notice the
>>> ampersand):
>>>
>>> http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
>>>
>>> So DBpedia is returning URIs containing an ampersand. This is causing
>>> an exception in the Jena parser.
>>>
>>> How do I fix this? None of Jena's methods will work; I can't transform
>>> the resultset into a model or even print it with the resultformatter.
>>> If I iterate over it, I can print the results one by one till I get to
>>> the malformed URI. How do I check in my code for malformed URIs?
>>>
>>>
>>> Any ideas?
>>> Thanks!
>>> Marv
>>> -------------
>>>
>>> The code below works till I get a URI with an ampersand.
>>> The exception is coming from results.nextSolution().
>>> Other Jena methods to convert the retrieved resultset to a model directly
>>> or format it produce the same exception (I assume they have a similar
>>> iterator inside).
>>>
>>>
>>> import com.hp.hpl.jena.query.QueryExecution;
>>> import com.hp.hpl.jena.query.QueryExecutionFactory;
>>> import com.hp.hpl.jena.query.QuerySolution;
>>> import com.hp.hpl.jena.query.ResultSet;
>>>
>>> // Run the endpoint's default query against the remote SPARQL service.
>>> QueryExecution qexec =
>>>     QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql",
>>>         "select distinct ?Concept where {[] a ?Concept}");
>>>
>>> try {
>>>     ResultSet results = qexec.execSelect();
>>>     // Print each binding of ?Concept, one per line.
>>>     while (results.hasNext()) {
>>>         QuerySolution soln = results.nextSolution();
>>>         String x = soln.get("Concept").toString();
>>>         System.out.println(x);
>>>     }
>>> } finally {
>>>     System.out.println("closing!");
>>>     qexec.close();
>>> }
>>>
>>>
>>> This will result in the following error:
>>>
>>>
>>> [com.ctc.wstx.exc.WstxLazyException]
>>> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<'
>>> (code 60); expected a semi-colon after the reference for entity
>>> 'MelindaGatesFoundationPeople'
>>> at [row,col {unknown-source}]: [2609,96]
>>> at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
>>> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
>>> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
>>> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
>>> at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
>>> at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
>>> at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
>>>
>>>
>>>
>>> I also posted this on the Jena group but some seem to suggest it is a
>>> dbpedia issue: http://tech.groups.yahoo.com/group/jena-dev/message/36210
>>>
>>>
>>> -------------------------------------------------------------------------
>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>>> challenge
>>> Build the coolest Linux based applications with Moblin SDK & win great
>>> prizes
>>> Grand prize is a trip for two to an Open Source event anywhere in the
>>> world
>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>

--
Yrjana Rankka | [EMAIL PROTECTED]
Developer, Virtuoso Team | http://www.openlinksw.com
| Making Technology Work For You
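
The workaround Georgi describes above (url-encoding every URI that starts with http://dbpedia.org/class/yago/ in yago_en.nt before loading it into Jena) could be sketched in Java roughly as follows. This is only an illustration and not code from the thread: the class name YagoUriEncoder and its two methods are hypothetical, it assumes each URI appears as <...> on an N-Triples line, and it assumes none of the local names are already percent-encoded (otherwise they would be encoded twice). It relies on java.net.URLEncoder, which applies form-style encoding, so a space would become + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: percent-encodes the local name of Yago class URIs
// in a single N-Triples line. Not part of Jena or DBpedia.
class YagoUriEncoder {
    // Namespace named in the thread.
    static final String YAGO_NS = "http://dbpedia.org/class/yago/";

    // Matches <http://dbpedia.org/class/yago/LOCALNAME> and captures LOCALNAME.
    private static final Pattern YAGO_URI =
        Pattern.compile("<" + Pattern.quote(YAGO_NS) + "([^>]*)>");

    // Returns the line with every Yago URI's local name url-encoded.
    // Assumption: local names are not encoded yet, so no double-encoding check.
    static String encodeLine(String line) {
        Matcher m = YAGO_URI.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = "<" + YAGO_NS + encodeLocalName(m.group(1)) + ">";
            // quoteReplacement keeps any '$' or '\' in the URI literal.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Form-style url-encoding of the part after the namespace; '&' becomes "%26".
    static String encodeLocalName(String localName) {
        try {
            return URLEncoder.encode(localName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative triple; the subject URI is made up for the example.
        String line = "<http://dbpedia.org/resource/Example> "
            + "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
            + "<http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople> .";
        System.out.println(encodeLine(line));
    }
}
```

Applied line by line to yago_en.nt, this only addresses the data side. As Andy points out, the exception itself comes from the bare & in the endpoint's XML output, which has to be fixed in the endpoint.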
