Kingsley, et al.,

Yes, it seems fixed: the Jena code now works correctly on the query (without a filter). So all is fine! Thanks to everyone for the prompt responses!

By the way, what was the fix? Is it something you changed in Virtuoso? Just curious.

Marv

--- On Wed, 8/20/08, Richard Cyganiak <[EMAIL PROTECTED]> wrote:

> From: Richard Cyganiak <[EMAIL PROTECTED]>
> Subject: Re: [Dbpedia-discussion] Ampersand in dbpedia returned URI breakingJena code
> To: "Kingsley Idehen" <[EMAIL PROTECTED]>
> Cc: "Georgi Kobilarov" <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [email protected], "Seaborne, Andy" <[EMAIL PROTECTED]>
> Date: Wednesday, August 20, 2008, 3:14 PM
> Kingsley, from looking at the query result it seems like the issue is
> fixed. Thanks!
>
> Confirmation from someone who uses Jena to access the SPARQL endpoint
> would be nice.
>
> Richard
>
>
>
> On 20 Aug 2008, at 19:30, Kingsley Idehen wrote:
>
> > Richard Cyganiak wrote:
> >> Marvin, Kingsley,
> >>
> >> On 20 Aug 2008, at 00:16, Georgi Kobilarov wrote:
> >>> yes, it's a bug in our dataset.
> >>
> >> Actually, no. It's a bug in Virtuoso's SPARQL+XML result format
> >> serializer.
> >>
> >> Ampersands are allowed in URIs, so the Yago URIs are perfectly fine
> >> according to all the specs. (We *might* still want to %-encode the
> >> ampersand in those URIs, but just for consistency with our other
> >> URIs, not because the specs require it. That's a separate question.)
> >>
> >> The problem is: When a "&" character is included in content inside
> >> an XML file, it has to be written as "&amp;". Virtuoso does not do
> >> this, hence the breakage.
> >>
> >> (This is a silly bug. The need to encode reserved characters (& and
> >> ") is just about the first thing a developer learns about XML. I
> >> hope OpenLink fixes this soon. Kingsley?)
> >>
> >> Richard
> >>
> >>
> >>
> >>> In particular in the Yago dataset, which
> >>> has been contributed externally and wasn't created with the DBpedia
> >>> framework (but hey, we've got many similar bugs in datasets created by
> >>> our framework ;))
> >>>
> >>> Yago URIs have not been url-encoded.
> >>> So as a workaround, you can url_encode all URIs starting with
> >>> http://dbpedia.org/class/yago/ in the yago_en.nt file before loading
> >>> it into your Jena model. That should do it.
> >>>
> >>> And we'll fix that bug for the future.
> >>>
> >>> Best,
> >>> Georgi
> >>>
> >>> --
> >>> Georgi Kobilarov
> >>> Freie Universität Berlin
> >>> www.georgikobilarov.com
> >>>
> >>>> -----Original Message-----
> >>>> From: [EMAIL PROTECTED] [mailto:dbpedia-[EMAIL PROTECTED] On Behalf Of Marvin Lugair
> >>>> Sent: Wednesday, August 20, 2008 12:57 AM
> >>>> To: [email protected]
> >>>> Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI breakingJena code
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> The following sparql query:
> >>>>
> >>>> select distinct ?Concept where {[] a ?Concept}
> >>>>
> >>>> is the default query at the dbpedia endpoint http://dbpedia.org/sparql.
> >>>> It returns several URIs, including the following one (notice the
> >>>> ampersand):
> >>>>
> >>>> http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
> >>>>
> >>>> So DBpedia is returning URIs containing an ampersand. This is causing
> >>>> an exception in the Jena parser.
> >>>>
> >>>> How do I fix this? None of Jena's methods will work: I can't transform
> >>>> the resultset into a model or even print it with the resultformatter.
> >>>> If I iterate over it, I can print the results one by one till I get to
> >>>> the malformed URI. How do I check in my code for malformed URIs?
> >>>>
> >>>>
> >>>> Any ideas?
> >>>> Thanks!
> >>>> Marv
> >>>> -------------
> >>>>
> >>>> The code below works till I get a URI with an ampersand.
> >>>> The exception is coming from results.nextSolution().
> >>>> Other Jena methods to convert the retrieved resultset to a model
> >>>> directly or format it produce the same exception (I assume they have a
> >>>> similar iterator inside)
> >>>>
> >>>>
> >>>> QueryExecution qexec = QueryExecutionFactory.sparqlService(
> >>>>         "http://DBpedia.org/sparql",
> >>>>         "select distinct ?Concept where {[] a ?Concept}");
> >>>>
> >>>> try {
> >>>>     ResultSet results = qexec.execSelect();
> >>>>     for ( ; results.hasNext() ; )
> >>>>     {
> >>>>         QuerySolution soln = results.nextSolution() ;
> >>>>         String x = soln.get("Concept").toString();
> >>>>         System.out.print(x +"\n");
> >>>>     }
> >>>> }
> >>>>
> >>>> finally {
> >>>>     System.out.println("closing!");
> >>>>     qexec.close() ;
> >>>> }
> >>>>
> >>>>
> >>>> This will result in the following error:
> >>>>
> >>>>
> >>>> [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<' (code 60); expected a semi-colon after the reference for entity 'MelindaGatesFoundationPeople'
> >>>>  at [row,col {unknown-source}]: [2609,96]
> >>>>  at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
> >>>>  at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
> >>>>  at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
> >>>>  at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
> >>>>  at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
> >>>>  at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
> >>>>  at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
> >>>>
> >>>>
> >>>>
> >>>> I also posted this on the Jena group but
> >>>> some seem to suggest it is a dbpedia issue:
> >>>> http://tech.groups.yahoo.com/group/jena-dev/message/36210
> >>>>
> >>>> -------------------------------------------------------------------------
> >>>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> >>>> Build the coolest Linux based applications with Moblin SDK & win great prizes
> >>>> Grand prize is a trip for two to an Open Source event anywhere in the world
> >>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> >>>> _______________________________________________
> >>>> Dbpedia-discussion mailing list
> >>>> [email protected]
> >>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
> >>
> >>
> > All,
> >
> > Fixed.
> >
> > Please verify.
> >
> >
> > --
> >
> > Regards,
> >
> > Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
> > President & CEO OpenLink Software     Web: http://www.openlinksw.com
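Richard's diagnosis can be checked without Jena or the live endpoint. What follows is a minimal sketch under stated assumptions. It is not the Virtuoso fix, and it is not Jena's parser.

The sketch rests on these assumptions:
- It uses a simplified, namespace-free <binding><uri> fragment, modeled loosely on the SPARQL XML results format.
- It relies on the JDK's built-in StAX parser.
- All names in it are hypothetical: AmpersandInResults, escapeXml, binding, readUri, isValidUri and encodeYagoAmpersands.

The sketch demonstrates three things:
- java.net.URI accepts the Yago URI as it stands.
- The fragment is rejected when "&" is written raw, and reads back intact once "&" is written as "&amp;".
- A rough version of Georgi's workaround rewrites "&" as "%26" inside Yago IRIs on an N-Triples line.

```java
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AmpersandInResults {

    // Hypothetical helper: '&' and '<' may not appear raw in XML character data.
    // '>' and '"' are escaped too; harmless in text, and '"' is needed inside
    // double-quoted attribute values.
    public static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Simplified, namespace-free stand-in for one SPARQL XML result binding.
    // uriText is inserted verbatim, the way a serializer would write it.
    public static String binding(String uriText) {
        return "<binding name=\"Concept\"><uri>" + uriText + "</uri></binding>";
    }

    // Returns the text of the first <uri> element. With the JDK's StAX
    // implementation, malformed input surfaces as XMLStreamException.
    public static String readUri(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("uri")) {
                    return r.getElementText();
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    // True if java.net.URI (RFC 2396-based parsing) accepts the string.
    public static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Hypothetical take on the yago_en.nt workaround: inside each
    // <http://dbpedia.org/class/yago/...> IRI on an N-Triples line, replace
    // '&' with %26. Other characters and other IRIs are left unchanged.
    private static final Pattern YAGO_IRI =
            Pattern.compile("<(http://dbpedia\\.org/class/yago/[^>]*)>");

    public static String encodeYagoAmpersands(String ntLine) {
        Matcher m = YAGO_IRI.matcher(ntLine);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String encoded = "<" + m.group(1).replace("&", "%26") + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String uri = "http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople";
        System.out.println("java.net.URI accepts it: " + isValidUri(uri));
        try {
            System.out.println("escaped: " + readUri(binding(escapeXml(uri))));
        } catch (XMLStreamException e) {
            System.out.println("escaped parse failed (unexpected): " + e.getMessage());
        }
        try {
            readUri(binding(uri));
            System.out.println("raw parse succeeded (unexpected)");
        } catch (XMLStreamException e) {
            System.out.println("raw '&' rejected: " + e.getMessage());
        }
        System.out.println(encodeYagoAmpersands("<" + uri + "> ."));
    }
}
```

With the JDK's default StAX implementation, running main should do three things:
- report that the URI is accepted;
- print the URI read back from the escaped fragment;
- print a parse error for the raw fragment (the exact message varies).

With Woodstox on the classpath, as in Marv's stack trace, the same error can instead surface lazily as an unchecked WstxLazyException, which the XMLStreamException handlers in main do not catch.

Note also that "%26" yields a different URI string from the unencoded one. Data loaded that way will therefore not match URIs that the endpoint returns with a literal "&".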
