Georgi,
Thanks for the reply!
The problem is that loading DBpedia into an RDF store takes close to 40 hours
(some RDF stores will even break), so I am using the DBpedia Virtuoso
server for now.
Can you think of another solution?
Thanks again,
Marv


--- On Tue, 8/19/08, Georgi Kobilarov <[EMAIL PROTECTED]> wrote:

> From: Georgi Kobilarov <[EMAIL PROTECTED]>
> Subject: RE: [Dbpedia-discussion] Ampersand in dbpedia returned URI breaking Jena code
> To: [EMAIL PROTECTED], [email protected]
> Date: Tuesday, August 19, 2008, 5:16 PM
> Marvin,
> 
> Yes, it's a bug in our dataset, specifically in the Yago dataset, which
> was contributed externally and wasn't created with the DBpedia framework
> (but hey, we've got many similar bugs in datasets created by our
> framework ;))
> 
> The Yago URIs have not been URL-encoded. So as a workaround, you can
> URL-encode all URIs starting with http://dbpedia.org/class/yago/ in the
> yago_en.nt file before loading it into your Jena model. That should do
> it.
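Below is a minimal Java sketch of that workaround. It is not DBpedia's own tool, and it rests on a few assumptions:

- yago_en.nt is an N-Triples file with one triple per line.
- The Yago local names are raw. They are not already percent-encoded or written with N-Triples \u escapes.

The class name YagoUriFixer is hypothetical. It applies java.net.URLEncoder to the part of each URI after http://dbpedia.org/class/yago/. URLEncoder uses form encoding, so a space would become "+". For the name in this thread, it turns Bill&MelindaGatesFoundationPeople into Bill%26MelindaGatesFoundationPeople.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: percent-encodes Yago class URIs in an N-Triples file.
public class YagoUriFixer {
    // Matches a bracketed Yago class URI and captures its (unencoded) local name.
    private static final Pattern YAGO_URI =
        Pattern.compile("<http://dbpedia\\.org/class/yago/([^>]*)>");

    // Rewrites every Yago class URI on one N-Triples line; other text is unchanged.
    static String encodeYagoUris(String line) {
        Matcher m = YAGO_URI.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = "<http://dbpedia.org/class/yago/" + urlEncode(m.group(1)) + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Assumption: the local name is not already percent-encoded (it would be double-encoded).
    private static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // args[0] = input file (e.g. yago_en.nt), args[1] = output file
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(encodeYagoUris(line));
                out.newLine();
            }
        }
    }
}
```

To use it, run "java YagoUriFixer yago_en.nt yago_en.fixed.nt" and load the output file into the Jena model in place of the original. The sketch works line by line rather than through an RDF parser. That is deliberate, because the unfixed file is exactly what the parser chokes on.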
> 
> And we'll fix that bug for the future.
> 
> Best,
> Georgi
> 
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:dbpedia-[EMAIL PROTECTED] On Behalf Of Marvin Lugair
> > Sent: Wednesday, August 20, 2008 12:57 AM
> > To: [email protected]
> > Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI breaking Jena code
> > 
> > 
> > Hi,
> > 
> > The following SPARQL query:
> > select distinct ?Concept where {[] a ?Concept}
> > 
> > is the default query at the DBpedia endpoint http://dbpedia.org/sparql
> > It returns several URIs, including the following one (notice the
> > ampersand):
> > 
> > http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
> > 
> > So DBpedia is returning URIs containing an ampersand. This is causing
> > an exception in the Jena parser.
> > 
> > How do I fix this? None of Jena's methods work: I can't transform
> > the ResultSet into a model or even print it with ResultSetFormatter.
> > If I iterate over it, I can print the results one by one until I get to
> > the malformed URI. How do I check in my code for malformed URIs?
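A note on checking URIs in code: the ampersand is legal in a URI path, so a URI validity check will not flag this value. RFC 3986 lists "&" among the reserved sub-delimiters, and java.net.URI, which follows RFC 2396, accepts it in a path. What is actually broken is the XML result document: inside XML text a literal "&" must be written as "&amp;". The parser fails at the bare "&" inside hasNext()/nextSolution(), before your code ever receives the value.

The hypothetical helper below uses only java.net.URI. It is meant to illustrate that the Yago URI passes such a check, not to serve as a fix.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: checks whether a string parses as an absolute URI.
public class UriCheck {
    // True if java.net.URI accepts the string and it has a scheme.
    static boolean isAbsoluteUri(String s) {
        try {
            return new URI(s).isAbsolute();
        } catch (URISyntaxException e) {
            return false; // e.g. a space or another illegal character
        }
    }

    public static void main(String[] args) {
        // The Yago URI from the query results is syntactically valid.
        System.out.println(isAbsoluteUri("http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople"));
        // A space in the path, by contrast, is rejected.
        System.out.println(isAbsoluteUri("http://dbpedia.org/class/yago/Bill Gates"));
    }
}
```

The first call prints true and the second prints false. In other words, the fix has to happen on the XML or data side rather than in a per-URI check.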
> > 
> > 
> > Any ideas?
> > Thanks!
> > Marv
> > -------------
> > 
> > The code below works until I get a URI with an ampersand.
> > The exception is coming from results.nextSolution(). Other Jena
> > methods that convert the retrieved result set to a model directly or
> > format it produce the same exception (I assume they have a similar
> > iterator inside).
> > 
> > 
> > import com.hp.hpl.jena.query.*;
> > 
> > QueryExecution qexec = QueryExecutionFactory.sparqlService(
> >     "http://DBpedia.org/sparql",
> >     "select distinct ?Concept where {[] a ?Concept}");
> > 
> > try {
> >     ResultSet results = qexec.execSelect();
> >     // Rows are parsed lazily from the XML response, so a parse
> >     // error surfaces here, mid-iteration, not in execSelect()
> >     while (results.hasNext()) {
> >         QuerySolution soln = results.nextSolution();
> >         String x = soln.get("Concept").toString();
> >         System.out.println(x);
> >     }
> > } finally {
> >     System.out.println("closing!");
> >     qexec.close(); // release the query execution's resources
> > }
> > 
> > 
> > This will result in the following error:
> > 
> > [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<' (code 60); expected a semi-colon after the reference for entity 'MelindaGatesFoundationPeople'
> >  at [row,col {unknown-source}]: [2609,96]
> >   at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
> >   at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
> >   at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
> >   at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
> >   at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
> >   at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
> >   at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
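The trace pins down the cause. The endpoint's XML results contain the raw text Bill&MelindaGatesFoundationPeople. The StAX parser (Woodstox) therefore reads &MelindaGatesFoundationPeople as the start of an entity reference, expects a ";", and fails at the "<" of the closing tag.

When reloading the data locally is not an option, one possible client-side workaround is to repair the response before parsing. This assumes you fetch the XML result text yourself instead of going through execSelect(). The repair escapes every "&" that does not already begin a character or entity reference.

This is a hypothetical sketch, not a Jena feature. The repaired text could then be handed to an XML result-set reader; ARQ's ResultSetFactory.fromXML should accept an InputStream, but check the version you use.

```java
import java.util.regex.Pattern;

// Hypothetical helper: escapes bare ampersands in an XML document's text.
public class BareAmpersandFixer {
    // An '&' NOT followed by a well-formed reference: &name;  &#38;  &#x26;
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?![A-Za-z_][A-Za-z0-9._-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    // Replaces each bare '&' with "&amp;", leaving existing references alone.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String row = "<uri>http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople</uri>";
        System.out.println(escapeBareAmpersands(row));
    }
}
```

This is a heuristic. A bare "&" that happens to be followed by letters and a ";" is indistinguishable from a real entity reference and is left untouched. For the Yago URIs in this thread, the name is followed by "</uri>", so it is repaired.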
> > 
> > 
> > 
> > I also posted this on the Jena group, but some there seem to suggest it is
> > a DBpedia issue: http://tech.groups.yahoo.com/group/jena-dev/message/36210
> > 
> > -------------------------------------------------------------------------
> > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> > Build the coolest Linux based applications with Moblin SDK & win great prizes
> > Grand prize is a trip for two to an Open Source event anywhere in the world
> > http://moblin-contest.org/redirect.php?banner_id=100&url=/
> > _______________________________________________
> > Dbpedia-discussion mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

