Seaborne, Andy wrote:
>   
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:dbpedia-
>> [EMAIL PROTECTED] On Behalf Of Georgi Kobilarov
>> Sent: 20 August 2008 00:17
>> To: [EMAIL PROTECTED]; [email protected]
>> Subject: Re: [Dbpedia-discussion] Ampersand in dbpedia returned URI
>> breaking Jena code
>>
>> Marvin,
>>
>> yes, it's a bug in our dataset. In particular in the Yago dataset, which
>> has been contributed externally and wasn't created with the DBpedia
>> framework (but hey, we've got many similar bugs in datasets created by
>> our framework ;))
>>
>> Yago URIs have not been url-encoded. So as a workaround, you can
>> url_encode all URIs starting with http://dbpedia.org/class/yago/ in the
>> yago_en.nt file before loading it into your Jena model. That should do
>> it.
>>
>> And we'll fix that bug for the future.
>>
>> Best,
>> Georgi
>>     
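
The workaround above could be sketched in Java roughly as follows. This is only a sketch under stated assumptions: YagoUriEncoder and its methods are hypothetical names, not part of Jena or the DBpedia framework, and the file is assumed to hold one triple per line. It uses java.net.URLEncoder, which applies form encoding (a space becomes "+") and would re-encode any "%" that is already present, so it should be run once on the raw file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jena or the DBpedia framework.
class YagoUriEncoder {
    static final String PREFIX = "http://dbpedia.org/class/yago/";

    // Matches <http://dbpedia.org/class/yago/LOCALNAME> inside an N-Triples line.
    static final Pattern YAGO_URI =
            Pattern.compile("<" + Pattern.quote(PREFIX) + "([^>]*)>");

    // Percent-encodes the local name of every Yago class URI on one line.
    static String encodeLine(String line) {
        Matcher m = YAGO_URI.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = "<" + PREFIX + encode(m.group(1)) + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String encode(String localName) {
        try {
            return URLEncoder.encode(localName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Streams the file line by line so a large dump need not fit in memory.
    static void encodeFile(Path in, Path out) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                w.write(encodeLine(line));
                w.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // args[0] = input .nt file, args[1] = output file
            encodeFile(Paths.get(args[0]), Paths.get(args[1]));
        } else {
            System.out.println(encode("Bill&MelindaGatesFoundationPeople"));
        }
    }
}
```

Called with two arguments (input path and output path) it rewrites the file; the rewritten copy, not yago_en.nt itself, is then what you would load into the Jena model.
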
>
>
> Fixing the dataset will be a great help.  This is the second report I have 
> received recently but both are actually related to the XML, not RDF.
>
> There are two separate problems here: the & in the URI (from the data, as you
> say) and the DBpedia endpoint emitting XML that is not well-formed.  It's the
> second that is the cause of the exception.
>
> For the query :
>
> select distinct ?Concept where {[] a ?Concept}
>
> The OP got:
>
> <result>
>    <binding 
> name="Concept"><uri>http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople</uri></binding>
>   </result>
>
> which uses a bare & in XML where it should be &amp;, so the XML is bad at the
> entity level.  That, not the bad URI, is what breaks the StAX parser in the
> stack trace.  It didn't get as far as knowing it was a URI!
> I'd guess that a legal use of & in a URL will cause the same problem.
>
> The SPARQL endpoint needs fixing as well.  I thought this had been fixed. Is
> it just a case of upgrading, or is it still broken?
>
>   
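
The point about the entity level can be reproduced without Jena. A minimal sketch, assuming the StAX implementation bundled with the JDK (javax.xml.stream) rather than the Woodstox parser seen in the stack trace; AmpersandInXml and firstUri are hypothetical names. The same binding is parsed twice, once with the bare & and once with &amp;.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; uses the JDK's built-in StAX parser.
class AmpersandInXml {

    // Returns the text of the first <uri> element (null if there is none).
    // Throws IllegalArgumentException if the XML is not well-formed.
    static String firstUri(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "uri".equals(reader.getLocalName())) {
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String uri = "http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople";
        String bare = "<result><binding name=\"Concept\"><uri>" + uri
                + "</uri></binding></result>";
        String escaped = bare.replace("&", "&amp;");

        // The escaped form parses, and the parser hands back the URI with a plain '&'.
        System.out.println("escaped: " + firstUri(escaped));
        try {
            firstUri(bare);
            System.out.println("bare: parsed (unexpected)");
        } catch (IllegalArgumentException e) {
            System.out.println("bare: rejected by the parser");
        }
    }
}
```

The parser gives up inside the text of the uri element, before any code has a chance to treat that text as a URI, which is why validating URIs on the client side does not help here.
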
Andy et al,

We'll look into this.

Yrjänä

>         Andy
>
>
>   
>> --
>> Georgi Kobilarov
>> Freie Universität Berlin
>> www.georgikobilarov.com
>>
>>     
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:dbpedia-
>>> [EMAIL PROTECTED] On Behalf Of Marvin Lugair
>>> Sent: Wednesday, August 20, 2008 12:57 AM
>>> To: [email protected]
>>> Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI
>>> breaking Jena code
>>>
>>>
>>> Hi,
>>>
>>> The following SPARQL query:
>>> select distinct ?Concept where {[] a ?Concept}
>>>
>>> is the default query at the DBpedia endpoint http://dbpedia.org/sparql
>>> It returns several URIs, including the following one (notice the
>>> ampersand):
>>>
>>> http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
>>>
>>> So DBpedia is returning URIs containing an ampersand. This is causing
>>> an exception in the Jena parser.
>>>
>>> How do I fix this? None of Jena's methods work: I can't transform
>>> the result set into a model or even print it with the result formatter.
>>> If I iterate over it, I can print the results one by one until I get to
>>> the malformed URI. How do I check in my code for malformed URIs?
>>>
>>>
>>> Any ideas?
>>> Thanks!
>>> Marv
>>> -------------
>>>
>>> The code below works until I get a URI with an ampersand.
>>> The exception is coming from results.nextSolution(). Other Jena
>>> methods that convert the retrieved result set to a model directly or
>>> format it produce the same exception (I assume they have a similar
>>> iterator inside).
>>>
>>>
>>> QueryExecution qexec = QueryExecutionFactory.sparqlService(
>>>     "http://DBpedia.org/sparql",
>>>     "select distinct ?Concept where {[] a ?Concept}");
>>>
>>> try {
>>>     ResultSet results = qexec.execSelect();
>>>     // Fails once the iterator reaches the binding with the bare '&'.
>>>     while (results.hasNext()) {
>>>         QuerySolution soln = results.nextSolution();
>>>         String x = soln.get("Concept").toString();
>>>         System.out.print(x + "\n");
>>>     }
>>> } finally {
>>>     System.out.println("closing!");
>>>     qexec.close();
>>> }
>>>
>>>
>>> This will result in the following error:
>>>
>>>
>>> [com.ctc.wstx.exc.WstxLazyException]
>>> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<'
>>> (code 60); expected a semi-colon after the reference for entity
>>> 'MelindaGatesFoundationPeople'
>>> at [row,col {unknown-source}]: [2609,96]
>>> at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
>>> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
>>> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
>>> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
>>> at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
>>> at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
>>> at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
>>>
>>>
>>>
>>> I also posted this on the Jena group, but some there suggest it is a
>>> DBpedia issue:
>>> http://tech.groups.yahoo.com/group/jena-dev/message/36210
>>>
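
Until the endpoint is fixed, one client-side stopgap is to fetch the raw result XML yourself and escape bare ampersands before any XML parser sees it. A minimal sketch, assuming the response contains no CDATA sections (an & inside CDATA would be escaped wrongly) and that only the five predefined entities and numeric character references are in use; BareAmpersandFixer is a hypothetical name, not a Jena class.

```java
import java.util.regex.Pattern;

// Hypothetical helper; a stopgap for malformed responses, not a general XML repair tool.
class BareAmpersandFixer {

    // An '&' that does not start a predefined entity (&amp; &lt; &gt; &quot; &apos;)
    // or a numeric character reference (&#38; or &#x26;).
    static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare '&' with '&amp;' and leaves valid references alone.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String broken =
                "<uri>http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople</uri>";
        System.out.println(escapeBareAmpersands(broken));
    }
}
```

The repaired text could then be handed to Jena's own result-set reader (for instance ResultSetFactory.fromXML, if your Jena version offers it) instead of letting QueryExecution parse the HTTP response directly. The URIs that come out still contain a literal &, so they remain unencoded in the sense discussed for the dataset.
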
>>>
>>>
>>>
>>>
>>>
>>> -------------------------------------------------------------------------
>>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>>> Grand prize is a trip for two to an Open Source event anywhere in the world
>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>       


-- 
Yrjana Rankka            | [EMAIL PROTECTED]
Developer, Virtuoso Team | http://www.openlinksw.com
                         | Making Technology Work For You


