On 02/04/12 11:41, Claude Warren wrote:
The query that returns the error is:

select * where {
  SERVICE <http://s4.semanticscience.org:12027/sparql> {
    ?coFactor <http://bio2rdf.org/ns/bio2rdf#synonym> ?syn
  }
}

OpenLink Virtuoso.

This has been known to generate illegal XML (let alone SPARQL results format). In this case: line 809:

  <result>
    <binding name="coFactor"><uri>http://bio2rdf.org/mgi:88600</uri></binding>
    <binding name="syn"><literal>16&#7;lphaoh-a</literal></binding>
  </result>

<literal>16&#7;lphaoh-a</literal> is illegal XML.

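The character reference &#7; denotes U+0007 (BEL), a control character that XML 1.0 does not allow, even when written as a character reference. This crashes the XML parser in Java 7 (which is derived from Xerces).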
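
The failure can be reproduced outside Java. What follows is a minimal sketch in Python, not the Java parser mentioned above: it checks numeric character references against the XML 1.0 Char production and feeds the offending element to the standard-library parser. The helper names (is_xml10_char, illegal_char_refs, parses) are made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# The element reported at line 809 of the endpoint's response.
BAD = '<literal>16&#7;lphaoh-a</literal>'

# Matches decimal (&#7;) and hexadecimal (&#x7;) character references.
CHAR_REF = re.compile(r'&#(?:x([0-9A-Fa-f]+)|([0-9]+));')


def is_xml10_char(cp):
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def illegal_char_refs(text):
    """Return the numeric character references in text that XML 1.0 forbids."""
    bad = []
    for m in CHAR_REF.finditer(text):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if not is_xml10_char(cp):
            bad.append(m.group(0))
    return bad


def parses(text):
    """True if the standard-library parser accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

A scan like illegal_char_refs could be run over a saved response to locate every offending reference before handing the document to a strict parser.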

        Andy




On Sun, Apr 1, 2012 at 6:48 PM, Andy Seaborne wrote:

On 01/04/12 09:01, Claude Warren wrote:


On 30/03/12 16:52, Claude Warren wrote:

I have a case where I am using multiple federated calls where each call is
of the form

SERVICE SILENT <uri> {
--snip--
}

One of the endpoints is returning bad data in that the XML does not parse,
and so the XML parser throws an exception and my entire query dies.

Now the best answer would be to get the data corrected, but I don't own that
data and have no idea if they will fix it.

What I want to know is: shouldn't the SILENT keyword on the SERVICE call
indicate that if the remote fails it should be ignored?

From http://www.w3.org/2009/sparql/docs/fed/service#serviceFailure it
appears that a single solution with no bindings should be returned. If
this is a correct interpretation I am willing to report a bug and implement
a bug fix. The issue that I see is that the error is not detected until
hasNext() is called on the iterator. This means that the service could
have returned some data before the error was detected. I would propose
that the solution be to have the iterator return "false" at that point and
move forward with the partial data that was already returned.

Does anyone have a different interpretation of the specification or see an
issue with the possible solution?

Many thanks,
Claude

Hi Claude,
Hmm - tricky :-)
The key sentence is:
[[
The SILENT keyword indicates that errors encountered while accessing a
remote SPARQL endpoint should be ignored while processing the query.
]]
but HTTP has a bit of an issue here.
Suppose the request is made and "200 OK" is received. That's a contract
that the results are going to be sent and be valid. Bad syntax of
results isn't considered nor are breaks in communications.
The only way to address this is for the service operation (class
QueryIterService) to consume and buffer all the results. I've just
added this in QueryIterService.
An effect of this is that you will not get any of the valid earlier
results. Keeping them, as you propose, is quite sensible.
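
As a rough illustration of the buffering approach, here is a sketch in Python rather than ARQ code. The names fetch_service and flaky_rows are hypothetical, and solutions are modelled as plain dicts. The whole stream is consumed before anything is handed on, so a late parse error either propagates or, under SILENT, is replaced by a single solution with no bindings.

```python
def fetch_service(rows, silent=False):
    """Consume the whole remote result stream before returning anything.

    rows is any iterable of solutions (dicts in this sketch); it may
    raise part-way through, as a streaming XML parser would.
    """
    try:
        return list(rows)      # buffer everything; any error surfaces here
    except Exception:
        if not silent:
            raise              # without SILENT the whole query fails
        return [{}]            # SILENT: one solution with no bindings


def flaky_rows():
    """Hypothetical stream: one good solution, then a parse failure."""
    yield {"x": "http://example.org/a"}
    raise ValueError("malformed XML in result stream")
```

Note that the good solution delivered before the failure is discarded, which is the effect described above.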
There needs to be a QueryIterator implementation that reads another
QueryIterator until some error occurs and signals the end at that point.
That would be worthwhile - please do contribute such a thing.
There's a QueryIteratorWrapper that can be used to intercept
.hasNext/.next calls so you can add try-catch.
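
The idea of an iterator that stops at the first error can be sketched as follows. This is a Python analogue, not the ARQ QueryIterator or QueryIteratorWrapper API; the class name TruncateOnError and its error attribute are invented for the example.

```python
class TruncateOnError:
    """Iterator wrapper that ends quietly when the wrapped iterator fails.

    Items delivered before the failure are kept; the exception is stored
    in self.error so the caller can still log or inspect it.
    """

    def __init__(self, inner):
        self._inner = iter(inner)
        self.error = None

    def __iter__(self):
        return self

    def __next__(self):
        if self.error is not None:
            raise StopIteration          # already failed: stay finished
        try:
            return next(self._inner)
        except StopIteration:
            raise                        # normal end of the wrapped iterator
        except Exception as exc:
            self.error = exc             # remember the failure
            raise StopIteration from None  # and signal end of results
```

Recording the exception rather than swallowing it silently is a design choice of this sketch; it lets the caller tell a truncated result apart from a complete one.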
Do you know which SPARQL implementation is generating bad results?
Andy


Andy,

I am not certain which SPARQL endpoint is generating the bad results
-- though I do intend to find out.

I will look at implementing a QueryIterator as you noted above.  I
then need to plug it into the Fuseki query engine chain.


The place to plug it in is in QueryIterService in ARQ (Fuseki is the
protocol engine; ARQ ships with Fuseki).


  Since I am querying multiple SPARQL endpoints and performing unions on
their results, and since it seems to take quite a long time for some
results, I was considering implementing a Union query for service calls
that would effectively poll each service endpoint in turn looking for
the next one that has a query result available.  A polling query iterator
if you will.  The hope is to parallelize the queries as much as
possible.


There was a discussion of this recently on this list.

ARQ is rather prone to serial execution (parallelism in Fuseki is used to
execute multiple concurrent requests, not to give all system resources for
one query).  There's nothing fundamental about ARQ's serial execution of
UNIONs - a different implementation of execution or a different operator
could make parallel SERVICE calls.
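
To illustrate what a union over parallel SERVICE calls might look like, here is a sketch in Python under stated assumptions; it is not ARQ code. The function parallel_union and its call_service parameter (a callable that takes an endpoint and returns an iterable of rows) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_union(endpoints, call_service, max_workers=4):
    """Call every endpoint concurrently and yield rows as each call finishes.

    Rows come out in completion order, not endpoint order, which is
    acceptable for a UNION because it imposes no ordering on its results.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(call_service, ep) for ep in endpoints]
        for fut in as_completed(futures):
            for row in fut.result():   # re-raises if that call failed
                yield row
```

Each call still buffers its own results before they are yielded, so a slow endpoint delays only its own rows rather than the whole union.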

        Andy



But I should probably open another discussion for that topic.

Claude





