On Mon, Jan 30, 2012 at 7:08 AM, Andy Seaborne <[email protected]> wrote:
> On 30/01/12 12:52, William Waites wrote:
>>
>> Hello all,
>
>
> Hi William,
>
>
>>
>> My colleague Paolo has been suggesting that I join this list for a
>> while, and since I have a couple of questions stemming from my use of
>> the SPARQL-FED stuff in ARQ, I thought that now might be a good time.
>>
>> What I'm doing is as follows. I have some information about
>> airports. It's accurate and complete, but pretty skeletal. dbpedia on
>> the other hand is less complete but richer in terms of text
>> descriptions and additional information. There also happens to be a
>> text field (ICAO code) that can be used to join the two.
>>
>> Though I know there are ways to do this more efficiently, I think a
>> single CONSTRUCT query with some SERVICE blocks in the WHERE clause is
>> a very clean way to do it, and will only become more efficient as the
>> implementation gets better.
>>
>> So an abbreviated version of the query might be something like,
>>
>>     CONSTRUCT {
>>         ?my_uri dct:description ?description
>>     } WHERE {
>>         ?my_uri transit:icaoCode ?icao.
>>         SERVICE <http://dbpedia.org/sparql> {
>>             ?dbp_uri dbpprop:icao ?icao;
>>                      rdfs:comment ?description
>>         }
>>     }
>>
>> This mail is about two ways the implementation might get
>> better.
>>
>> Firstly it is brittle. It expands into doing one remote query for each
>> ?icao, which is what one would expect. If any sub-query fails due to
>> transient network events or server flakiness (almost inevitable with
>> more than a trivially small set of things to be queried) the whole
>> query fails. I would rather like the process to continue, and perhaps
>> log a warning. The web is unreliable and the semantic web contains a
>> funny open-world assumption of incomplete results being acceptable,
>> it's just the nature of the beast. Incomplete results are better than
>> no results in this case, but that they are known to be possibly
>> incomplete should be flagged in some way in case the user cares.
>
>
> SERVICE SILENT may be what you are looking for.  Strictly, this means
> continue (with no results) if any part fails, but in ARQ, in normal usage,
> it is applied to each service request.
>
> See QueryIterService.
>
>
>>
>> Secondly, I understand from Paolo that the client in ARQ does not use
>> persistent HTTP connections. For iterations like this, the HTTP
>> set-up/tear-down is quite costly and it would be much better if
>> persistent connections were supported here. Possibly even better
>> (potentially the server could take advantage of this, executing
>> queries in parallel for example) if the queries were pipelined to some
>> extent.
>
>
> The real problem is that the correct query to send to the far end is
>
> SELECT *  {
>
>      ?dbp_uri dbpprop:icao ?icao;
>               rdfs:comment ?description
>  } BINDINGS ... from the first part ...
>
> then it is one request that still does not ask an ungrounded
>
> {
>   ?dbp_uri dbpprop:icao ?icao;
>            rdfs:comment ?description
> }
>
> but DBpedia does not support all of SPARQL 1.1 and in particular it does not
> support BINDINGS (yet?).
>
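
To make the single-request idea concrete, here is a rough sketch of how
the client side could build that one remote query from the ICAO codes
found by the first part. It is plain Java and not ARQ code: the class
BatchedServiceQuery and its methods are hypothetical names, the prefix
declarations are left out as in the query above, and the
"BINDINGS ?icao { (...) }" layout follows the SPARQL 1.1 working-draft
syntax as I understand it (the final recommendation spells this feature
VALUES).

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of ARQ: builds one batched remote query. */
final class BatchedServiceQuery {

    /** Wraps a value in double quotes, escaping backslashes and double quotes only. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Returns a single SELECT whose trailing BINDINGS block carries every ICAO code. */
    static String build(List<String> icaoCodes) {
        // One parenthesised row per code, e.g. ("EGLL") ("KJFK")
        String rows = icaoCodes.stream()
                .map(code -> "(" + literal(code) + ")")
                .collect(Collectors.joining(" "));
        return "SELECT * {\n"
             + "  ?dbp_uri dbpprop:icao ?icao ;\n"
             + "           rdfs:comment ?description\n"
             + "} BINDINGS ?icao { " + rows + " }";
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("EGLL", "KJFK")));
    }
}
```

With this shape the remote end sees one request however many codes there
are, and the pattern is still grounded by the supplied values. It only
helps against endpoints that accept the BINDINGS (or VALUES) clause.
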
> The implementation of service requests is in
> com.hp.hpl.jena.sparql.engine.http.HttpQuery.  It might be better to use the
> Apache HTTP client.  Currently it uses java.net.
>
> Patches welcome.
>
>
>> doesn't cause the whole thing to fail and lose the work already done.
>> "doesn't consume a lot of RAM"
>
>
> ARQ streams the results out (unless you ask for something that can't be
> streamed, such as the text output form; in that case, write the results to
> a file in a streamable format and read the file back in).
>
> CONSTRUCT isn't streamable - can you use a SELECT and generate the triples
> for the CONSTRUCT as it streams?
>

I'd been thinking that an additional API that did stream CONSTRUCT
queries might be useful.  It would have to return an Iterator<Triple>
instead of a Model.  This would work well for Fuseki, as it is only
streaming RDF back to the client.  Combined with
org.openjena.atlas.data.DistinctDataNet it would have spill-to-disk
capability as well.
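
As a very rough illustration of the shape I have in mind, here is a
self-contained sketch in plain Java. Nothing in it is existing Jena or
ARQ API: Triple is a stand-in class holding three strings, solutions are
plain Maps from variable name to term, and
StreamingConstructSketch.construct is a hypothetical method name. It
instantiates the CONSTRUCT template one solution at a time and skips any
triple that still has an unbound variable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical sketch, not Jena API: lazily instantiates a CONSTRUCT template. */
final class StreamingConstructSketch {

    /** Stand-in triple; terms are kept as plain strings for illustration. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o + " ."; }
    }

    /** Replaces a "?var" term by its binding (null when unbound); other terms pass through. */
    static String resolve(String term, Map<String, String> row) {
        return term.startsWith("?") ? row.get(term.substring(1)) : term;
    }

    /** Pulls one solution at a time from rows and emits the triples it produces. */
    static Iterator<Triple> construct(List<Triple> template, Iterator<Map<String, String>> rows) {
        return new Iterator<Triple>() {
            // Holds only the triples produced by the solution currently being consumed.
            private final Deque<Triple> pending = new ArrayDeque<>();

            private void fill() {
                while (pending.isEmpty() && rows.hasNext()) {
                    Map<String, String> row = rows.next();
                    for (Triple t : template) {
                        String s = resolve(t.s, row), p = resolve(t.p, row), o = resolve(t.o, row);
                        // Triples with an unbound variable are dropped.
                        if (s != null && p != null && o != null) pending.add(new Triple(s, p, o));
                    }
                }
            }

            @Override public boolean hasNext() { fill(); return !pending.isEmpty(); }

            @Override public Triple next() {
                if (!hasNext()) throw new NoSuchElementException();
                return pending.remove();
            }
        };
    }

    public static void main(String[] args) {
        List<Triple> template = List.of(new Triple("?my_uri", "dct:description", "?description"));
        List<Map<String, String>> rows = List.of(
                Map.of("my_uri", "<http://example.org/a>", "description", "\"An airport\""));
        construct(template, rows.iterator()).forEachRemaining(System.out::println);
    }
}
```

Memory use here is bounded by the size of the template rather than by the
size of the result. The sketch makes no attempt to remove duplicate
triples, which a Model would do implicitly; that is where something with
spill-to-disk would come in.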

-Stephen