Re: Protocol extensions for federated querying

Andreas Langegger Wed, 21 Oct 2009 02:10:31 -0700

Hi Paul,

+1 - I would like to see that in SPARQL/Query 1.1 as well.

However, I think it would be more convenient, more compact, and would require less markup if initial bindings could be submitted as part of the query rather than in the POST attachment. Small queries could still be issued via GET, and if there are many bindings the client can just use POST anyway.


I have implemented a BINDINGS extension in ARQ, demo running at
http://ramses.faw.uni-linz.ac.at:8900/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fs+a+%3Ftype%0D%0A%7D+BINDINGS+%3Ftype+%7B%0D%0A++bsbm%3AProduct+.%0D%0A++foaf%3APerson+.%0D%0A%7D

Example with multiple variables (empty bindings may be specified with "null"):

SELECT * WHERE {
   ?s :p ?a ; :p ?b ...
} BINDINGS ?a ?b {
   bsbm:Product "34"^^xsd:int .
   null "23"^^xsd:int .
   foaf:Person . // remaining slots are interpreted as empty (null)
}

The evaluation is simply a Join in ARQ against an OpTable that holds the supplied, materialized solutions. It is actually very simple to implement and worth having in future SPARQL.
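
To illustrate the idea outside of ARQ, here is a minimal Python sketch of such a join. It is not the ARQ code: the names bindings_table, compatible and join are made up for this example, solutions are plain dicts, and "null" is modelled as a variable that is simply left unbound, so it is compatible with any value.

```python
from typing import Dict, List, Optional

Solution = Dict[str, str]  # variable name -> RDF term (kept as a string here)


def bindings_table(variables: List[str],
                   rows: List[List[Optional[str]]]) -> List[Solution]:
    """Build the table of supplied solutions.

    None (standing in for "null") or a missing trailing slot leaves the
    variable unbound in that row.
    """
    table = []
    for row in rows:
        padded = list(row) + [None] * (len(variables) - len(row))
        table.append({v: t for v, t in zip(variables, padded) if t is not None})
    return table


def compatible(a: Solution, b: Solution) -> bool:
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[v] == term for v, term in a.items() if v in b)


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Nested-loop join that merges every compatible pair of solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


# The three rows of the BINDINGS ?a ?b example above.
table = bindings_table(
    ["a", "b"],
    [["bsbm:Product", '"34"^^xsd:int'],
     [None, '"23"^^xsd:int'],
     ["foaf:Person"]],
)

# Hypothetical solutions of the WHERE pattern (illustrative data only).
pattern = [
    {"s": ":x", "a": "bsbm:Product", "b": '"34"^^xsd:int'},
    {"s": ":y", "a": "foaf:Person", "b": '"23"^^xsd:int'},
]

result = join(pattern, table)
```

With this sample data the solution for :y matches both the second and the third row, so it appears twice in the result; a real engine would follow its own rules for duplicates.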

For scalable federation over public SPARQL endpoints, however, I'm more than sceptical, since I've done a lot of research and experiments in this direction. My SemWIQ [1] mediator works only with patched endpoints that support SPARQL BINDINGS and RDFStats [2]. Issuing COUNT queries beforehand may not scale well. Initial bindings mainly reduce the latency of HTTP connections, but they speed up federation only linearly. If there are many distributed joins, even bind joins (dynamic optimization by substitution) become troublesome...


Regards,
Andy

[1] http://semwiq.sourceforge.net
[2] http://rdfstats.sourceforge.net


On Oct 20, 2009, at 9:51 PM, Paul Gearon wrote:

Hi everyone,

This meets the commitment I made for ACTION-124.

So far, all the comments I've seen on federated queries have been
about the suggested query syntax. To date I'm in agreement with what
I've seen proposed.

I am also interested in extending the protocol to support federation a
little better. At the moment, all queries are done as a simple request
via a GET or a POST. In the case of POST, the endpoint alone is
provided in the URL, and the query appears in the body.

I'd like to see a form of POST that includes a SPARQL variable binding
result in the body (a la http://www.w3.org/TR/rdf-sparql-XMLres/). In
this way the receiving query engine can work with prebindings that are
provided to it, allowing it to reduce the result that is to be
streamed back to the calling engine.

To give an example, I'll reference the two datasets found in 8.3 of
the SPARQL Query Language document:
 http://www.w3.org/TR/rdf-sparql-query/#queryDataset

If we make the presumption that the named graph
http://example.org/foaf/aliceFoaf can be found at
http://sparql.org/sparql/, then I might want to issue the following
query to get the names of people whose nicknames are in the bobFoaf
graph:

SELECT ?nick ?name
FROM <http://example.org/foaf/bobFoaf>
WHERE {
 ?p1 foaf:nick ?nick .
 ?p1 foaf:mbox ?mbox
 SERVICE <http://sparql.org/sparql/> {
   SELECT ?mbox ?name
   FROM <http://example.org/foaf/aliceFoaf>
   WHERE { ?p2 foaf:mbox ?mbox . ?p2 foaf:name ?name }
 }
}

The part of the query in the SERVICE block would usually return the
following:

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
 <head>
   <variable name="mbox"/>
   <variable name="name"/>
 </head>
 <results>
   <result>
     <binding name="mbox"><uri>mailto:[email protected]</uri></binding>
     <binding name="name"><literal>Alice</literal></binding>
   </result>
   <result>
     <binding name="mbox"><uri>mailto:[email protected]</uri></binding>
     <binding name="name"><literal>Bob</literal></binding>
   </result>
 </results>
</sparql>

Note that this is information for both Bob and Alice. This can then be
joined to the remainder of the query, which reduces the results to
just Bob.
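
As a rough sketch of that join step (not taken from any particular
engine), the following Python reads a result document in the SPARQL
Query Results XML Format and joins it with the local bindings on
?mbox. The function names, the local nickname and the example.org
addresses are placeholders chosen for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Stand-in for the document returned by the SERVICE block
# (placeholder addresses, not the ones from the real dataset).
REMOTE_XML = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
 <head>
   <variable name="mbox"/>
   <variable name="name"/>
 </head>
 <results>
   <result>
     <binding name="mbox"><uri>mailto:alice@example.org</uri></binding>
     <binding name="name"><literal>Alice</literal></binding>
   </result>
   <result>
     <binding name="mbox"><uri>mailto:bob@example.org</uri></binding>
     <binding name="name"><literal>Bob</literal></binding>
   </result>
 </results>
</sparql>
"""


def parse_results(xml_text):
    """Turn a SPARQL XML result document into a list of dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            term = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = term.text
        rows.append(row)
    return rows


def join_on_mbox(local_rows, remote_rows):
    """Keep only the pairs of rows that share the same ?mbox value."""
    return [{**l, **r} for l in local_rows for r in remote_rows
            if l["mbox"] == r["mbox"]]


# Hypothetical local solutions for ?nick and ?mbox from the bobFoaf graph.
local = [{"nick": "Robert", "mbox": "mailto:bob@example.org"}]

joined = join_on_mbox(local, parse_results(REMOTE_XML))
```

Here both remote rows cross the wire, and only the local join discards
the one for Alice.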

However, a query engine may instead want to evaluate Bob first. This
may be desirable if some COUNT queries have already been issued, and
the query engine knows that the results of the SERVICE block will
return a large number of results, while the local data would bind
?mbox to only a few values. In that case, the local binding of ?mbox
could be sent along with the query (?p1 and ?nick are not necessary
for the remote service). This could be accomplished using a POST that
has the query in the URL, and the bindings in the body.

POST /sparql/?query=SELECT+%3Fmbox+%3Fname+FROM+%3Chttp%3A%2F%2Fexample.org%2Ffoaf%2FaliceFoaf%3E+WHERE+%7B+%3Fp2+foaf%3Ambox+%3Fmbox+.+%3Fp2+foaf%3Aname+%3Fname+%7D HTTP/1.1
Content-Length: xxxxxx
Content-Type: multipart/form-data;boundary=ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL
Host: sparql.org
Connection: Keep-Alive
User-Agent: example

--ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL
Content-Disposition: form-data; name="query-prebinding"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
 <head>
   <variable name="mbox"/>
 </head>
 <results>
   <result>
     <binding name="mbox"><uri>mailto:[email protected]</uri></binding>
   </result>
 </results>
</sparql>

--ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL--

With this pre-binding, the remote query engine is able to reduce its
results to just the one for Bob, thereby cutting the returned size
down by nearly half.
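
For anyone who wants to experiment, here is a hedged Python sketch of
how a client might assemble such a request. It only builds the request
target and the multipart body and does not send anything, since this
form of POST is only a proposal. The function names and the
example.org address are assumptions made for the sketch; the part
name "query-prebinding" is the one used in the example above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SR = "http://www.w3.org/2005/sparql-results#"


def prebinding_xml(variable, uris):
    """Serialise URI bindings for one variable as a SPARQL XML result."""
    ET.register_namespace("", SR)  # emit SR as the default namespace
    root = ET.Element(f"{{{SR}}}sparql")
    head = ET.SubElement(root, f"{{{SR}}}head")
    ET.SubElement(head, f"{{{SR}}}variable", name=variable)
    results = ET.SubElement(root, f"{{{SR}}}results")
    for uri in uris:
        result = ET.SubElement(results, f"{{{SR}}}result")
        binding = ET.SubElement(result, f"{{{SR}}}binding", name=variable)
        ET.SubElement(binding, f"{{{SR}}}uri").text = uri
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def multipart_body(part_name, payload, boundary):
    """Wrap the payload in a single multipart/form-data part."""
    lines = [
        f"--{boundary}",
        f'Content-Disposition: form-data; name="{part_name}"',
        "Content-Type: text/plain; charset=UTF-8",
        "Content-Transfer-Encoding: 8bit",
        "",
        payload,
        f"--{boundary}--",
        "",
    ]
    return "\r\n".join(lines).encode("utf-8")


def request_target(path, query):
    """Put the URL-encoded query into the request target, as in the example."""
    return path + "?" + urlencode({"query": query})


query = ("SELECT ?mbox ?name FROM <http://example.org/foaf/aliceFoaf> "
         "WHERE { ?p2 foaf:mbox ?mbox . ?p2 foaf:name ?name }")
boundary = "ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL"

target = request_target("/sparql/", query)
xml_doc = prebinding_xml("mbox", ["mailto:bob@example.org"])
body = multipart_body("query-prebinding", xml_doc, boundary)
headers = {
    "Content-Type": f"multipart/form-data;boundary={boundary}",
    "Content-Length": str(len(body)),
}
```

Computing Content-Length from the encoded body avoids the "xxxxxx"
placeholder in the hand-written request above.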

One potential issue is for very long queries that also want to be
placed into the body of a POST. In that case we could simply define
the names of each section (in the example above I've used a name of
"query-prebinding").

What do others think? Does this proposal have merit?

Regards,
Paul Gearon



http://www.langegger.at
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
FAW - Institute for Application-oriented Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69
