On 31/01/12 13:10, Damian Steer wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 31/01/12 12:49, Ozga, Rafal wrote:
Hi,
Hi Rafal,
We¹re trying to build a caching layer in the front of our
triplestore server. But to get the most out of such a cache, it¹s
quite important to normalize incoming SPARQL queries...
Indeed, although don't underestimate the power of a very simple minded
cache. In some of my work we did no normalisation and it was still
very effective.
+1
It can be surprising just how many identical queries arrive.
There is a (prototype) SPARQL cache that does paging by stripping off
LIMIT/OFFSET, issuing the whole (sorted) query and caching the results.
Then, later requests with different LIMIT/OFFSET are sliced out of the
cached results.
We thought that the best way of doing so would be to use the
feature that ARQ has in store, namely SPARQL Algebra.
+1
The problem with that approach is, however, that the algebras being
produced by Algebra.compile(query) routine are not ³abstract
enough² for our purpose. For the former queries we would get
respectively:
(bgp (triple ?subject ?predicate ?object))
(bgp (triple ?s ?p ?o))
Yes, Algebra.compile in itself does no work on the form of the
operations as far as I'm aware. However that's only the beginning of
the process. ARQ (and individual query engines) spend a lot of their
time transforming algebras.
Caveat: those are not quite the same query unless something else is
going on.
If there are SELECT queries, then the variable names of the column are
different.
If it's a CONSTRUCT or DESCRIBE query, then, depending on the template
for CONSTRUCT, they can be the same.
So the question is whether we are missing something and there is
indeed a general way of obtaining the algebras that wouldn¹t be any
different for semantically identical queries (like those two above)
or we have to resort to some built in-house tweaks and tricks (e.g.
normalizing variable names, sorting algebra nodes on the same tree
level alphabetically etc. etc.) ?
There might well be transformers doing some of this work (ISTR a
variable reassignment mechanism) -- Andy would know best. I wrote a
little piece on manipulating sparql [1] which shows a simple transformer.
Yes.
Every Op has an .equalTo method. This is not .equals().
Op.equalTo(Op other, NodeIsomorphismMap labelMap)
Op.equals is implemented as a call of .equalTo with a bNode mapping
isomorphism map.
You could write your own NodeIsomorphismMap (inherit) and change it to
be variable name mapping as well. Then you would get the isomorphic
equality test you need to do this.
If there is an isomorphism mapping, then the code will return true and
what is more, the map will give the association of names so you can
remap the variables in the result as well.
The algorithm is linear in the size of the algebra expression because
expressions are trees so it's walking two trees to see if one can be
isomorphically mapped to the other.
Andy
Damian
[1]
<http://incubator.apache.org/jena/documentation/query/manipulating_sparql_using_arq.html>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk8n6FAACgkQAyLCB+mTtylfgwCg8td99GDeryaPqrJpIX2PTzhQ
lcYAoPGSMFc038apOpz1bSjbGlOs+pbk
=Uvwc
-----END PGP SIGNATURE-----