Hello Ruslan,
Thanks for such an exhaustive answer! For now, we can just avoid using
those optional property path elements in our SPARQL queries and convert
them to use unions instead. That seems to work properly for us and
shouldn't be very time-consuming. This is still just a workaround, though,
and it would be really nice if the Sesame developers fixed this.
--
Best regards,
Krzysztof Sielski
Poznan Supercomputing and Networking Center
On 2012-06-18 15:53, Ruslan Velkov wrote:
Hi Krzysztof,
Many thanks for reporting this and for providing a test class!
What we introduced in 5.1 is Sesame's QueryJoinOptimizer, which
rearranges joins so that any sub-select clauses in the query are
evaluated first and in the best possible order (in terms of the number
of variables shared by their respective projections). This optimizer
allows for fast and efficient evaluation of nested SELECT clauses.
What I saw as a by-product of this optimizer on q2 was this:
[Original query plan without applying the QueryJoinOptimizer]
Projection
  ProjectionElemList
    ProjectionElem "name"
  Join
    Join
      StatementPattern
        Var (name=-const-1, value=person://1, anonymous)
        Var (name=-const-2, value=http://xmlns.com/foaf/0.1/knows, anonymous)
        Var (name=-const-2-0, anonymous)
      Union
        ZeroLengthPath
          Var (name=-const-2-0, anonymous)
          Var (name=-const-3-1, anonymous)
        StatementPattern
          Var (name=-const-2-0, anonymous)
          Var (name=-const-3, value=http://xmlns.com/foaf/0.1/knows, anonymous)
          Var (name=-const-3-1, anonymous)
    StatementPattern
      Var (name=-const-3-1, anonymous)
      Var (name=-const-4, value=http://xmlns.com/foaf/0.1/name, anonymous)
      Var (name=name)
[Query plan after applying the QueryJoinOptimizer]
Projection
  ProjectionElemList
    ProjectionElem "name"
  Join
    StatementPattern
      Var (name=-const-1, value=person://1, anonymous)
      Var (name=-const-2, value=http://xmlns.com/foaf/0.1/knows, anonymous)
      Var (name=-const-2-0, anonymous)
    Join
      StatementPattern
        Var (name=-const-3-1, anonymous)
        Var (name=-const-4, value=http://xmlns.com/foaf/0.1/name, anonymous)
        Var (name=name)
      Union
        ZeroLengthPath
          Var (name=-const-2-0, anonymous)
          Var (name=-const-3-1, anonymous)
        StatementPattern
          Var (name=-const-2-0, anonymous)
          Var (name=-const-3, value=http://xmlns.com/foaf/0.1/knows, anonymous)
          Var (name=-const-3-1, anonymous)
As you can see, with the first query model we evaluate <person://1
http://xmlns.com/foaf/0.1/knows -const-2-0>, then <-const-2-0
http://xmlns.com/foaf/0.1/knows -const-3-1> with -const-2-0 already
bound, and finally the last statement using the binding for
-const-3-1. This is the correct ordering. With the second query model
we evaluate <person://1 http://xmlns.com/foaf/0.1/knows -const-2-0>,
then the last statement <-const-3-1 http://xmlns.com/foaf/0.1/name
name>, and only then the optional second statement. Since the first
two patterns in this order don't share any variables, a Cartesian
product is formed (the first pattern has 999 results and the second
one has 23334 results, hence 23,310,666 iterations, very few of which
will succeed).
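The effect of the two join orders can be reproduced on a toy data set. What follows is a minimal Python sketch, not Sesame's evaluation code: the graph, the node labels and the helper names (match, zero_or_one, run) are all made up for illustration, and the join is a naive binding-propagation loop. It only shows how the number of intermediate bindings grows when the name pattern is evaluated before the optional step.

```python
def match(pattern, binding, triples):
    """Yield every extension of `binding` that matches one triple pattern.
    Terms starting with '?' are variables; anything else is a constant."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break      # constant does not match
        else:
            yield new


def zero_or_one(x, y, binding, triples):
    """Toy version of `?x knows? ?y`: zero-length path unioned with one step."""
    if x in binding and y in binding:
        if binding[x] == binding[y]:
            yield dict(binding)
    elif x in binding:
        yield {**binding, y: binding[x]}
    yield from match((x, "knows", y), binding, triples)


def run(steps, triples):
    """Apply the steps left to right; record the binding count after each."""
    bindings, sizes = [{}], []
    for step in steps:
        bindings = [b2 for b in bindings for b2 in step(b, triples)]
        sizes.append(len(bindings))
    return bindings, sizes


# Hypothetical data: 50 named people, p0 knows three of them,
# and two of those friends know one further person each.
triples = {(f"p{i}", "name", f"name-p{i}") for i in range(50)}
triples |= {("p0", "knows", "p1"), ("p0", "knows", "p2"), ("p0", "knows", "p3")}
triples |= {("p1", "knows", "p4"), ("p2", "knows", "p5")}

first = lambda b, t: match(("p0", "knows", "?x"), b, t)
optional = lambda b, t: zero_or_one("?x", "?y", b, t)
last = lambda b, t: match(("?y", "name", "?name"), b, t)

results_a, sizes_a = run([first, optional, last], triples)  # original order
results_b, sizes_b = run([first, last, optional], triples)  # reordered
names_a = {b["?name"] for b in results_a}
names_b = {b["?name"] for b in results_b}

print(sizes_a)      # [3, 5, 5]
print(sizes_b)      # [3, 150, 5]
print(999 * 23334)  # 23310666, the figure quoted above
```

Both orders return the same names, but the reordered plan builds 3 x 50 = 150 intermediate bindings before the optional step discards most of them. That is the same shape as the 999 x 23334 product described above.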
Unfortunately, there is currently no way to turn this optimizer off,
and even if there were one, other queries (namely the ones with
sub-selects) would become much slower.
A parameter that switches the optimizer off could be introduced as a
workaround, but you may then encounter problems with queries that use
sub-selects. In that case you would have to arrange the sub-selects
manually, and they should be the first thing to appear in a query (in
case you use such queries alongside the ones with problematic property
path evaluation). Another workaround could be an ASK query that
switches the optimizer on/off at runtime, but this would be a rather
clumsy approach (queries can be evaluated from multiple threads
asynchronously, so you would have no guarantee about when exactly the
optimizer is in use and when it is not). The real solution is a fix in
Sesame (we'll raise the issue with the Sesame guys).
So the fastest solution will be to provide a parameter which
statically disables the optimizer (i.e. at initialization time). Will
that be OK in your case?
Hth,
Ruslan
On 06/18/2012 02:55 PM, Krzysztof Sielski wrote:
Hello,
We noticed that after migrating to Owlim SE 5.1.5183 our queries
which use property paths with optional elements are evaluated very
slowly (in contrast to previous releases). Their direct equivalents
using UNION would return the same results much faster. This is an
example:
[query for particular person's friends' names and their friends' names]
(q1)
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
select * WHERE {
{<person://1> foaf:knows/foaf:name ?name}
UNION
{<person://1> foaf:knows/foaf:knows/foaf:name ?name}
}
(q2)
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
select * WHERE {
<person://1> foaf:knows/foaf:knows?/foaf:name ?name
}
Both queries seem to be equivalent, and we used (q2) as it is more
concise and elegant, but now (q1) is much faster:
Executing query q1
Result count: 2 in 0,006000s.
Executing query q2
Result count: 2 in 7,034000s.
As before, I attached a simple class that creates a local repository,
inserts some data and executes the queries to show you the problem.
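To illustrate why (q1) and (q2) are expected to return the same names, here is a minimal Python sketch that treats each property path as a set operation over a tiny graph. It is not the attached test class. The data and the helper name step are hypothetical, and results are compared as sets of names, so duplicates are ignored.

```python
def step(nodes, edges):
    """Follow one edge from every node in `nodes`; `edges` holds (subject, object) pairs."""
    return {o for s, o in edges if s in nodes}


# Hypothetical data: p0 knows p1 and p2, and p1 knows p3.
knows = {("p0", "p1"), ("p0", "p2"), ("p1", "p3")}
name = {("p0", "Zed"), ("p1", "Alice"), ("p2", "Bob"), ("p3", "Carol")}

friends = step({"p0"}, knows)

# (q1): foaf:knows/foaf:name UNION foaf:knows/foaf:knows/foaf:name
q1 = step(friends, name) | step(step(friends, knows), name)

# (q2): foaf:knows/foaf:knows?/foaf:name
# The optional element keeps each friend (zero steps) and adds their friends (one step).
q2 = step(friends | step(friends, knows), name)

print(sorted(q1))
print(sorted(q2))
```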
_______________________________________________
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion