Hi Ruslan,
We have a similar issue (see
http://www.mail-archive.com/owlim-discussion@ontotext.com/msg01626.html), which
probably has the same cause. I can't tell from your response
whether the new behaviour is a bug or a feature.
I looked carefully at the release notes, and nothing as serious as a complete
change of the query optimizer, which dramatically changes the behaviour of
subselects, was mentioned! I see (and really appreciate) that you are looking for
a short-term hotfix by introducing a parameter, but for now I don't know what to do.
A lot of our business code uses subselects, as they are a powerful feature. We
rely on your answer as to whether this feature/bug is a problem for just a few
days or a permanent state. The parameter is great as a hotfix, but I can't test
and optimize all our queries to find out in which case each one runs faster, only
to have to start again from the beginning after some "minor" bugfix arrives
without any warning.
As it is not well documented, I would like to ask: what are the benefits of
the new query optimizer? Where can we see improvements? What kinds of queries
should run faster?
Thank you for your time; I look forward to your response.
Best regards,
Marek
________________________________
From: Ruslan Velkov <rus...@sirma.bg>
To: owlim-discussion@ontotext.com
Sent: Monday, 18 June 2012, 15:53
Subject: Re: [Owlim-discussion] Poor performance for Sparql queries with
property path optional elements
Hi Krzysztof,
Many thanks for reporting this and for providing a test class!
What we introduced in 5.1 is Sesame's QueryJoinOptimizer, which
rearranges joins so that if there are sub-select clauses in the
query they will be evaluated first and in the best possible order
(in terms of number of variables shared by their respective
projections). This optimizer allows for fast and efficient
evaluation of nested SELECT clauses.
What I saw as a by-product of this optimizer on q2 was this:
[Original query plan without applying the QueryJoinOptimizer]
Projection
   ProjectionElemList
      ProjectionElem "name"
   Join
      Join
         StatementPattern
            Var (name=-const-1, value=person://1, anonymous)
            Var (name=-const-2, value=http://xmlns.com/foaf/0.1/knows, anonymous)
            Var (name=-const-2-0, anonymous)
         Union
            ZeroLengthPath
               Var (name=-const-2-0, anonymous)
               Var (name=-const-3-1, anonymous)
            StatementPattern
               Var (name=-const-2-0, anonymous)
               Var (name=-const-3, value=http://xmlns.com/foaf/0.1/knows, anonymous)
               Var (name=-const-3-1, anonymous)
      StatementPattern
         Var (name=-const-3-1, anonymous)
         Var (name=-const-4, value=http://xmlns.com/foaf/0.1/name, anonymous)
         Var (name=name)
[Query plan after applying the QueryJoinOptimizer]
Projection
   ProjectionElemList
      ProjectionElem "name"
   Join
      StatementPattern
         Var (name=-const-1, value=person://1, anonymous)
         Var (name=-const-2, value=http://xmlns.com/foaf/0.1/knows, anonymous)
         Var (name=-const-2-0, anonymous)
      Join
         StatementPattern
            Var (name=-const-3-1, anonymous)
            Var (name=-const-4, value=http://xmlns.com/foaf/0.1/name, anonymous)
            Var (name=name)
         Union
            ZeroLengthPath
               Var (name=-const-2-0, anonymous)
               Var (name=-const-3-1, anonymous)
            StatementPattern
               Var (name=-const-2-0, anonymous)
               Var (name=-const-3, value=http://xmlns.com/foaf/0.1/knows, anonymous)
               Var (name=-const-3-1, anonymous)
As you can see, with the first query model we evaluate <person://1
http://xmlns.com/foaf/0.1/knows -const-2-0>, then <-const-2-0
http://xmlns.com/foaf/0.1/knows -const-3-1> with -const-2-0 already bound,
and finally the last statement using the binding for -const-3-1.
This is the correct ordering. The second query model instead evaluates
<person://1 http://xmlns.com/foaf/0.1/knows -const-2-0>, then the
last statement <-const-3-1 http://xmlns.com/foaf/0.1/name name>, and only then
the optional second statement. Since the first two patterns in this
order share no variables, a Cartesian product is formed (the first
pattern has 999 results and the second one has 23334, hence 23,310,666
iterations, very few of which will succeed).
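
To make the cost difference concrete, here is a minimal Python sketch of the two join orders on toy data. It is not Sesame's evaluator: the `join` helper, the variable names `x` and `y`, and the ten-node data set are all made up for illustration. Only the 999 and 23334 cardinalities are taken from the figures above.

```python
from collections import defaultdict

def join(left, right, shared):
    """Join two lists of binding dicts on the variables in `shared`.

    Returns (results, pairs_inspected). With an empty `shared` list every
    left row is paired with every right row, i.e. a Cartesian product.
    """
    index = defaultdict(list)
    for row in right:
        index[tuple(row[v] for v in shared)].append(row)
    results, inspected = [], 0
    for l_row in left:
        for r_row in index.get(tuple(l_row[v] for v in shared), []):
            inspected += 1
            results.append({**l_row, **r_row})
    return results, inspected

# Hypothetical toy graph: 10 nodes, a handful of foaf:knows edges.
nodes = ["f1", "f2", "g1", "g2"] + [f"o{i}" for i in range(6)]
knows = [("f1", "g1"), ("f2", "g2"), ("o0", "o1"), ("o2", "o3")]

first = [{"x": "f1"}, {"x": "f2"}]                  # <person://1> foaf:knows ?x
optional_step = ([{"x": n, "y": n} for n in nodes]  # zero-length path
                 + [{"x": s, "y": o} for s, o in knows])  # ?x foaf:knows ?y
names = [{"y": n, "name": "name-" + n} for n in nodes]    # ?y foaf:name ?name

# Order of the first plan: each join shares a variable with what came before.
step1, c1 = join(first, optional_step, ["x"])
good, c2 = join(step1, names, ["y"])
good_cost = c1 + c2

# Order of the second plan: `first` and `names` share nothing, so all pairs
# are enumerated before the optional step can filter them.
product, c3 = join(first, names, [])
bad, c4 = join(product, optional_step, ["x", "y"])
bad_cost = c3 + c4

# The same arithmetic with the cardinalities quoted above.
CARTESIAN_ITERATIONS = 999 * 23334

print(good_cost, bad_cost, CARTESIAN_ITERATIONS)
```

Both orders produce the same rows; only the number of pairs inspected differs, and that gap grows with the size of the unconstrained foaf:name pattern.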
Unfortunately, there is no way to turn this optimizer off, and even if
there were one, other queries would become much slower (namely the
ones with sub-selects).
A parameter that switches the optimizer off could be introduced as a
workaround, but you may encounter problems with queries that use sub-selects,
so in that case you should arrange the sub-selects manually and make them
the first thing to appear in a query (in case you use such queries alongside the
ones with problematic property path evaluation). Another workaround could be
using an ASK query to switch the optimizer on/off at runtime, but this would be
a rather clumsy approach (queries can be evaluated from multiple threads
asynchronously, so you would have no guarantee of when exactly the optimizer is
in use and when it is not). The real solution is a fix in Sesame (we'll
raise the issue with the Sesame guys).
So the fastest solution will be to provide a parameter which will
statically forbid the optimizer (i.e. at initialization time).
Will that be ok in your case?
Hth,
Ruslan
On 06/18/2012 02:55 PM, Krzysztof Sielski wrote:
>Hello,
>
>We noticed that after migrating to Owlim SE 5.1.5183 our queries
>which use property paths with optional elements are evaluated
>very slowly (in contrast to previous releases). Their direct
>equivalents using UNION would return the same results much
>faster. This is an example:
>
>[query for a particular person's friends' names and their friends'
>names]
>(q1)
>PREFIX foaf:<http://xmlns.com/foaf/0.1/>
>select * WHERE {
> {<person://1> foaf:knows/foaf:name ?name}
> UNION
> {<person://1> foaf:knows/foaf:knows/foaf:name ?name}
>}
>
>(q2)
>PREFIX foaf:<http://xmlns.com/foaf/0.1/>
>select * WHERE {
><person://1> foaf:knows/foaf:knows?/foaf:name ?name
>}
>
>Both queries seem to be equivalent and we used (q2) as it is
>more concise and elegant but now (q1) is much faster:
>Executing query q1
>Result count: 2 in 0,006000s.
>Executing query q2
>Result count: 2 in 7,034000s.
>
>As before, I attached a simple class that creates a local
>repository, inserts some data and executes the queries to show
>you the problem.
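
For readers without the attached class, the claimed equivalence of (q1) and (q2) can be checked by hand on a tiny graph. The Python sketch below is only an illustration under stated assumptions: the five triples and the `step` helper are invented, results are compared as sets (so duplicate rows are ignored), and no SPARQL engine is involved.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
KNOWS, NAME = FOAF + "knows", FOAF + "name"

# Hypothetical data, not the data set from the attached test class.
triples = {
    ("person://1", KNOWS, "person://2"),
    ("person://2", KNOWS, "person://3"),
    ("person://2", NAME, "Bob"),
    ("person://3", NAME, "Carol"),
    ("person://4", NAME, "Dave"),
}

def step(subjects, predicate):
    """Follow one predicate from every node in `subjects`."""
    return {o for (s, p, o) in triples if p == predicate and s in subjects}

friends = step({"person://1"}, KNOWS)

# (q1): explicit UNION of the one-hop and the two-hop branch.
q1 = step(friends, NAME) | step(step(friends, KNOWS), NAME)

# (q2): foaf:knows/foaf:knows?/foaf:name, where "?" means zero or one step,
# so the middle step keeps the friends and adds the friends of friends.
q2 = step(friends | step(friends, KNOWS), NAME)

print(sorted(q1), sorted(q2))
```

On this data both forms select the names of person://2 and person://3 and leave out person://4, who is not reachable from person://1.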
>
>
>_______________________________________________
Owlim-discussion mailing list Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion