Hi Kashif,
Optimization happens in two stages:
1. Rewrite of the algebra
2. Reordering of the BGPs
BGPs can be implemented in different ways, and they are an inference
extension point in SPARQL.
What you see is the first stage. BGPs are reordered during execution.
The algorithm can be stats-driven for TDB and TDB2 storage:
https://jena.apache.org/documentation/tdb/optimizer.html
The interface is
org.apache.jena.sparql.engine.optimizer.reorder.ReorderTransformation
and a general-purpose reordering is used for in-memory datasets and is
the default for TDB.
The default reorder is "grounded triples first, leave equal weights
alone". It cascades: a variable bound by an earlier step counts as
grounded for the later steps.
> { ?a mbz:alias "Amy Beach" .
> ?b cmno:hasInfluenced ?a .
> ?c mo:composer ?b ;
> bio:date ?d
> }
That's actually the default order:
?a mbz:alias "Amy Beach" .
has two bound terms so it is done first,
and now ?a is bound, so
?b cmno:hasInfluenced ?a .
comes next, etc.
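A minimal Java sketch of that cascade, assuming a simplified weight (the number of positions that are constants or variables bound by an earlier step) rather than the weights Jena's default reorder actually uses. GreedyReorder and Triple are hypothetical names, not Jena classes. Fed the patterns in reverse order, it still produces the order described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch of "grounded triples first, leave equal weights alone".
// NOT Jena's implementation: the weight (count of grounded positions) is an assumption.
public class GreedyReorder {

    // A triple pattern; any term starting with '?' is treated as a variable.
    record Triple(String s, String p, String o) {
        List<String> terms() { return List.of(s, p, o); }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    static boolean isVar(String term) { return term.startsWith("?"); }

    // Positions that are constants, or variables already bound by an earlier step.
    static int groundedCount(Triple t, Set<String> bound) {
        int n = 0;
        for (String term : t.terms()) {
            if (!isVar(term) || bound.contains(term)) n++;
        }
        return n;
    }

    static List<Triple> reorder(List<Triple> bgp) {
        List<Triple> remaining = new ArrayList<>(bgp);
        List<Triple> result = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            Triple best = remaining.get(0);
            int bestScore = groundedCount(best, bound);
            for (Triple t : remaining) {
                int score = groundedCount(t, bound);
                // Strictly greater: on a tie the earlier pattern is kept.
                if (score > bestScore) { best = t; bestScore = score; }
            }
            remaining.remove(best);
            result.add(best);
            // Cascade: the chosen pattern's variables are bound for later steps.
            for (String term : best.terms()) {
                if (isVar(term)) bound.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The query's patterns, deliberately listed in reverse order.
        List<Triple> query = List.of(
            new Triple("?c", "bio:date", "?d"),
            new Triple("?c", "mo:composer", "?b"),
            new Triple("?b", "cmno:hasInfluenced", "?a"),
            new Triple("?a", "mbz:alias", "\"Amy Beach\""));
        reorder(query).forEach(System.out::println);
    }
}
```

Because ties keep the earlier pattern, a BGP that is already in this order comes back unchanged.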
Given the boundedness of the pattern, and (a guess) that mbz:alias "Amy Beach"
is quite selective, with stats a ? <property> ? pattern would have to be less
numerous than ? mbz:alias "Amy Beach" to be done first.
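To illustrate that comparison, a sketch that picks first whichever pattern has the smallest estimated match count. The counts are hypothetical inputs standing in for what a statistics file would supply, and StatsFirstPick/Estimate are illustrative names, not Jena API; the real stats-driven reorder is more involved.

```java
import java.util.List;

// Illustrative sketch only, not Jena's stats-driven reorder.
// Each candidate pattern carries an estimated number of matches (hypothetical input).
public class StatsFirstPick {

    record Estimate(String pattern, long count) {}

    // Returns the pattern expected to produce the fewest matches.
    // Ties keep the earlier pattern, mirroring "leave equal weights alone".
    static Estimate pickFirst(List<Estimate> candidates) {
        Estimate best = candidates.get(0);
        for (Estimate e : candidates) {
            if (e.count() < best.count()) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up counts: the property-only pattern would go first only if
        // its estimate were below that of the "Amy Beach" alias pattern.
        List<Estimate> candidates = List.of(
            new Estimate("?a mbz:alias \"Amy Beach\"", 1),
            new Estimate("?b cmno:hasInfluenced ?a", 250));
        System.out.println(pickFirst(candidates).pattern());
    }
}
```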
There's no algebra optimization for your example, only BGP reordering.
qparse --print=opt shows stage 1 optimizations.
Executing with "explain" shows BGP execution.
Andy
On 03/03/2020 11:56, Kashif Rabbani wrote:
Hi awesome community,
I have a question. I am working on optimizing SPARQL query plans and I wonder
whether the order of triple patterns in the WHERE clause affects the query plan
or not.
For example, given the following query:
PREFIX bio: <http://purl.org/vocab/bio/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX mbz: <http://dbtune.org/musicbrainz/resource/vocab/>
PREFIX cmno: <http://purl.org/ontology/classicalmusicnav#>
SELECT ?a ?b ?c
WHERE
{ ?a mbz:alias "Amy Beach" .
?b cmno:hasInfluenced ?a .
?c mo:composer ?b ;
bio:date ?d
}
// Let’s generate its algebra
Op op = Algebra.compile(query);
This results in:
(project (?a ?b ?c)
(bgp
(triple ?a <http://dbtune.org/musicbrainz/resource/vocab/alias> "Amy Beach")
(triple ?b <http://purl.org/ontology/classicalmusicnav#hasInfluenced> ?a)
(triple ?c <http://purl.org/ontology/mo/composer> ?b)
(triple ?c <http://purl.org/vocab/bio/0.1/date> ?d)
))
The BGP in the algebra follows exactly the order specified in the WHERE
clause of the query. More precisely: does Jena construct the query plan as
it is, or will it change the order at some other level?
I would be happy if someone could explain how Jena's plan is actually
constructed. If I use statistics of the actual RDF graph to change the order
of triple patterns in the BGP based on selectivity, would it optimize the plan
somehow?
Many Thanks,
Best Regards,
Kashif Rabbani.