Hi Andy,

Thanks for your response. I was wondering whether any detailed documentation of Jena's optimization (rewriting and reordering) is available online. If so, could you please send me the reference?
Also, if I create my own query plan (in algebraic form), is it possible to make Jena execute it as is? That is, how can I turn off Jena's optimization (rewriting and reordering) and force my own query plan to be executed?

Thanks again for your help.

Regards,
Kashif Rabbani,
Research Assistant,
Department of Computer Science,
Aalborg University, Denmark.

> On 3 Mar 2020, at 13.43, Andy Seaborne <a...@apache.org> wrote:
>
> Hi Kashif,
>
> Optimization happens in two stages:
>
> 1. Rewrite of the algebra
> 2. Reordering of the BGPs
>
> BGPs can be implemented in different ways - and they are an inference
> extension point in SPARQL.
>
> What you see is the first. BGPs are reordered during execution.
>
> The algorithm can be stats driven for TDB and TDB2 storage:
> https://jena.apache.org/documentation/tdb/optimizer.html
>
> The interface is
> org.apache.jena.sparql.engine.optimizer.reorder.ReorderTransformation
>
> and a general purpose reordering is done for in-memory and is the default for
> TDB.
>
> The default reorder is "grounded triples first, leave equal weights alone".
> It cascades whether a term is bound by an earlier step.
>
> > { ?a mbz:alias "Amy Beach" .
> >   ?b cmno:hasInfluenced ?a .
> >   ?c mo:composer ?b ;
> >      bio:date ?d
> > }
>
> That's actually the default order -
>
> ?a mbz:alias "Amy Beach" .
>
> has two bound terms so is done first.
>
> and now ?a is bound so
> ?b cmno:hasInfluenced ?a .
>
> etc.
>
> Given the boundedness of the pattern, and (guess) mbz:alias "Amy Beach" is
> quite selective, with stats ? <property> ? would have to be less numerous
> than ? mbz:alias "Amy Beach".
>
> There's no algebra optimization for your example, only BGP reordering.
>
> qparse --print=opt shows stage 1 optimizations.
>
> Executing with "explain" shows BGP execution.
>
> Andy
>
> On 03/03/2020 11:56, Kashif Rabbani wrote:
>> Hi awesome community,
>> I have a question. I am working on optimizing SPARQL query plans, and I
>> wonder whether the order of triple patterns in the WHERE clause affects
>> the query plan.
>> For example, given the following query:
>> PREFIX bio: <http://purl.org/vocab/bio/0.1/>
>> PREFIX mo: <http://purl.org/ontology/mo/>
>> PREFIX mbz: <http://dbtune.org/musicbrainz/resource/vocab/>
>> PREFIX cmno: <http://purl.org/ontology/classicalmusicnav#>
>> SELECT ?a ?b ?c
>> WHERE
>> { ?a mbz:alias "Amy Beach" .
>>   ?b cmno:hasInfluenced ?a .
>>   ?c mo:composer ?b ;
>>      bio:date ?d
>> }
>> // Let’s generate its algebra
>> Op op = Algebra.compile(query); results in this:
>> (project (?a ?b ?c)
>>   (bgp
>>     (triple ?a <http://dbtune.org/musicbrainz/resource/vocab/alias> "Amy Beach")
>>     (triple ?b <http://purl.org/ontology/classicalmusicnav#hasInfluenced> ?a)
>>     (triple ?c <http://purl.org/ontology/mo/composer> ?b)
>>     (triple ?c <http://purl.org/vocab/bio/0.1/date> ?d)
>>   ))
>> The bgp in the algebra follows exactly the same order as specified in the
>> WHERE clause of the query. More precisely, does Jena construct the query
>> plan as is, or will it change the order at some other level?
>> I would be happy if someone could guide me on how Jena's plan is actually
>> constructed. If I use some statistics of the actual RDF graph to change
>> the order of triple patterns in the BGP based on selectivity, would that
>> optimize the plan somehow?
>> Many Thanks,
>> Best Regards,
>> Kashif Rabbani.
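
For readers following the thread, here is a rough sketch of the "grounded triples first, leave equal weights alone" idea, including the cascade where variables bound by an earlier pattern count as bound for later ones. It is plain Java with no Jena dependency and is not Jena's ReorderTransformation implementation. The class name ReorderSketch, the Triple helper, and the scoring rule (count of constant or already-bound positions) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; Jena's real reordering uses its own weighting rules.
class ReorderSketch {
    /** A triple pattern; any term starting with '?' is treated as a variable. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    static boolean isVar(String term) { return term.startsWith("?"); }

    /** Counts positions that are constants or variables bound by an earlier step. */
    static int boundCount(Triple t, Set<String> bound) {
        int n = 0;
        for (String term : new String[] { t.s, t.p, t.o }) {
            if (!isVar(term) || bound.contains(term)) n++;
        }
        return n;
    }

    /** Greedy reorder: most-bound pattern first; ties keep the written order. */
    static List<Triple> reorder(List<Triple> bgp) {
        List<Triple> remaining = new ArrayList<>(bgp);
        List<Triple> result = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            int best = 0;
            for (int i = 1; i < remaining.size(); i++) {
                // Strict '>' leaves equally weighted patterns in their original order.
                if (boundCount(remaining.get(i), bound) > boundCount(remaining.get(best), bound)) {
                    best = i;
                }
            }
            Triple chosen = remaining.remove(best);
            result.add(chosen);
            // Cascade: variables of the chosen pattern are bound for later steps.
            for (String term : new String[] { chosen.s, chosen.p, chosen.o }) {
                if (isVar(term)) bound.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> bgp = List.of(
            new Triple("?a", "mbz:alias", "\"Amy Beach\""),
            new Triple("?b", "cmno:hasInfluenced", "?a"),
            new Triple("?c", "mo:composer", "?b"),
            new Triple("?c", "bio:date", "?d"));
        reorder(bgp).forEach(System.out::println);
    }
}
```

Under this simplified scoring, the query from the thread comes out in the order it was written, which is consistent with Andy's remark that it is already in the default order. A stats-driven reorder for TDB could choose differently, since it weighs patterns by estimated counts and not only by how many terms are bound.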