Is there statistical data available for the number of deductions / joins performed for each SPARQL query of a QueryExecution object?

On Fri, Mar 6, 2020 at 3:16 PM Andy Seaborne <[email protected]> wrote:

>
>
> On 05/03/2020 08:32, Kashif Rabbani wrote:
> > Hi Andy,
> >
> > Thanks for your response. I was wondering if there is any detailed
> > documentation of the Jena optimization (rewriting & reordering) available
> > online? If yes, can you please send me the reference?
>
> The code mainly.
>
> The TDB stats are documented.
>
> > Also, if I create my own query plan (in algebraic form), is it possible
> > to make Jena execute it as it is? I mean how to turn off Jena’s
> > optimization (rewriting & reordering) and force my query plan for
> > execution.
>
> Yes - two parts - algebra rewrites and BGP reordering.
>
> The context is a mapping of settings.
>    there is a global context: ARQ.getContext()
>    one per dataset: DatasetGraph.getContext()
>    one per query execution: QueryExecution.getContext()
>
> and it is treated hierarchically:
>
> Lookup in QueryExecution, then DatasetGraph, then the global context.
>
> :: Algebra rewrite
>
> Some algebra rewrites have to be done - property functions, and rewriting
> some variables due to scoping. These aren't really "optimization steps"
> but happen in that phase. There is OptimizerMinimal for those.
>
> To turn off the optimizer and still do the minimum steps:
>
>    context.set(ARQ.optimization, false)
>
> Alternatively, Algebra.exec(op, dsg) executes the algebra as given - that's
> a very low-level way of doing it.
>
> Turning the optimizer off is better because all the APIs work, e.g.
> QueryExecution.
>
> :: BGP reordering
>
> The reordering of triple patterns is separate.
> BGP steps are performed by a StageGenerator.
>
> To set up to use a custom StageGenerator:
>
>    StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ;
>
> That's really only a call of
>    context.set(ARQ.stageGenerator, myStageGenerator) ;
>
> The default is StageGeneratorGeneric, which does ReorderFixed.
> It is used if there is no other setting in the context.
>
>      Andy
>
> >
> > Thanks again for your help.
> >
> > Regards,
> >
> > Kashif Rabbani,
> > Research Assistant,
> > Department of Computer Science,
> > Aalborg University, Denmark.
> >
> >> On 3 Mar 2020, at 13.43, Andy Seaborne <[email protected]> wrote:
> >>
> >> Hi Kashif,
> >>
> >> Optimization happens in two stages:
> >>
> >> 1. Rewrite of the algebra
> >> 2. Reordering of the BGPs
> >>
> >> BGPs can be implemented in different ways - and they are an inference
> >> extension point in SPARQL.
> >>
> >> What you see is the first. BGPs are reordered during execution.
> >>
> >> The algorithm can be stats driven for TDB and TDB2 storage:
> >> https://jena.apache.org/documentation/tdb/optimizer.html
> >>
> >> The interface is
> >> org.apache.jena.sparql.engine.optimizer.reorder.ReorderTransformation
> >>
> >> and a general purpose reordering is done for in-memory and is the
> >> default for TDB.
> >>
> >> The default reorder is "grounded triples first, leave equal weights
> >> alone". It cascades whether a term is bound by an earlier step.
> >>
> >>>   { ?a  mbz:alias           "Amy Beach" .
> >>>     ?b  cmno:hasInfluenced  ?a .
> >>>     ?c  mo:composer         ?b ;
> >>>         bio:date            ?d
> >>>   }
> >>
> >> That's actually the default order -
> >>
> >>     ?a  mbz:alias  "Amy Beach" .
> >>
> >> has two bound terms so is done first.
> >>
> >> and now ?a is bound so
> >>     ?b  cmno:hasInfluenced  ?a .
> >>
> >> etc.
> >>
> >> Given the boundedness of the pattern, and (guess) mbz:alias "Amy Beach"
> >> is quite selective, with stats ? <property> ? would have to be less
> >> numerous than ? mbz:alias "Amy Beach".
> >>
> >> There's no algebra optimization for your example, only BGP reordering.
> >>
> >> qparse --print=opt shows stage 1 optimizations.
> >>
> >> Executing with "explain" shows BGP execution.
> >>
> >>     Andy
> >>
> >>
> >>
> >> On 03/03/2020 11:56, Kashif Rabbani wrote:
> >>> Hi awesome community,
> >>> I have a question. I am working on optimizing SPARQL query plans and I
> >>> wonder whether the order of triple patterns in the where clause affects
> >>> the query plan or not?
> >>> For example, given the following query:
> >>> PREFIX  bio:  <http://purl.org/vocab/bio/0.1/>
> >>> PREFIX  mo:   <http://purl.org/ontology/mo/>
> >>> PREFIX  mbz:  <http://dbtune.org/musicbrainz/resource/vocab/>
> >>> PREFIX  cmno: <http://purl.org/ontology/classicalmusicnav#>
> >>> SELECT  ?a ?b ?c
> >>> WHERE
> >>>   { ?a  mbz:alias           "Amy Beach" .
> >>>     ?b  cmno:hasInfluenced  ?a .
> >>>     ?c  mo:composer         ?b ;
> >>>         bio:date            ?d
> >>>   }
> >>> // Let’s generate its algebra
> >>> Op op = Algebra.compile(query); results in this:
> >>> (project (?a ?b ?c)
> >>>   (bgp
> >>>     (triple ?a <http://dbtune.org/musicbrainz/resource/vocab/alias> "Amy Beach")
> >>>     (triple ?b <http://purl.org/ontology/classicalmusicnav#hasInfluenced> ?a)
> >>>     (triple ?c <http://purl.org/ontology/mo/composer> ?b)
> >>>     (triple ?c <http://purl.org/vocab/bio/0.1/date> ?d)
> >>>   ))
> >>> The bgp in the algebra follows the exact same order as specified in the
> >>> where clause of the query. More precisely, does Jena construct the query
> >>> plan as it is, or will it change the order at some other level?
> >>> I would be happy if someone can guide me on how Jena's plan is
> >>> actually constructed. If I use some statistics of the actual RDF graph
> >>> to change the order of triple patterns in the BGP based on selectivity,
> >>> would it optimize the plan somehow?
> >>> Many Thanks,
> >>> Best Regards,
> >>> Kashif Rabbani.
>
>
>

--

---
Marco Neumann
KONA
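
The default reorder rule quoted in the thread ("grounded triples first, leave equal weights alone", cascading as earlier steps bind variables) can be sketched as a toy model in plain Java. This is not Jena's ReorderFixed code: the class `ReorderSketch`, its `Triple` type, and the scoring by a simple count of bound terms are all assumptions made for illustration, and Jena's real weighting is more detailed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only (hypothetical names); NOT Jena's ReorderFixed implementation.
class ReorderSketch {

    /** A triple pattern: terms starting with '?' are variables, anything else is a constant. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        List<String> terms() { return List.of(s, p, o); }
        @Override public String toString() { return s + " " + p + " " + o + " ."; }
    }

    static boolean isVar(String term) { return term.startsWith("?"); }

    /** Count terms that are constants or variables already bound by an earlier step. */
    static int boundTerms(Triple t, Set<String> bound) {
        int n = 0;
        for (String term : t.terms()) {
            if (!isVar(term) || bound.contains(term)) n++;
        }
        return n;
    }

    /** Greedy reorder: most-bound pattern first; ties keep their original relative order. */
    static List<Triple> reorder(List<Triple> bgp) {
        List<Triple> remaining = new ArrayList<>(bgp);
        List<Triple> ordered = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            Triple best = remaining.get(0);
            for (Triple t : remaining) {
                // Strict '>' means an earlier pattern wins on equal weight.
                if (boundTerms(t, bound) > boundTerms(best, bound)) best = t;
            }
            remaining.remove(best);
            ordered.add(best);
            // Cascade: variables of the chosen pattern count as bound from now on.
            for (String term : best.terms()) {
                if (isVar(term)) bound.add(term);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // The four patterns from the query in the thread, deliberately shuffled.
        List<Triple> bgp = List.of(
            new Triple("?c", "bio:date", "?d"),
            new Triple("?c", "mo:composer", "?b"),
            new Triple("?b", "cmno:hasInfluenced", "?a"),
            new Triple("?a", "mbz:alias", "\"Amy Beach\""));
        reorder(bgp).forEach(System.out::println);
    }
}
```

Under this toy scoring, the four patterns from the query put the `mbz:alias "Amy Beach"` pattern first and then follow the chain of newly bound variables (`?a`, then `?b`, then `?c`), which is the order described in the reply of 3 Mar 2020.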
