Re: Order of triple patterns in Where Clause

Marco Neumann Fri, 06 Mar 2020 09:41:46 -0800

Is there statistical data available on the number of deductions /
joins performed for each SPARQL query of a QueryExecution object?


On Fri, Mar 6, 2020 at 3:16 PM Andy Seaborne <[email protected]> wrote:

>
>
> On 05/03/2020 08:32, Kashif Rabbani wrote:
> > Hi Andy,
> >
> > Thanks for your response. I was wondering whether there is any detailed
> documentation of the Jena optimization (rewriting & reordering) available
> online. If so, could you please send me the reference?
>
> Mainly the code.
>
> The TDB stats are documented.
>
> > Also, if I create my own query plan (in algebraic form), is it possible
> to make Jena execute it as it is? That is, how do I turn off Jena's
> optimization (rewriting & reordering) and force my query plan to be
> executed?
>
> Yes. There are two parts: algebra rewrites and BGP reordering.
>
> The context is a mapping of settings. There is:
>   - a global context: ARQ.getContext()
>   - one per dataset: DatasetGraph.getContext()
>   - one per query execution: QueryExecution.getContext()
>
> and it is treated hierarchically:
>
> Lookup is in the QueryExecution, then the DatasetGraph, then the global context.
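A simplified, stdlib-only model of that lookup order may make it concrete. This is not Jena's Context class: the class ContextLookup and the string key "optimization" are hypothetical stand-ins for a context and the ARQ.optimization symbol. Each level consults its own settings first and falls back to its parent.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of hierarchical context lookup (not Jena's Context). */
class ContextLookup {
    private final Map<String, Object> settings = new HashMap<>();
    private final ContextLookup parent; // next level to consult, or null for the global level

    ContextLookup(ContextLookup parent) { this.parent = parent; }

    void set(String key, Object value) { settings.put(key, value); }

    /** Own setting wins; otherwise delegate to the parent level. */
    Object get(String key) {
        if (settings.containsKey(key)) return settings.get(key);
        return parent == null ? null : parent.get(key);
    }

    public static void main(String[] args) {
        ContextLookup global = new ContextLookup(null);        // like ARQ.getContext()
        ContextLookup dataset = new ContextLookup(global);      // like DatasetGraph.getContext()
        ContextLookup execution = new ContextLookup(dataset);   // like QueryExecution.getContext()

        global.set("optimization", true);
        System.out.println(execution.get("optimization"));      // prints true (inherited from global)

        execution.set("optimization", false);                   // override for this execution only
        System.out.println(execution.get("optimization"));      // prints false
        System.out.println(dataset.get("optimization"));        // prints true (dataset level unaffected)
    }
}
```

The point of the model is that a setting made on one query execution affects only that execution, while a global setting applies wherever nothing more specific overrides it.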
>
> :: Algebra rewrite
>
> Some algebra rewrites have to be done: property functions, and renaming
> some variables due to scoping. These aren't really "optimization steps"
> but they happen in that phase. OptimizerMinimal exists for those.
>
> To turn off the optimizer and still do the minimum steps:
>
> context.set(ARQ.optimization, false) ;
>
> Alternatively, Algebra.exec(op, dsg) executes the algebra as given, but
> that's a very low-level way of doing it.
>
> Turning the optimizer off is better because all the APIs still work,
> e.g. QueryExecution.
>
> :: BGP reordering
>
> The reordering of triple patterns is separate.
> BGP steps are performed by a StageGenerator.
>
> To set up to use a custom StageGenerator:
>
> StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ;
>
> That's really just a call of:
>     context.set(ARQ.stageGenerator, stageGenerator) ;
>
> The default is StageGeneratorGeneric, which does ReorderFixed.
> It is used if there is no other setting in the context.
>
>      Andy
>
> >
> > Thanks again for your help.
> >
> > Regards,
> >
> > Kashif Rabbani,
> > Research Assistant,
> > Department of Computer Science,
> > Aalborg University, Denmark.
> >
> >> On 3 Mar 2020, at 13.43, Andy Seaborne <[email protected]> wrote:
> >>
> >> Hi Kashif,
> >>
> >> Optimization happens in two stages:
> >>
> >> 1. Rewrite of the algebra
> >> 2. Reordering of the BGPs
> >>
> >> BGPs can be implemented in different ways, and they are an inference
> extension point in SPARQL.
> >>
> >> What you see is the first. BGPs are reordered during execution.
> >>
> >> The algorithm can be stats driven for TDB and TDB2 storage:
> >>   https://jena.apache.org/documentation/tdb/optimizer.html
> >>
> >> The interface is
> org.apache.jena.sparql.engine.optimizer.reorder.ReorderTransformation
> >>
> >> A general-purpose reordering is done for in-memory data and is the
> default for TDB.
> >>
> >> The default reorder is "grounded triples first, leave equal weights
> alone". It cascades: whether a term is bound takes earlier steps into account.
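The "grounded triples first" rule can be sketched as a greedy loop. This is a simplified model, not Jena's ReorderFixed (which uses its own, more detailed weighting): here a pattern's weight is just the number of variable positions not yet bound by an earlier step, ties keep the written order, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical greedy "grounded triples first" reordering (not Jena's ReorderFixed). */
class GreedyReorder {

    /** A triple pattern; terms starting with '?' are variables. */
    static final class Pattern {
        final String s, p, o;
        Pattern(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        String[] terms() { return new String[] { s, p, o }; }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    static boolean isVar(String term) { return term.startsWith("?"); }

    /** Weight = number of variable positions not bound by an earlier step (lower runs first). */
    static int weight(Pattern pat, Set<String> bound) {
        int w = 0;
        for (String t : pat.terms()) {
            if (isVar(t) && !bound.contains(t)) w++;
        }
        return w;
    }

    static List<Pattern> reorder(List<Pattern> bgp) {
        List<Pattern> remaining = new ArrayList<>(bgp);
        List<Pattern> result = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            int best = 0;
            for (int i = 1; i < remaining.size(); i++) {
                // strict '<' leaves equal weights in their original order
                if (weight(remaining.get(i), bound) < weight(remaining.get(best), bound)) best = i;
            }
            Pattern next = remaining.remove(best);
            result.add(next);
            // cascade: variables of the chosen pattern count as bound for later steps
            for (String t : next.terms()) if (isVar(t)) bound.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        // The question's BGP, deliberately given in reverse order
        List<Pattern> bgp = List.of(
            new Pattern("?c", "bio:date", "?d"),
            new Pattern("?c", "mo:composer", "?b"),
            new Pattern("?b", "cmno:hasInfluenced", "?a"),
            new Pattern("?a", "mbz:alias", "\"Amy Beach\""));
        reorder(bgp).forEach(System.out::println);
    }
}
```

Run on the reversed patterns, the sketch recovers the written order (alias, hasInfluenced, composer, date), which agrees with the walk-through that follows.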
> >>
> >>>     { ?a  mbz:alias           "Amy Beach" .
> >>>       ?b  cmno:hasInfluenced  ?a .
> >>>       ?c  mo:composer         ?b ;
> >>>           bio:date            ?d
> >>>     }
> >>
> >> That's actually the default order:
> >>
> >> ?a  mbz:alias           "Amy Beach" .
> >>
> >> has two bound terms so is done first.
> >>
> >> and now ?a is bound so
> >> ?b  cmno:hasInfluenced  ?a .
> >>
> >> etc.
> >>
> >> Given the boundedness of the pattern, and (a guess) that mbz:alias "Amy Beach"
> is quite selective, with stats ? <property> ? would have to be less
> numerous than ? mbz:alias "Amy Beach" for the order to change.
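To illustrate why selectivity matters, here is a toy nested-loop evaluator that runs the BGP strictly in a given order over a small graph and counts the intermediate bindings each step produces. Everything here (the made-up data, the class JoinOrderCost, the counting) is hypothetical and only illustrates the effect; it is not how Jena executes BGPs or reports statistics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy nested-loop BGP evaluator that counts intermediate bindings. */
class JoinOrderCost {

    /** A data triple, or a pattern whose terms starting with '?' are variables. */
    static final class T {
        final String s, p, o;
        T(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** Final bindings plus the total number of rows produced across all steps. */
    static final class Outcome {
        final List<Map<String, String>> rows;
        final int intermediateRows;
        Outcome(List<Map<String, String>> rows, int intermediateRows) {
            this.rows = rows;
            this.intermediateRows = intermediateRows;
        }
    }

    /** Match one pattern term against one data term, extending binding b; false on conflict. */
    static boolean bind(String patternTerm, String dataTerm, Map<String, String> b) {
        if (!patternTerm.startsWith("?")) return patternTerm.equals(dataTerm);
        String current = b.putIfAbsent(patternTerm, dataTerm);
        return current == null || current.equals(dataTerm);
    }

    /** Evaluate the patterns in the given order, joining each row against every triple. */
    static Outcome evaluate(List<T> bgp, List<T> graph) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(new HashMap<>());                 // start from one empty binding
        int produced = 0;
        for (T pat : bgp) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> row : rows) {
                for (T t : graph) {
                    Map<String, String> b = new HashMap<>(row);   // fresh copy per candidate
                    if (bind(pat.s, t.s, b) && bind(pat.p, t.p, b) && bind(pat.o, t.o, b)) {
                        next.add(b);
                    }
                }
            }
            produced += next.size();               // rows passed on to the next step
            rows = next;
        }
        return new Outcome(rows, produced);
    }

    /** Made-up data: four bio:date triples but only one "Amy Beach" alias. */
    static List<T> sampleGraph() {
        return List.of(
            new T(":amy", "mbz:alias", "\"Amy Beach\""),
            new T(":bob", "mbz:alias", "\"Bob\""),
            new T(":x1", "cmno:hasInfluenced", ":amy"),
            new T(":x2", "cmno:hasInfluenced", ":bob"),
            new T(":w1", "mo:composer", ":x1"),
            new T(":w2", "mo:composer", ":x2"),
            new T(":w3", "mo:composer", ":x2"),
            new T(":w1", "bio:date", "\"1890\""),
            new T(":w2", "bio:date", "\"1900\""),
            new T(":w3", "bio:date", "\"1910\""),
            new T(":w4", "bio:date", "\"1920\""));
    }

    /** The BGP in the order written in the query. */
    static List<T> writtenOrder() {
        return List.of(
            new T("?a", "mbz:alias", "\"Amy Beach\""),
            new T("?b", "cmno:hasInfluenced", "?a"),
            new T("?c", "mo:composer", "?b"),
            new T("?c", "bio:date", "?d"));
    }

    public static void main(String[] args) {
        List<T> reversed = new ArrayList<>(writtenOrder());
        Collections.reverse(reversed);
        Outcome a = evaluate(writtenOrder(), sampleGraph());
        Outcome b = evaluate(reversed, sampleGraph());
        System.out.println("written:  " + a.rows.size() + " result(s), " + a.intermediateRows + " intermediate rows");
        System.out.println("reversed: " + b.rows.size() + " result(s), " + b.intermediateRows + " intermediate rows");
    }
}
```

On this data both orders return the same single result, but the written order produces 4 intermediate rows and the reversed order 11, because starting from the unselective ? bio:date ? pattern drags extra rows through every later join.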
> >>
> >> There's no algebra optimization for your example, only BGP reordering.
> >>
> >> qparse --print=opt shows stage 1 optimizations.
> >>
> >> Executing with "explain" shows BGP execution.
> >>
> >>     Andy
> >>
> >>
> >>
> >> On 03/03/2020 11:56, Kashif Rabbani wrote:
> >>> Hi awesome community,
> >>> I have a question. I am working on optimizing SPARQL query plans and I
> wonder whether the order of triple patterns in the WHERE clause affects the
> query plan or not.
> >>> For example, given the following query:
> >>> PREFIX  bio:  <http://purl.org/vocab/bio/0.1/>
> >>> PREFIX  mo:   <http://purl.org/ontology/mo/>
> >>> PREFIX  mbz:  <http://dbtune.org/musicbrainz/resource/vocab/>
> >>> PREFIX  cmno: <http://purl.org/ontology/classicalmusicnav#>
> >>> SELECT  ?a ?b ?c
> >>> WHERE
> >>>    { ?a  mbz:alias           "Amy Beach" .
> >>>      ?b  cmno:hasInfluenced  ?a .
> >>>      ?c  mo:composer         ?b ;
> >>>          bio:date            ?d
> >>>    }
> >>> Let's generate its algebra:
> >>> Op op = Algebra.compile(query); results in this:
> >>> (project (?a ?b ?c)
> >>>    (bgp
> >>>      (triple ?a <http://dbtune.org/musicbrainz/resource/vocab/alias> "Amy Beach")
> >>>      (triple ?b <http://purl.org/ontology/classicalmusicnav#hasInfluenced> ?a)
> >>>      (triple ?c <http://purl.org/ontology/mo/composer> ?b)
> >>>      (triple ?c <http://purl.org/vocab/bio/0.1/date> ?d)
> >>>    ))
> >>> The BGP in the algebra follows exactly the same order as specified in the
> WHERE clause of the query. More precisely: does Jena construct the query
> plan as it is, or will it change the order at some other level?
> >>> I would be happy if someone could explain how Jena's plan is
> actually constructed. If I use statistics of the actual RDF graph
> to change the order of triple patterns in the BGP based on selectivity,
> would it improve the plan?
> >>> Many Thanks,
> >>> Best Regards,
> >>> Kashif Rabbani.
> >
>


-- 


---
Marco Neumann
KONA
