On 05/03/2020 08:32, Kashif Rabbani wrote:
Hi Andy,

Thanks for your response. I was wondering if there is any detailed documentation of 
the Jena optimization (rewriting & reordering) available online. If so, could 
you please send me the reference?

The code mainly.

The TDB stats is documented.

Also, if I create my own query plan (in algebraic form), is it possible to make 
Jena execute it as it is? I mean, how do I turn off Jena's optimization (rewriting 
& reordering) and force my query plan to be executed?

Yes - two parts - algebra rewrites and BGP reordering.

The context is a mapping of settings. There is:
  a global context: ARQ.getContext()
  one per dataset: DatasetGraph.getContext()
  one per query execution: QueryExecution.getContext()

and they are treated hierarchically:

Lookup is in the QueryExecution context, then the DatasetGraph context, then the global context.
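As a toy illustration of that lookup order (plain Java, not Jena's Context class; the ToyContext class and the string key "optimization", standing in for the ARQ.optimization symbol, are hypothetical), the first context in the chain that holds a setting wins:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model (not Jena code) of hierarchical settings lookup:
// query execution context first, then dataset context, then global context.
final class ToyContext {
    private final Map<String, Object> settings = new HashMap<>();

    void set(String key, Object value) { settings.put(key, value); }

    Optional<Object> getLocal(String key) { return Optional.ofNullable(settings.get(key)); }

    // Search the contexts in the order given; the first one holding the key wins.
    static Optional<Object> lookup(String key, ToyContext... chain) {
        for (ToyContext c : chain) {
            Optional<Object> v = c.getLocal(key);
            if (v.isPresent()) return v;
        }
        return Optional.empty();
    }
}
```

So a setting on a single query execution overrides the dataset and global settings for that execution only.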

:: Algebra rewrite

Some algebra rewrites have to be done - property functions, and renaming some variables due to scoping. These aren't really "optimization steps" but happen in that phase. There is OptimizerMinimal for those.

To turn off the optimizer and still do the minimum steps:

context.set(ARQ.optimization, false)

Alternatively, Algebra.exec(op, dsg) executes the algebra as given - that's a very low-level way of doing it.

Turning the optimizer off is better because all the APIs work, e.g. QueryExecution.

:: BGP reordering

The reordering of triple patterns is separate.
BGP steps are performed by a StageGenerator.

To set up to use a custom StageGenerator:

StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ;

That's really just a call to
   context.set(ARQ.stageGenerator, myStageGenerator) ;

The default is StageGeneratorGeneric, which does ReorderFixed.
It is used if there is no other setting in the context.

    Andy


Thanks again for your help.

Regards,

Kashif Rabbani,
Research Assistant,
Department of Computer Science,
Aalborg University, Denmark.

On 3 Mar 2020, at 13.43, Andy Seaborne <a...@apache.org> wrote:

Hi Kashif,

Optimization happens in two stages:

1. Rewrite of the algebra
2. Reordering of the BGPs

BGPs can be implemented different ways - and they are an inference extension 
point in SPARQL.

What you see is the first. BGPs are reordered during execution.

The algorithm can be stats-driven for TDB and TDB2 storage:
  https://jena.apache.org/documentation/tdb/optimizer.html

The interface is 
org.apache.jena.sparql.engine.optimizer.reorder.ReorderTransformation

and a general-purpose reordering is done for in-memory datasets and is the default 
for TDB.

The default reorder is "grounded triples first, leave equal weights alone". It 
cascades: whether a term counts as bound depends on whether an earlier step has 
already bound it.

    { ?a  mbz:alias           "Amy Beach" .
      ?b  cmno:hasInfluenced  ?a .
      ?c  mo:composer         ?b ;
          bio:date            ?d
    }

That's actually the default order -

?a  mbz:alias           "Amy Beach" .

has two bound terms so is done first.

and now ?a is bound so
?b  cmno:hasInfluenced  ?a .

etc.
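The "lightest pattern first, ties keep their order, then treat its variables as bound" loop can be sketched in plain Java. This is a toy model under stated assumptions, not Jena's ReorderFixed: the weight here is simply the number of positions that are still unbound variables (ReorderFixed uses its own weighting table), and the TriplePattern record and ToyReorder class are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A triple pattern; terms starting with '?' are variables.
record TriplePattern(String s, String p, String o) {
    List<String> terms() { return List.of(s, p, o); }
}

// Toy greedy reorder: lowest weight first, ties keep their original order.
final class ToyReorder {
    static boolean isVar(String term) { return term.startsWith("?"); }

    // Weight = number of positions that are variables not yet bound.
    static int weight(TriplePattern t, Set<String> bound) {
        int unbound = 0;
        for (String term : t.terms()) {
            if (isVar(term) && !bound.contains(term)) unbound++;
        }
        return unbound;
    }

    static List<TriplePattern> reorder(List<TriplePattern> bgp) {
        List<TriplePattern> remaining = new ArrayList<>(bgp);
        List<TriplePattern> result = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            // Strict '<' keeps the earliest pattern when weights are equal.
            int best = 0;
            for (int i = 1; i < remaining.size(); i++) {
                if (weight(remaining.get(i), bound) < weight(remaining.get(best), bound)) best = i;
            }
            TriplePattern chosen = remaining.remove(best);
            result.add(chosen);
            // Cascade: the chosen pattern's variables are bound for later steps.
            for (String term : chosen.terms()) {
                if (isVar(term)) bound.add(term);
            }
        }
        return result;
    }
}
```

Fed the Amy Beach pattern, either in query order or reversed, this toy returns the query order: alias, hasInfluenced, composer, date.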

Given the boundedness of the pattern, and (a guess) that mbz:alias "Amy Beach" is quite 
selective, this order is likely to be a good one. With stats, a ? <property> ? pattern 
would have to be less numerous than ? mbz:alias "Amy Beach" to be placed before it.
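To make that last point concrete, here is a toy stats-driven variant in plain Java. It is not the TDB stats algorithm: the per-predicate counts, the fixed selectivity factor applied for each constant or already-bound subject/object, and the StatsPattern/ToyStatsReorder names are all hypothetical. Predicates are assumed to be constants.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A triple pattern; terms starting with '?' are variables (predicate assumed constant).
record StatsPattern(String s, String p, String o) {}

// Toy stats-driven greedy reorder: smallest estimated result size first.
final class ToyStatsReorder {
    static boolean isFixed(String term, Set<String> bound) {
        return !term.startsWith("?") || bound.contains(term);
    }

    // Estimate = triples recorded for the predicate, divided by the hypothetical
    // factor once for each subject/object that is a constant or already bound.
    static double estimate(StatsPattern t, Set<String> bound,
                           Map<String, Long> predicateCounts, double factor) {
        double est = predicateCounts.getOrDefault(t.p(), 0L).doubleValue();
        if (isFixed(t.s(), bound)) est /= factor;
        if (isFixed(t.o(), bound)) est /= factor;
        return est;
    }

    static List<StatsPattern> reorder(List<StatsPattern> bgp,
                                      Map<String, Long> predicateCounts, double factor) {
        List<StatsPattern> remaining = new ArrayList<>(bgp);
        List<StatsPattern> result = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            int best = 0; // strict '<' below keeps the earlier pattern on ties
            for (int i = 1; i < remaining.size(); i++) {
                if (estimate(remaining.get(i), bound, predicateCounts, factor)
                        < estimate(remaining.get(best), bound, predicateCounts, factor)) best = i;
            }
            StatsPattern chosen = remaining.remove(best);
            result.add(chosen);
            // Variables in subject/object position are bound for later steps.
            for (String term : List.of(chosen.s(), chosen.o())) {
                if (term.startsWith("?")) bound.add(term);
            }
        }
        return result;
    }
}
```

With equal counts for every predicate, the alias pattern still goes first; if cmno:hasInfluenced were rare enough (say 50 triples against 1,000,000 for the others, with a factor of 100), the ? cmno:hasInfluenced ? pattern would be placed first instead.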

There's no algebra optimization for your example, only BGP reordering.

qparse --print=opt shows stage 1 optimizations.

Executing with "explain" shows BGP execution.

    Andy



On 03/03/2020 11:56, Kashif Rabbani wrote:
Hi awesome community,
I have a question. I am working on optimizing SPARQL query plans and I wonder 
whether the order of triple patterns in the WHERE clause affects the query plan or 
not.
For example, given the following query:
PREFIX  bio:  <http://purl.org/vocab/bio/0.1/>
PREFIX  mo:   <http://purl.org/ontology/mo/>
PREFIX  mbz:  <http://dbtune.org/musicbrainz/resource/vocab/>
PREFIX  cmno: <http://purl.org/ontology/classicalmusicnav#>
SELECT  ?a ?b ?c
WHERE
   { ?a  mbz:alias           "Amy Beach" .
     ?b  cmno:hasInfluenced  ?a .
     ?c  mo:composer         ?b ;
         bio:date            ?d
   }
Let's generate its algebra:
Op op = Algebra.compile(query); results in this:
(project (?a ?b ?c)
   (bgp
     (triple ?a <http://dbtune.org/musicbrainz/resource/vocab/alias> "Amy Beach")
     (triple ?b <http://purl.org/ontology/classicalmusicnav#hasInfluenced> ?a)
     (triple ?c <http://purl.org/ontology/mo/composer> ?b)
     (triple ?c <http://purl.org/vocab/bio/0.1/date> ?d)
   ))
The BGP in the algebra follows exactly the same order as specified in the WHERE 
clause of the query. More precisely, does Jena construct the query plan as it 
is, or will it change the order at some other level?
I would be happy if someone could explain how Jena's plan is actually 
constructed. If I use some statistics of the actual RDF graph to change 
the order of triple patterns in the BGP based on selectivity, would it optimize 
the plan somehow?
Many Thanks,
Best Regards,
Kashif Rabbani.
