Re: Order of triple patterns in Where Clause

Marco Neumann Mon, 09 Mar 2020 06:11:53 -0700

OK, granted: it is not functional programming. But I think we are
getting sidetracked here by the specific meaning of the terms "purely
functional" and "pure function". I am referring only to the feature
of counting reductions in Haskell.


So let's take a basic query plus filter in ARQ on the following data set:

 :a :b :c
 :c :d :e
 :f :g :h

with this query:

(filter (! (sameTerm ?x ?c))
  (bgp
    (triple ?x ?y ?z)
    (triple ?c ?d ?e)
  ))

How many "evaluations/operations" in total are performed over the data set to
arrive at a result set of 6 solutions?
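
To make the question concrete, here is a minimal sketch of the kind of
counting I have in mind. It assumes a naive nested-loop evaluation over an
in-memory list and does not use Jena at all: NaiveBgpCount, Triple, Counts,
sampleData and evaluate are made-up names for illustration, and plain string
equality stands in for sameTerm. ARQ's real evaluator may well work quite
differently (indexes, streaming iterators), so this only illustrates the
counting idea.

```java
import java.util.Arrays;
import java.util.List;

public class NaiveBgpCount {

    // Hypothetical stand-in for an RDF triple; terms are plain strings.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Counters collected during the naive evaluation.
    static final class Counts {
        int solutions;      // rows that survive the filter
        long tripleVisits;  // triples looked at while matching patterns
        long filterEvals;   // times the filter expression was evaluated
    }

    // The three triples from the example data set.
    static List<Triple> sampleData() {
        return Arrays.asList(
            new Triple(":a", ":b", ":c"),
            new Triple(":c", ":d", ":e"),
            new Triple(":f", ":g", ":h"));
    }

    // Naive nested loop for
    //   (filter (! (sameTerm ?x ?c)) (bgp (triple ?x ?y ?z) (triple ?c ?d ?e)))
    // Both patterns are all-variable, so every triple matches each of them.
    static Counts evaluate(List<Triple> data) {
        Counts c = new Counts();
        for (Triple t1 : data) {            // binds ?x ?y ?z
            c.tripleVisits++;
            for (Triple t2 : data) {        // binds ?c ?d ?e
                c.tripleVisits++;
                c.filterEvals++;
                if (!t1.s.equals(t2.s)) {   // ! sameTerm(?x, ?c)
                    c.solutions++;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Counts c = evaluate(sampleData());
        System.out.println("solutions=" + c.solutions
            + " tripleVisits=" + c.tripleVisits
            + " filterEvals=" + c.filterEvals);
    }
}
```

Under this naive model the sample data gives 12 triple visits and 9 filter
evaluations for the 6 solutions. Those numbers describe only the sketch, not
what ARQ actually does.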


On Mon, Mar 9, 2020 at 11:50 AM Andy Seaborne <[email protected]> wrote:

> I don't see how it applies the ARQ evaluator.
>
> That's not how it works.
>
> Just because the algebra is functional, it's not functional programming
> and not reduction evaluation.  It has executable statements and external
> data.
>
>      Andy
>
> On 08/03/2020 17:02, Marco Neumann wrote:
> > Sorry, my bad, that was a typo; it should be reductions*. A very basic concept
> > in functional languages like Haskell, along with heap size measured in cells.
> >
> >
> > "Reduction is the process of converting an expression to a simpler form.
> > Conceptually, an expression is reduced by simplifying one reducible
> > expression (called “redex”) at a time."
> >
> https://www.futurelearn.com/courses/functional-programming-haskell/0/steps/27197
> >
> >
> > On Sun, Mar 8, 2020 at 4:44 PM Andy Seaborne <[email protected]> wrote:
> >
> >> Then I don't understand what you are looking for.
> >>
> >> What's a "deduction"? What's a "cell"?
> >>
> >> On 08/03/2020 14:22, Marco Neumann wrote:
> >>> thank you for the hint Andy, but not quite what I was looking for.
> >>>
> >>> I was aiming more for a type of feature I am familiar with from purely
> >>> functional programming languages like Haskell, Hugs, Miranda etc. to display
> >>> deductions and cells used during execution.
> >>>
> >>> Marco
> >>>
> >>> On Sun, Mar 8, 2020 at 10:42 AM Andy Seaborne <[email protected]> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 06/03/2020 17:40, Marco Neumann wrote:
> >>>>> is there statistical data available for the number of deductions /
> >>>>> joins performed for each SPARQL query of a QueryExecution object?
> >>>>
> >>>> If you run with "explain" you can find out but there isn't a specific
> >>>> record kept by the code.
> >>>>
> >>>>>
> >>>>> On Fri, Mar 6, 2020 at 3:16 PM Andy Seaborne <[email protected]> wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 05/03/2020 08:32, Kashif Rabbani wrote:
> >>>>>>> Hi Andy,
> >>>>>>>
> >>>>>>> Thanks for your response. I was wondering if there is any detailed
> >>>>>>> documentation of the Jena optimization (rewriting & reordering) available
> >>>>>>> online? If yes, can you please send me the reference?
> >>>>>>
> >>>>>> The code mainly.
> >>>>>>
> >>>>>> The TDB stats is documented.
> >>>>>>
> >>>>>>> Also, if I create my own query plan (in algebraic form), is it possible
> >>>>>>> to make Jena execute it as it is? I mean, how do I turn off Jena's
> >>>>>>> optimization (rewriting & reordering) and force my query plan for
> >>>>>>> execution?
> >>>>>>
> >>>>>> Yes - two parts - algebra rewrites and BGP reordering.
> >>>>>>
> >>>>>> The context is a mapping of settings.
> >>>>>> There is a global context (ARQ.getContext()),
> >>>>>> one per dataset (DatasetGraph.getContext()),
> >>>>>> and one per query execution (QueryExecution.getContext()),
> >>>>>>
> >>>>>> and it is treated hierarchically:
> >>>>>>
> >>>>>> Lookup is in QueryExecution, then DatasetGraph, then Global.
> >>>>>>
> >>>>>> :: Algebra rewrite
> >>>>>>
> >>>>>> Some algebra rewrites have to be done - property functions, and rewrite
> >>>>>> some variables due to scoping. These aren't really "optimization steps"
> >>>>>> but happen in that phase. There is OptimizerMinimal for those.
> >>>>>>
> >>>>>> To turn off the optimizer and still do the minimum steps:
> >>>>>>
> >>>>>> context.set(ARQ.optimization, false)
> >>>>>>
> >>>>>> Alternatively, Algebra.exec(op, dsg) executes the algebra as given - that's a
> >>>>>> very low-level way of doing it.
> >>>>>>
> >>>>>> Turning the optimizer off is better because all the APIs work. eg
> >>>>>> QueryExecution.
> >>>>>>
> >>>>>> :: BGP reordering
> >>>>>>
> >>>>>> The reordering of triple patterns is separate.
> >>>>>> BGP steps are performed by a StageGenerator.
> >>>>>>
> >>>>>> To set up to use a custom StageGenerator:
> >>>>>>
> >>>>>> StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ;
> >>>>>>
> >>>>>> That's really only a call of
> >>>>>>        context.set(ARQ.stageGenerator, myStageGenerator) ;
> >>>>>>
> >>>>>> The default is StageGeneratorGeneric, which does ReorderFixed.
> >>>>>> It is used if there is no other setting in the context.
> >>>>>>
> >>>>>>         Andy
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks again for your help.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Kashif Rabbani,
> >>>>>>> Research Assistant,
> >>>>>>> Department of Computer Science,
> >>>>>>> Aalborg University, Denmark.
> >>>>>>>
> >>>>>>>> On 3 Mar 2020, at 13.43, Andy Seaborne <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hi Kashif,
> >>>>>>>>
> >>>>>>>> Optimization happens in two stages:
> >>>>>>>>
> >>>>>>>> 1. Rewrite of the algebra
> >>>>>>>> 2. Reordering of the BGPs
> >>>>>>>>
> >>>>>>>> BGPs can be implemented in different ways - and they are an inference
> >>>>>>>> extension point in SPARQL.
> >>>>>>>>
> >>>>>>>> What you see is the first. BGPs are reordered during execution.
> >>>>>>>>
> >>>>>>>> The algorithm can be stats driven for TDB and TDB2 storage:
> >>>>>>>>      https://jena.apache.org/documentation/tdb/optimizer.html
> >>>>>>>>
> >>>>>>>> The interface is
> >>>>>>>>      org.apache.jena.sparql.engine.optimizer.reorder.ReorderTransformation
> >>>>>>>>
> >>>>>>>> and a general purpose reordering is done for in-memory and is the
> >>>>>>>> default for TDB.
> >>>>>>>>
> >>>>>>>> The default reorder is "grounded triples first, leave equal weights
> >>>>>>>> alone". It cascades whether a term is bound by an earlier step.
> >>>>>>>>
> >>>>>>>>>        { ?a  mbz:alias           "Amy Beach" .
> >>>>>>>>>          ?b  cmno:hasInfluenced  ?a .
> >>>>>>>>>          ?c  mo:composer         ?b ;
> >>>>>>>>>              bio:date            ?d
> >>>>>>>>>        }
> >>>>>>>>
> >>>>>>>> That's actually the default order -
> >>>>>>>>
> >>>>>>>> ?a  mbz:alias           "Amy Beach" .
> >>>>>>>>
> >>>>>>>> has two bound terms so is done first.
> >>>>>>>>
> >>>>>>>> and now ?a is bound so
> >>>>>>>> ?b  cmno:hasInfluenced  ?a .
> >>>>>>>>
> >>>>>>>> etc.
> >>>>>>>>
> >>>>>>>> Given the boundedness of the pattern, and that (guess) mbz:alias "Amy Beach"
> >>>>>>>> is quite selective, with stats ? <property> ? would have to be less
> >>>>>>>> numerous than ? mbz:alias "Amy Beach".
> >>>>>>>>
> >>>>>>>> There's no algebra optimization for your example, only BGP reordering.
> >>>>>>>>
> >>>>>>>> qparse --print=opt shows stage 1 optimizations.
> >>>>>>>>
> >>>>>>>> Executing with "explain" shows BGP execution.
> >>>>>>>>
> >>>>>>>>        Andy
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 03/03/2020 11:56, Kashif Rabbani wrote:
> >>>>>>>>> Hi awesome community,
> >>>>>>>>> I have a question. I am working on optimizing SPARQL query plans and I
> >>>>>>>>> wonder whether the order of triple patterns in the WHERE clause affects the
> >>>>>>>>> query plan or not.
> >>>>>>>>> For example, given a following query:
> >>>>>>>>> PREFIX  bio:  <http://purl.org/vocab/bio/0.1/>
> >>>>>>>>> PREFIX  mo:   <http://purl.org/ontology/mo/>
> >>>>>>>>> PREFIX  mbz:  <http://dbtune.org/musicbrainz/resource/vocab/>
> >>>>>>>>> PREFIX  cmno: <http://purl.org/ontology/classicalmusicnav#>
> >>>>>>>>> SELECT  ?a ?b ?c
> >>>>>>>>> WHERE
> >>>>>>>>>       { ?a  mbz:alias           "Amy Beach" .
> >>>>>>>>>         ?b  cmno:hasInfluenced  ?a .
> >>>>>>>>>         ?c  mo:composer         ?b ;
> >>>>>>>>>             bio:date            ?d
> >>>>>>>>>       }
> >>>>>>>>> // Let’s generate its algebra
> >>>>>>>>> Op op = Algebra.compile(query); results in this:
> >>>>>>>>> (project (?a ?b ?c)
> >>>>>>>>>       (bgp
> >>>>>>>>>         (triple ?a <http://dbtune.org/musicbrainz/resource/vocab/alias> "Amy Beach")
> >>>>>>>>>         (triple ?b <http://purl.org/ontology/classicalmusicnav#hasInfluenced> ?a)
> >>>>>>>>>         (triple ?c <http://purl.org/ontology/mo/composer> ?b)
> >>>>>>>>>         (triple ?c <http://purl.org/vocab/bio/0.1/date> ?d)
> >>>>>>>>>       ))
> >>>>>>>>> The BGP in the algebra follows the exact same order as specified in the
> >>>>>>>>> WHERE clause of the query. More precisely, does Jena construct the query
> >>>>>>>>> plan as it is, or will it change the order at some other level?
> >>>>>>>>> I would be happy if someone could guide me on how Jena's plan is
> >>>>>>>>> actually constructed. If I use some statistics of the actual RDF graph
> >>>>>>>>> to change the order of triple patterns in the BGP based on selectivity,
> >>>>>>>>> would it optimize the plan somehow?
> >>>>>>>>> Many Thanks,
> >>>>>>>>> Best Regards,
> >>>>>>>>> Kashif Rabbani.


-- 


---
Marco Neumann
KONA
