The problem is that the substitution is done after optimization, but it still has substitution semantics.

Currently, execution proceeds as:

QueryEngineBase.createPlan:
  Op op = modifyOp(queryOp) ;
  eval(op, dsg, binding, context) ;

and modifyOp does the high-level optimizations selected by the engine.

At evaluation time, something like this happens in eval for each query engine:

   if ( ! input.isEmpty() )
      op = Substitute.substitute(op, input) ;

We can:

A/ Do early algebra substitution:

Advantage: it gives the optimizer more chance to do a good job, especially on FILTER (?x = ?param).
Disadvantage: if Service.exec is changed to use the original input syntax, this breaks. See the note in that method.
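To show why the order matters for FILTER (?x = ?param), here is a minimal sketch over a toy algebra. None of these types are ARQ classes. Var, Const, Bgp, FilterEq, Extend, substitute and optimize are hypothetical stand-ins. The toy optimize only mimics the idea of a filter-equality rewrite and ignores the term-equality vs value-equality checks a real transform has to make.

```java
import java.util.List;
import java.util.Map;

// Toy model only: these are NOT ARQ classes.
class EarlySubstitution {
    interface Node {}
    record Var(String name) implements Node {
        @Override public String toString() { return "?" + name; }
    }
    record Const(String iri) implements Node {
        @Override public String toString() { return "<" + iri + ">"; }
    }
    record Triple(Node s, Node p, Node o) {
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    interface Op {}
    record Bgp(List<Triple> triples) implements Op {
        @Override public String toString() {
            return "(bgp " + String.join(" . ", triples.stream().map(Triple::toString).toList()) + ")";
        }
    }
    record FilterEq(Node left, Node right, Op sub) implements Op {
        @Override public String toString() { return "(filter (= " + left + " " + right + ") " + sub + ")"; }
    }
    record Extend(Var var, Node value, Op sub) implements Op {
        @Override public String toString() { return "(extend ((" + var + " " + value + ")) " + sub + ")"; }
    }

    // Replace a variable by its value if the binding has one.
    static Node subst(Node n, Map<String, String> binding) {
        if (n instanceof Var v && binding.containsKey(v.name())) return new Const(binding.get(v.name()));
        return n;
    }

    // Substitution over the whole toy algebra tree.
    static Op substitute(Op op, Map<String, String> binding) {
        if (op instanceof Bgp b)
            return new Bgp(b.triples().stream()
                    .map(t -> new Triple(subst(t.s(), binding), subst(t.p(), binding), subst(t.o(), binding)))
                    .toList());
        if (op instanceof FilterEq f)
            return new FilterEq(subst(f.left(), binding), subst(f.right(), binding), substitute(f.sub(), binding));
        if (op instanceof Extend e)
            return new Extend(e.var(), subst(e.value(), binding), substitute(e.sub(), binding));
        throw new IllegalArgumentException("unknown op: " + op);
    }

    // Toy filter-equality rewrite: (filter (= ?x <c>) (bgp ...)) becomes
    // (extend ((?x <c>)) (bgp ... with ?x replaced by <c>)).
    // It only fires when the right-hand side is already a constant.
    static Op optimize(Op op) {
        if (op instanceof FilterEq f && f.left() instanceof Var v && f.right() instanceof Const c
                && f.sub() instanceof Bgp) {
            return new Extend(v, c, substitute(f.sub(), Map.of(v.name(), c.iri())));
        }
        return op;
    }

    // FILTER (?x = ?param) over { ?x <name> ?n }
    static Op exampleQuery() {
        return new FilterEq(new Var("x"), new Var("param"),
                new Bgp(List.of(new Triple(new Var("x"), new Const("name"), new Var("n")))));
    }

    public static void main(String[] args) {
        Map<String, String> start = Map.of("param", "alice");
        // Current order: optimize first (the optimizer sees two variables), substitute at eval time.
        System.out.println(substitute(optimize(exampleQuery()), start));
        // Option A: substitute first, so the optimizer sees a constant.
        System.out.println(optimize(substitute(exampleQuery(), start)));
    }
}
```

Run as-is, the first line printed is (filter (= ?x <alice>) (bgp ?x <name> ?n)), where the filter survives. The second is (extend ((?x <alice>)) (bgp <alice> <name> ?n)), where the constant has been pushed into the pattern.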

The change is:

QueryEngineBase:
    protected Plan createPlan()
    {
        // Decide the algebra to actually execute.
        Op op = queryOp ;
        // *** New code ***
        if ( ! startBinding.isEmpty() ) {
            op = Substitute.substitute(op, startBinding) ;
            context.put(ARQConstants.sysCurrentAlgebra, op) ;
            // Don't reset the startBinding because it is also needed
            // in the output.
        }
        // *** End of new code ***
        op = modifyOp(op) ;
        ...
    }

Doing it in setOp() may be better.

This works: testInitialBindings5 and testInitialBindings6 then fail, as expected.

Query engines then no longer need to do this step, but they still need to create bindings that include the initial values (otherwise CONSTRUCT does not work).
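Here is why the start binding has to go back into each solution. After early substitution, ?param no longer appears in the pattern, so it is absent from the solutions. A CONSTRUCT template that mentions ?param then skips that triple, because template triples with unbound variables produce nothing. This is a minimal sketch with hypothetical names. Solutions are plain maps here, not ARQ Binding objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not ARQ's CONSTRUCT implementation.
class ConstructWithStartBinding {
    // Instantiate one template triple; "?name" terms are variables, everything else is a constant.
    static String instantiate(String[] template, Map<String, String> row) {
        StringBuilder sb = new StringBuilder();
        for (String term : template) {
            String value = term.startsWith("?") ? row.get(term.substring(1)) : term;
            if (value == null) return null; // unbound variable: CONSTRUCT skips this triple
            if (sb.length() > 0) sb.append(' ');
            sb.append(value);
        }
        return sb.toString();
    }

    // Add the initial (start) binding to every solution row.
    static List<Map<String, String>> withStartBinding(List<Map<String, String>> rows, Map<String, String> start) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, String> merged = new HashMap<>(start);
            merged.putAll(row); // pattern values (which should agree) win on any overlap
            out.add(merged);
        }
        return out;
    }

    static List<String> construct(String[] template, List<Map<String, String>> rows) {
        List<String> triples = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String t = instantiate(template, row);
            if (t != null) triples.add(t);
        }
        return triples;
    }

    public static void main(String[] args) {
        // CONSTRUCT { ?param foaf:name ?n } where ?param was substituted into the WHERE
        // pattern, so the pattern's solutions only bind ?n.
        String[] template = {"?param", "foaf:name", "?n"};
        List<Map<String, String>> rows = List.of(Map.of("n", "\"Alice\""));
        Map<String, String> start = Map.of("param", "<http://example/alice>");

        System.out.println(construct(template, rows));
        System.out.println(construct(template, withStartBinding(rows, start)));
    }
}
```

The first call prints [] and the second prints [<http://example/alice> foaf:name "Alice"].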

B/ Do abstract query syntax substitution.
   Do this inside QueryExecutionBase.setInitialBinding.

To this end, I have now committed code for abstract query syntax substitution that I wrote as an experiment during the last discussions:

https://svn.apache.org/repos/asf/jena/Scratch/AFS/Jena-Dev/trunk/src/element/

This is not advocacy for this approach; it gives us choices. It also sort of works for updates. (The biggest hole is testing: updates don't support structural .equals().)

Substitution and Update:

We need something.  The most natural case IMO is

INSERT DATA { ?param1 foaf:name ?param2 }

so rewriting the abstract syntax tree will not currently work: variables are illegal in INSERT DATA, so the parser rejects it. The parser could be modified so that the "no vars" check is done later.

The current parameterized SPARQL strings can do this. It's a touch odd that we have two separate mechanisms.
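Here is one way to move the "no vars" check to after substitution. Everything in this sketch is hypothetical. The data is a list of token triples rather than a parsed INSERT DATA, and substitute and checkNoVars are illustrative names, not Jena methods. It assumes a parser that has been relaxed to accept variables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Jena's update parser.
class InsertDataSubstitution {
    // Replace variables ("?name" tokens) that have a value in the binding.
    static List<List<String>> substitute(List<List<String>> triples, Map<String, String> binding) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> triple : triples) {
            List<String> t = new ArrayList<>();
            for (String term : triple) {
                String value = term.startsWith("?") ? binding.get(term.substring(1)) : null;
                t.add(value != null ? value : term);
            }
            out.add(t);
        }
        return out;
    }

    // The INSERT DATA "no vars" rule, applied after substitution instead of in the parser.
    static void checkNoVars(List<List<String>> triples) {
        for (List<String> triple : triples)
            for (String term : triple)
                if (term.startsWith("?"))
                    throw new IllegalStateException("INSERT DATA still contains a variable: " + term);
    }

    public static void main(String[] args) {
        // INSERT DATA { ?param1 foaf:name ?param2 }, as accepted by the relaxed parser
        List<List<String>> data = List.of(List.of("?param1", "foaf:name", "?param2"));

        List<List<String>> ground = substitute(data,
                Map.of("param1", "<http://example/alice>", "param2", "\"Alice\""));
        checkNoVars(ground); // passes: every term is now a constant
        System.out.println(ground);

        try {
            checkNoVars(substitute(data, Map.of("param1", "<http://example/alice>")));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // ?param2 was never given a value
        }
    }
}
```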

Multiple Values:

Whatever mechanism we end up with, the semantics of multiple values should be loop-substitute.
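Loop-substitute works like this: for each row of values, substitute that row into the query and evaluate it, add the row's own values to each result, then concatenate all the results. This is a minimal sketch over a toy triple store (a list of string triples) with a single triple pattern. The names are hypothetical and this is not ARQ's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not ARQ's implementation.
class LoopSubstitute {
    static String subst(String term, Map<String, String> row) {
        return term.startsWith("?") && row.containsKey(term.substring(1)) ? row.get(term.substring(1)) : term;
    }

    // Evaluate one triple pattern against the data; variables are "?name" terms.
    static List<Map<String, String>> match(String[] pattern, List<String[]> data) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String[] triple : data) {
            Map<String, String> row = new HashMap<>();
            boolean ok = true;
            for (int i = 0; i < 3 && ok; i++) {
                String p = pattern[i];
                if (p.startsWith("?")) {
                    String prev = row.putIfAbsent(p.substring(1), triple[i]);
                    ok = prev == null || prev.equals(triple[i]); // repeated variable must match
                } else {
                    ok = p.equals(triple[i]);
                }
            }
            if (ok) results.add(row);
        }
        return results;
    }

    // One substituted evaluation per row of initial values, results concatenated.
    static List<Map<String, String>> loopSubstitute(String[] pattern, List<Map<String, String>> valueRows,
                                                    List<String[]> data) {
        List<Map<String, String>> all = new ArrayList<>();
        for (Map<String, String> values : valueRows) {
            String[] substituted = new String[3];
            for (int i = 0; i < 3; i++) substituted[i] = subst(pattern[i], values);
            for (Map<String, String> r : match(substituted, data)) {
                Map<String, String> merged = new HashMap<>(values); // keep the initial values in the output
                merged.putAll(r);
                all.add(merged);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
                new String[]{":alice", "foaf:name", "\"Alice\""},
                new String[]{":bob", "foaf:name", "\"Bob\""},
                new String[]{":carol", "foaf:name", "\"Carol\""});
        String[] pattern = {"?person", "foaf:name", "?name"};
        List<Map<String, String>> values = List.of(Map.of("person", ":alice"), Map.of("person", ":bob"));
        for (Map<String, String> row : loopSubstitute(pattern, values, data))
            System.out.println(row.get("person") + " " + row.get("name"));
    }
}
```

This prints one line for :alice and one for :bob. :carol is never matched because no row of values names it.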


Rob's list:

1 – Remove support for initial bindings on queries entirely

AFS: -1

2 – Change initial bindings to be a pre-optimization algebra transformation of the query

2a – Algebra -> Algebra done before optimization (A)
2b – AST -> AST done before creating the query engine (B)

AFS: +1  (A is tested, and seems to work).

Note the checked-in code for B does the case of

SELECT ?x {}
==>
SELECT (1 AS ?x) {}

which parameterized SPARQL strings and algebra->algebra substitution do not.
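The reason AST rewriting can handle SELECT ?x {} is that the variable appears only in the projection. Substituting into the pattern changes nothing, so ?x stays unbound. This is a minimal sketch, assuming a toy query made of a projection list and a pattern string. substituteProjection is a hypothetical name, not the method in the checked-in code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: a query is a projection list plus an opaque pattern string.
class ProjectionSubstitution {
    record Query(List<String> projection, String pattern) {
        @Override public String toString() {
            return "SELECT " + String.join(" ", projection) + " " + pattern;
        }
    }

    // Rewrite each projected variable that has an initial value into (value AS ?var).
    static Query substituteProjection(Query q, Map<String, String> binding) {
        List<String> proj = new ArrayList<>();
        for (String item : q.projection()) {
            String value = item.startsWith("?") ? binding.get(item.substring(1)) : null;
            proj.add(value == null ? item : "(" + value + " AS " + item + ")");
        }
        // The pattern is left alone here; a full version would substitute into it as well.
        return new Query(proj, q.pattern());
    }

    public static void main(String[] args) {
        Query q = new Query(List.of("?x"), "{}");
        System.out.println(substituteProjection(q, Map.of("x", "1"))); // SELECT (1 AS ?x) {}
    }
}
```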

3 – Change initial bindings to be done by injection of VALUES clauses

AFS: -0.5. It seems complicated and I'm not sure it will work (I'd need more time to think about it).

However this approach might get rather complex

Yes, complicated. It certainly isn't a simple matter of adding VALUES to the top-level pattern, but I'd have to think harder to know whether it interacts with scoping and name substitution.

We need to be very careful that it does not assume ARQ's current execution strategy, which is scope-sensitive.

4 – Skip optimization when initial bindings are involved

AFS: -1

This isn't per se an optimization issue. The feature is there for taking information obtained elsewhere, and it does work: a graph-wide query can become quite local, which is a big advantage.

        Andy


