Hi All
Holger's question
(http://mail-archives.apache.org/mod_mbox/jena-users/201308.mbox/%[email protected]%3e)
about a regression in ARQ's treatment of initial bindings highlights an
interesting disconnect between the interpretation of SPARQL and the Initial
Bindings API. Initial bindings in their current form allow users to change the
semantics of a query in a non-intuitive way. Take his example query:
ASK { FILTER(?a = ?b) }
Intuitively that query MUST always return false, since ?a and ?b are never
bound and the FILTER expression therefore raises an error. Yet with initial
bindings in the mix the query can be made to return true, at least prior to
2.10.2, which introduces a new optimizer that includes special-case recognition
for this.
The problem is that initial bindings can fundamentally change what a query
means, whereas I believe the intention of the API was merely to allow for
improved performance by guiding the engine.
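To make the disconnect concrete, here is a minimal Java sketch of the
behaviour. It is a toy model, not ARQ code: the class and method names are made
up for illustration, terms are modelled as plain strings, and the "folding"
method simply stands in for whatever special-case recognition the new optimizer
performs.

```java
import java.util.Map;

// Toy model only; none of these names exist in ARQ.
class InitialBindingsDemo {

    // Models ASK { FILTER(?a = ?b) } evaluated against a single starting
    // solution. In SPARQL, comparing an unbound variable raises an error,
    // and a FILTER whose expression errors rejects the solution.
    static boolean askFilterAEqualsB(Map<String, String> bindings) {
        String a = bindings.get("a");
        String b = bindings.get("b");
        if (a == null || b == null) {
            return false; // unbound variable -> error -> solution rejected
        }
        return a.equals(b);
    }

    // Hypothetical optimizer step: nothing in the pattern can bind ?a or ?b,
    // so the filter is folded to "always false" before bindings are consulted.
    static boolean askAfterFolding(Map<String, String> bindings) {
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> none = Map.of();
        Map<String, String> initial = Map.of("a", "true", "b", "true");
        System.out.println(askFilterAEqualsB(none));    // false
        System.out.println(askFilterAEqualsB(initial)); // true
        System.out.println(askAfterFolding(initial));   // false
    }
}
```

The same query text gives different answers depending on whether the bindings
are consulted before or after the filter has been simplified, which is the
regression Holger observed.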
To me this suggests that initial bindings as currently implemented are
fundamentally flawed, and I would suggest that we think about re-architecting
this feature in a future release (not the next release). I believe there are
probably several ways of doing this:
1 – Remove support for initial bindings on queries entirely (as we already did
for updates) in favor of using ParameterizedSparqlString
2 – Change initial bindings to be a pre-optimization algebra transformation of
the query
As we've discussed previously in the context of ParameterizedSparqlString there
is potential to do the substitution at the algebra tree level rather than at
the textual level. This allows for stronger syntax checking and actually
changes the query itself. The problem is that it doesn't work if we want to
inject multiple values for a variable, hence Option 3.
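As a rough illustration of what substitution at the tree level means, here is a
small Java sketch. The Expr, Var, Const and Eq types are hypothetical
stand-ins, not ARQ's algebra or expression classes, and terms are plain
strings.

```java
import java.util.Map;

// Toy expression tree; these types are illustrative, not ARQ classes.
class SubstitutionSketch {
    interface Expr {}

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        @Override public String toString() { return "?" + name; }
    }

    static final class Const implements Expr {
        final String lexical;
        Const(String lexical) { this.lexical = lexical; }
        @Override public String toString() { return lexical; }
    }

    static final class Eq implements Expr {
        final Expr left;
        final Expr right;
        Eq(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " = " + right + ")"; }
    }

    // Replaces each bound variable with its constant before any optimization
    // runs; variables without a binding are left untouched.
    static Expr substitute(Expr e, Map<String, String> bindings) {
        if (e instanceof Var) {
            String value = bindings.get(((Var) e).name);
            return value == null ? e : new Const(value);
        }
        if (e instanceof Eq) {
            Eq eq = (Eq) e;
            return new Eq(substitute(eq.left, bindings), substitute(eq.right, bindings));
        }
        return e; // constants are returned unchanged
    }

    public static void main(String[] args) {
        Expr filter = new Eq(new Var("a"), new Var("b"));
        System.out.println(substitute(filter, Map.of("a", "true", "b", "true"))); // (true = true)
        System.out.println(substitute(filter, Map.of("a", "true")));              // (true = ?b)
    }
}
```

Because each variable is replaced by exactly one constant, the rewritten tree
is an ordinary query that an optimizer can reason about, but there is no
obvious place to put a second candidate value for the same variable.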
3 – Change initial bindings to be done by injection of VALUES clauses
This approach is again by algebra transform and would involve inserting VALUES
clauses at each leaf of the algebra tree. So Holger's query with initial
bindings applied would be rewritten like so:
ASK
{
VALUES ( ?a ?b ) { ( true true ) }
FILTER (?a = ?b)
}
However, this approach might get rather complex for larger queries and also
runs into issues of scope: what if we insert the VALUES clause inside a
sub-query which doesn't propagate those initial bindings outside of it?
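The shape of the injected clause can be sketched in Java as follows. This is
only a textual illustration, not the algebra transform described above; the
class and method names are hypothetical, and terms are assumed to be already
serialized as valid SPARQL.

```java
import java.util.List;
import java.util.StringJoiner;

// Illustrative helper; not part of ARQ.
class ValuesClauseSketch {

    // Renders a VALUES block for the given variable names and rows.
    // Each row holds one serialized term per variable, in the same order.
    static String valuesClause(List<String> vars, List<List<String>> rows) {
        if (vars.isEmpty()) {
            throw new IllegalArgumentException("at least one variable is required");
        }
        StringJoiner header = new StringJoiner(" ?", "VALUES ( ?", " )");
        for (String v : vars) {
            header.add(v);
        }
        StringJoiner body = new StringJoiner(" ", " { ", " }");
        for (List<String> row : rows) {
            if (row.size() != vars.size()) {
                throw new IllegalArgumentException("row width must match variable count");
            }
            body.add("( " + String.join(" ", row) + " )");
        }
        return header.toString() + body.toString();
    }

    public static void main(String[] args) {
        // One row reproduces the clause from the rewritten query above.
        System.out.println(valuesClause(List.of("a", "b"),
                List.of(List.of("true", "true"))));
        // Several rows are what plain substitution cannot express.
        System.out.println(valuesClause(List.of("a", "b"),
                List.of(List.of("true", "true"), List.of("false", "false"))));
    }
}
```

Generating the clause is the easy part; deciding where in the tree it can be
placed without running into the scoping problems mentioned above is where the
real work would be.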
4 – Skip optimization when initial bindings are involved
This is the easiest approach, but we can't enforce it on other query engine
implementations, and it could seriously harm performance for those that use
initial bindings extensively.
There may also be other approaches I haven't thought of, so please suggest
anything that makes sense. Bottom line is that initial bindings in their
current form seem fundamentally broken to me and we should be thinking about
how to fix this in the future.
Rob