On 21/11/11 02:21, Tim Harsch wrote:
> My colleague ran into an issue and sent me the following observation. I told
> him I would relay it to the list.
> ##############
> We have been seeing an inconsistency with the way multiple-expression FILTERs
> applied to an OPTIONAL clause are handled. At first, SPARQL queries with the
> following sort of structure,
> …
> WHERE {
> …
>   ?class3 rdfs:subClassOf foaf:Document .
>   ?doc3 rdf:type ?class3 .
>   ?doc3 dcterms:references ?bag3 .
>   ?bag3 ?member3 ?doc
>   OPTIONAL {
>     ?class4 rdfs:subClassOf foaf:Document .
>     ?doc4 rdf:type ?class4 .
>     ?doc4 dcterms:references ?bag4 .
>     ?bag4 ?member4 ?doc3
>     FILTER (!bound(?doc4))
>     FILTER (!bound(?bag4))
>   }
> …
Observation: the usual idiom for negation in SPARQL 1.0 is to place the
FILTER/!bound outside and after the OPTIONAL.
OPTIONAL {
  ?class4 rdfs:subClassOf foaf:Document .
  ?doc4 rdf:type ?class4 .
  ?doc4 dcterms:references ?bag4 .
  ?bag4 ?member4 ?doc3
}
FILTER (!bound(?doc4))
FILTER (!bound(?bag4))
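Inside the OPTIONAL, FILTER/!bound does not test whether the optional part
matched. ?doc4 and ?bag4 are bound by the OPTIONAL { BGP }, so they are
always bound when the LeftJoin condition is evaluated. The condition is
therefore never true, and the OPTIONAL never adds any bindings.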
(Complete queries, with namespace prefixes, would be appreciated so they can
be cut-and-pasted into tools easily.)
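To make the difference concrete, here is a minimal Python sketch of the SPARQL LeftJoin definition, simplified to lists of dicts as solution mappings. The helper names (`left_join`, `filter_solutions`, `not_bound`) and the toy data are hypothetical illustrations, not ARQ code. It compares the filter used as the LeftJoin condition (inside the OPTIONAL) with the same filter applied after the OPTIONAL.

```python
from typing import Callable, Dict, List

# A solution mapping: variable name -> value; unbound variables are simply absent.
Solution = Dict[str, str]
Condition = Callable[[Solution], bool]


def compatible(a: Solution, b: Solution) -> bool:
    """Mappings are compatible if they agree on every variable they share."""
    return all(b[var] == val for var, val in a.items() if var in b)


def left_join(left: List[Solution], right: List[Solution],
              cond: Condition) -> List[Solution]:
    """Simplified LeftJoin(left, right, cond) as in the SPARQL algebra."""
    out: List[Solution] = []
    for mu1 in left:
        extended = False
        for mu2 in right:
            if compatible(mu1, mu2):
                merged = {**mu1, **mu2}
                # The condition is evaluated on the merged mapping, so any
                # variable bound by the optional part is bound at this point.
                if cond(merged):
                    out.append(merged)
                    extended = True
        if not extended:
            # No compatible right-hand mapping satisfied the condition:
            # keep the left-hand mapping on its own.
            out.append(dict(mu1))
    return out


def filter_solutions(solutions: List[Solution], cond: Condition) -> List[Solution]:
    """FILTER applied after the OPTIONAL: keep mappings where cond holds."""
    return [mu for mu in solutions if cond(mu)]


def not_bound(mu: Solution) -> bool:
    # FILTER (!bound(?doc4)) FILTER (!bound(?bag4))
    return "doc4" not in mu and "bag4" not in mu


# Toy data: d1 is referenced from some bag, d2 is not.
left = [{"doc3": "d1"}, {"doc3": "d2"}]
right = [{"doc3": "d1", "doc4": "x", "bag4": "b"}]

inside = left_join(left, right, not_bound)
outside = filter_solutions(left_join(left, right, lambda mu: True), not_bound)
print(inside)   # [{'doc3': 'd1'}, {'doc3': 'd2'}]
print(outside)  # [{'doc3': 'd2'}]
```

With the filter inside, every left-hand solution comes back unchanged; with the filter outside, only the solution for which the OPTIONAL found no match survives, which is the negation the idiom is meant to express.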
> used to be translated, by SPARQLer Query Validator, into SPARQL Algebra that
> looked like the following:
> …
> (leftjoin
>   (quadpattern
>     (quad <urn:x-arq:DefaultGraphNode> ?class3 rdfs:subClassOf foaf:Document)
>     (quad <urn:x-arq:DefaultGraphNode> ?doc3 rdf:type ?class3)
>     (quad <urn:x-arq:DefaultGraphNode> ?doc3 dcterms:references ?bag3)
>     (quad <urn:x-arq:DefaultGraphNode> ?bag3 ?member3 ?doc)
>   )
>   (quadpattern
>     (quad <urn:x-arq:DefaultGraphNode> ?class4 rdfs:subClassOf foaf:Document)
>     (quad <urn:x-arq:DefaultGraphNode> ?doc4 rdf:type ?class4)
>     (quad <urn:x-arq:DefaultGraphNode> ?doc4 dcterms:references ?bag4)
>     (quad <urn:x-arq:DefaultGraphNode> ?bag4 ?member4 ?doc3)
>   )
>   (exprlist (! (bound ?doc4)) (! (bound ?bag4))))))))
> In the last couple of weeks, however, the “exprlist” operator never appeared,
> and instead we’d see a single, AND-ed expression:
> …
> (leftjoin
>   (quadpattern
>     (quad <urn:x-arq:DefaultGraphNode> ?class3 rdfs:subClassOf foaf:Document)
>     (quad <urn:x-arq:DefaultGraphNode> ?doc3 rdf:type ?class3)
>     (quad <urn:x-arq:DefaultGraphNode> ?doc3 dcterms:references ?bag3)
>     (quad <urn:x-arq:DefaultGraphNode> ?bag3 ?member3 ?doc)
>   )
>   (quadpattern
>     (quad <urn:x-arq:DefaultGraphNode> ?class4 rdfs:subClassOf foaf:Document)
>     (quad <urn:x-arq:DefaultGraphNode> ?doc4 rdf:type ?class4)
>     (quad <urn:x-arq:DefaultGraphNode> ?doc4 dcterms:references ?bag4)
>     (quad <urn:x-arq:DefaultGraphNode> ?bag4 ?member4 ?doc3)
>   )
>   (&& (! (bound ?doc4)) (! (bound ?bag4))))))))
This happens if you write:
FILTER (!bound(?doc4) && !bound(?bag4))
That has the same effect, but it is a different query.
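Why the two forms have the same effect: a FILTER keeps a solution only when its expression evaluates to true, and SPARQL's && is true only when both operands are true (an error operand makes the result false or an error, never true). A small Python check over all combinations of the three possible outcomes; the string encoding of true/false/error and the helper names are hypothetical.

```python
from itertools import product

# Hypothetical encoding of the three possible outcomes of a filter expression.
TRUE, FALSE, ERROR = "true", "false", "error"


def sparql_and(a: str, b: str) -> str:
    """&& following SPARQL's truth table: false beats error, error beats true."""
    if a == FALSE or b == FALSE:
        return FALSE
    if a == ERROR or b == ERROR:
        return ERROR
    return TRUE


def keeps(value: str) -> bool:
    """A FILTER keeps a solution only if its expression evaluates to true."""
    return value == TRUE


def exprlist_keeps(a: str, b: str) -> bool:
    # (exprlist A B): each expression acts as a separate filter.
    return keeps(a) and keeps(b)


def and_keeps(a: str, b: str) -> bool:
    # (&& A B): one filter over the combined expression.
    return keeps(sparql_and(a, b))


outcomes = [TRUE, FALSE, ERROR]
same_effect = all(exprlist_keeps(a, b) == and_keeps(a, b)
                  for a, b in product(outcomes, repeat=2))
print(same_effect)  # True
```

The two algebra forms differ as syntax trees, which is why a tool printing the algebra shows them differently, even though they accept exactly the same solutions.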
> This is OK, since it represents how we have to treat the expressions
> anyway, but it just worries us that we’re shooting at a moving target.
> Note that recently, the “exprlist” construct has reappeared. Either
> changes are being made to the SPARQL-to-SPARQL-Algebra translation, or
> we’re missing some fine point of the SPARQL grammar. Either way, we need
> to know what’s going on.
> ##############
The exprlist form looks right. I don't recall any changes being made in
this area; specifically, I don't recall any code that combines exprlists
into && expressions.
Which version and which tools are you running?
There is code that breaks up && into exprlists as an optimizer step; it
enables the individual filter expressions to be placed more accurately
later on.
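A rough Python sketch of the idea behind that optimizer step: split a top-level && into its conjuncts, then attach each conjunct after the first triple pattern at which all of its variables are bound. The tuple encoding of expressions and the helpers `split_conjunction`, `mentioned_vars` and `place_filters` are hypothetical; this is not ARQ's code. The simple scheme assumes a sequence of triple patterns joined together (one basic graph pattern) and does not handle OPTIONAL.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical expression encoding: a variable is a string starting with "?",
# other strings are constants, and an operator application is a tuple
# (operator, arg1, arg2, ...), e.g. ("&&", ("isIRI", "?x"), ("isIRI", "?y")).
Expr = Union[str, tuple]


def split_conjunction(expr: Expr) -> List[Expr]:
    """Flatten nested && into a list of conjuncts (the exprlist form)."""
    if isinstance(expr, tuple) and expr[0] == "&&":
        return split_conjunction(expr[1]) + split_conjunction(expr[2])
    return [expr]


def mentioned_vars(expr: Expr) -> Set[str]:
    """Collect the variables an expression mentions."""
    if isinstance(expr, str):
        return {expr} if expr.startswith("?") else set()
    return set().union(*(mentioned_vars(arg) for arg in expr[1:]))


def place_filters(patterns: List[Tuple[str, Set[str]]],
                  conjuncts: List[Expr]) -> Tuple[Dict[str, List[Expr]], List[Expr]]:
    """Attach each conjunct after the first pattern where all its variables are bound."""
    placement: Dict[str, List[Expr]] = {}
    bound: Set[str] = set()
    remaining = list(conjuncts)
    for pattern, pattern_vars in patterns:
        bound |= pattern_vars
        ready = [c for c in remaining if mentioned_vars(c) <= bound]
        placement[pattern] = ready
        remaining = [c for c in remaining if c not in ready]
    # Anything still in `remaining` mentions a variable no pattern binds.
    return placement, remaining


# The && form from the message splits back into the exprlist form.
print(split_conjunction(("&&", ("!", ("bound", "?doc4")), ("!", ("bound", "?bag4")))))

patterns = [
    ("?class4 rdfs:subClassOf foaf:Document", {"?class4"}),
    ("?doc4 rdf:type ?class4", {"?doc4", "?class4"}),
    ("?doc4 dcterms:references ?bag4", {"?doc4", "?bag4"}),
]
conjuncts = split_conjunction(("&&", ("isIRI", "?class4"), ("isIRI", "?bag4")))
placement, leftover = place_filters(patterns, conjuncts)
for pattern, filters in placement.items():
    print(pattern, filters)
```

Splitting first is what lets the ?class4 test run right after the first pattern instead of waiting, bundled with the ?bag4 test, until the last one.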
The algebra output by the algebra generator (before the optimizer runs)
should be stable; algebra after optimization may change.
Different storage layers use different sets of optimization steps: SDB
and TDB do different things even at the level of high-level algebra
rewrites, because SDB tries to leave as much as possible to the SQL
optimizer on the theory that it knows best (this is only partially true!)
Your example has been passed through the algebra-to-quad-form transform
after algebra generation. Is that the only transform being applied?
Andy
Thanks,
Tim