Andy - Yes, we are talking about the same thing. I understand the scoping of FILTER inside OPTIONAL, and when it applies over the join (as the LeftJoin condition) rather than over the inner operator.
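The scoping difference can be checked with a minimal Python sketch. This is not dotNetRDF or ARQ code, and every helper name (`left_join`, `filter_`, `triples_match`, `s0_unbound`) is hypothetical. It models a solution as a dict from variable name to term and represents a SPARQL evaluation error, such as `sameTerm` over an unbound variable, as `None`. It also assumes the two test triples are loaded into both <http://a> and <http://b>, as in the N-Quads in Andy's reply. It evaluates both queries from this thread: one with the FILTER directly inside OPTIONAL, and one with the FILTER wrapped in an extra {}.

```python
from typing import Callable, Dict, List, Optional

Solution = Dict[str, str]  # variable name -> term; an absent key means unbound
Condition = Callable[[Solution], Optional[bool]]  # None models an evaluation error


def compatible(a: Solution, b: Solution) -> bool:
    # Mappings are compatible if they agree on every variable they share.
    return all(b[v] == t for v, t in a.items() if v in b)


def same_term(mu: Solution, x: str, y: str) -> Optional[bool]:
    # sameTerm(?x, ?y) is an error (None here) if either variable is unbound.
    if x not in mu or y not in mu:
        return None
    return mu[x] == mu[y]


def sparql_and(*vals: Optional[bool]) -> Optional[bool]:
    # SPARQL &&: false beats error, and error beats true.
    if any(v is False for v in vals):
        return False
    return True if all(v is True for v in vals) else None


def filter_(cond: Condition, rows: List[Solution]) -> List[Solution]:
    # Only solutions whose condition is exactly true survive; errors drop out.
    return [mu for mu in rows if cond(mu) is True]


def left_join(left: List[Solution], right: List[Solution],
              cond: Condition) -> List[Solution]:
    # cond is evaluated on each merged (left + right) solution.
    out: List[Solution] = []
    for mu in left:
        merged = [{**mu, **nu} for nu in right if compatible(mu, nu)]
        passed = filter_(cond, merged)
        out.extend(passed if passed else [mu])
    return out


def triples_match(mu: Solution) -> Optional[bool]:
    return sparql_and(same_term(mu, "s", "s0"), same_term(mu, "p", "p0"),
                      same_term(mu, "o", "o0"))


def s0_unbound(mu: Solution) -> bool:
    return "s0" not in mu  # FILTER(!BOUND(?s0))


graph_a = [{"s": r, "p": r, "o": r} for r in ("http://r1", "http://r2")]
graph_b = [{"s0": r, "p0": r, "o0": r} for r in ("http://r1", "http://r2")]

# OPTIONAL { GRAPH <http://b> {...} FILTER(...) }: the FILTER becomes the third
# argument of (leftjoin ...), so it can see ?s, ?p and ?o from the left side.
query_1 = filter_(s0_unbound, left_join(graph_a, graph_b, triples_match))

# OPTIONAL { { GRAPH <http://b> {...} FILTER(...) } }: the FILTER wraps the graph
# pattern alone, where ?s, ?p and ?o are unbound, so it always errors.
query_2 = filter_(s0_unbound,
                  left_join(graph_a, filter_(triples_match, graph_b),
                            lambda mu: True))

print(len(query_1), len(query_2))  # 0 2
```

With the FILTER as the LeftJoin condition, every left row finds a match and `FILTER(!BOUND(?s0))` removes everything. With the extra {}, the FILTER can never be true, the optional side is empty, and both left rows survive with ?s0 unbound, which matches the two rows ARQ reports for Andy's variant.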
Mike - Thanks for confirming my suspicion. It turns out to be a trivial bug in dotNetRDF's handling of left joins that only occurs when there is a cross product. The normal join case was already handling this part of the spec correctly; I just somehow missed it in the cross product case.

Cheers,

Rob

On 27/11/2013 15:39, "Andy Seaborne" wrote:

>Hi Rob,
>
>Partial answer - I'm about to go into an RDF-WG telecon but I'll work
>through the details later. I just wanted to check we are talking about
>the same thing, because "OPTIONAL{ ... FILTER ... }" is special.
>
>You'll see in the algebra there is no (filter) in this part.
>
> > (leftjoin
> >   (graph <http://a>
> >     (bgp (triple ?s ?p ?o)))
> >   (graph <http://b>
> >     (bgp (triple ?s0 ?p0 ?o0)))
> >   (&& (&& (sameTerm ?s ?s0) (sameTerm ?p ?p0)) (sameTerm ?o ?o0)))))
>
>The (&&...) is the 3rd argument to the leftJoin operation and forms part
>of the join condition. It is not a filter over
>GRAPH <http://b> { ?s0 ?p0 ?o0 . }, nor is it applied after the LeftJoin;
>in SQL terms, it's the ON condition for a left join. Scope-wise, the (&&)
>can see ?s, which it could not otherwise.
>
>For example, this is a different query:
>
>SELECT *
>WHERE
>{
>  GRAPH <http://a>
>  {
>    ?s ?p ?o .
>  }
>  OPTIONAL
>  {
>    {
>      GRAPH <http://b> { ?s0 ?p0 ?o0 . }
>      FILTER (SAMETERM(?s, ?s0) && SAMETERM(?p, ?p0) && SAMETERM(?o, ?o0))
>    }
>  }
>  FILTER(!BOUND(?s0))
>}
>
>There is an additional {} inside the OPTIONAL {}.
>
>(filter (! (bound ?s0))
>  (leftjoin
>    (graph <http://a>
>      (bgp (triple ?s ?p ?o)))
>    (filter (&& (&& (sameTerm ?s ?s0) (sameTerm ?p ?p0))
>                (sameTerm ?o ?o0))
>      (graph <http://b>
>        (bgp (triple ?s0 ?p0 ?o0))))))
>
>Now there are two (filter)s.
>
>ARQ then gives 2 rows:
>
>----------------------------------------------------------
>| s           | p           | o           | s0 | p0 | o0 |
>==========================================================
>| <http://r2> | <http://r2> | <http://r2> |    |    |    |
>| <http://r1> | <http://r1> | <http://r1> |    |    |    |
>----------------------------------------------------------
>
>The &&-filter is always false (?s is not defined).
>
>Is that what dotNetRDF returns?
>
>I get this with both the normal and the reference query engines in ARQ.
>
>    Andy
>
><http://r1> <http://r1> <http://r1> <http://a> .
><http://r2> <http://r2> <http://r2> <http://a> .
><http://r1> <http://r1> <http://r1> <http://b> .
><http://r2> <http://r2> <http://r2> <http://b> .
>
>
>On 27/11/13 11:07, Rob Vesse wrote:
>> Hey Andy
>>
>> A bug originally reported for dotNetRDF (CORE-386 [1]) was one I
>> initially rejected as Invalid, based on my understanding of how
>> LeftJoin behaves. I then reopened it because the user reporting it gets
>> different behaviour in ARQ (which I have reproduced), so I am unclear
>> whether dotNetRDF or ARQ is doing things wrong according to the
>> specification.
>>
>> The test data is the trivial Turtle document as follows:
>>
>> <http://r1> <http://r1> <http://r1> .
>> <http://r2> <http://r2> <http://r2> .
>>
>> And the query is as follows:
>>
>> SELECT *
>> WHERE
>> {
>>   GRAPH <http://a>
>>   {
>>     ?s ?p ?o .
>>   }
>>   OPTIONAL
>>   {
>>     GRAPH <http://b> { ?s0 ?p0 ?o0 . }
>>     FILTER (SAMETERM(?s, ?s0) && SAMETERM(?p, ?p0) && SAMETERM(?o, ?o0))
>>   }
>>   FILTER(!BOUND(?s0))
>> }
>>
>> And for reference, the unoptimised algebra is as follows:
>>
>> (base <http://example/base/>
>>   (filter (! (bound ?s0))
>>     (leftjoin
>>       (graph <http://a>
>>         (bgp (triple ?s ?p ?o)))
>>       (graph <http://b>
>>         (bgp (triple ?s0 ?p0 ?o0)))
>>       (&& (&& (sameTerm ?s ?s0) (sameTerm ?p ?p0)) (sameTerm ?o ?o0)))))
>>
>> The intent of the query is to calculate the delta of the graphs, i.e.
>> the triples that are present in <http://a> but not in <http://b>. So
>> given two identical graphs it was intended to return 0 results; however,
>> dotNetRDF returns 2 results whereas ARQ returns 0 results.
>>
>> I believed dotNetRDF was correct, and I'll explain why, but I think I
>> may be wrong, and if so I'd love to understand why. My understanding of
>> the flow of execution is as follows:
>>
>> Step 1 - Execute the LHS of the left join, which finds all triples in
>> graph <http://a> and thus returns the following:
>>
>> s = r1, p = r1, o = r1
>> s = r2, p = r2, o = r2
>>
>> Step 2 - Execute the RHS of the left join, which finds all triples in
>> graph <http://b> and thus returns the following:
>>
>> s0 = r1, p0 = r1, o0 = r1
>> s0 = r2, p0 = r2, o0 = r2
>>
>> Step 3 - Calculate the possible join:
>>
>> s = r1, p = r1, o = r1, s0 = r1, p0 = r1, o0 = r1
>> s = r1, p = r1, o = r1, s0 = r2, p0 = r2, o0 = r2
>> s = r2, p = r2, o = r2, s0 = r1, p0 = r1, o0 = r1
>> s = r2, p = r2, o = r2, s0 = r2, p0 = r2, o0 = r2
>>
>> Step 4 - Apply the filter on the left join:
>>
>> s = r1, p = r1, o = r1
>> s = r1, p = r1, o = r1, s0 = r1, p0 = r1, o0 = r1
>> s = r2, p = r2, o = r2
>> s = r2, p = r2, o = r2, s0 = r2, p0 = r2, o0 = r2
>>
>> This, I think, is where ARQ and dotNetRDF differ in behaviour and where
>> I suspect my implementation is wrong. For an LHS row where the FILTER
>> fails for some (but not all) of its joined rows, I retain the LHS row on
>> its own, whereas ARQ does not. I'm guessing that I'm missing some bit of
>> the SPARQL specification for LeftJoin that says that where there is at
>> least one valid joinable solution for an LHS solution, the LHS solution
>> is not preserved on its own?
>>
>> If you could point me to this I would much appreciate it.
>>
>> Step 5 - Apply the outer filter:
>>
>> s = r1, p = r1, o = r1
>> s = r2, p = r2, o = r2
>>
>> So dotNetRDF returns 2 results but ARQ returns 0 results for this query.
>> Am I correct in thinking I've got a bug in my LeftJoin implementation
>> over in dotNetRDF? Or is this actually a subtle bug in ARQ?
>>
>> Thanks,
>>
>> Rob
>>
>> p.s. Code for this test case and variations on it is committed as
>> TestGraphDeltas
>>
>> [1] http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=386
