Re: Possible bug in LeftJoin implementation?

Mike Grove Wed, 27 Nov 2013 07:30:45 -0800

On Wed, Nov 27, 2013 at 6:07 AM, Rob Vesse <[email protected]> wrote:


> Hey Andy
>
> This is prompted by a bug originally reported for dotNetRDF (CORE-386 [1]).
> I initially rejected it as Invalid based on my understanding of how LeftJoin
> behaves, then reopened it because the user reporting it gets different
> behaviour in ARQ (which I have reproduced). So I am unclear which of
> dotNetRDF or ARQ is doing things wrong, based on my understanding of the
> specification.
>
> The test data is the trivial Turtle document as follows:
>
> <http://r1> <http://r1> <http://r1> .
> <http://r2> <http://r2> <http://r2> .
>
> And the query is as follows:
>
> SELECT *
> WHERE
> {
>   GRAPH <http://a>
>   {
>     ?s ?p ?o .
>   }
>   OPTIONAL
>   {
>     GRAPH <http://b> { ?s0 ?p0 ?o0 . }
>     FILTER (SAMETERM(?s, ?s0) && SAMETERM(?p, ?p0) && SAMETERM(?o, ?o0))
>   }
>   FILTER(!BOUND(?s0))
> }
>
>
> And for reference the unoptimised algebra is as follows:
>
> (base <http://example/base/>
>  (filter (! (bound ?s0))
>   (leftjoin
>    (graph <http://a>
>     (bgp (triple ?s ?p ?o)))
>    (graph <http://b>
>     (bgp (triple ?s0 ?p0 ?o0)))
>    (&& (&& (sameTerm ?s ?s0) (sameTerm ?p ?p0)) (sameTerm ?o ?o0)))))
>
>
> The intent of the query is to calculate the delta of the graphs, i.e. the
> triples that are present in <http://a> that are not present in <http://b>.
> So given two identical graphs it was intended to return 0 results.
> However, dotNetRDF returns 2 results whereas ARQ returns 0 results.
>
> My belief was that dotNetRDF is correct and I'll explain why. I think I
> may be wrong, and if so I'd love to understand why.  My understanding of
> the flow of execution is as follows:
>
> Step 1 - Execute the LHS of the left join which finds all triples in graph
> <http://a> and thus returns the following:
>
> s = r1, p = r1, o = r1
> s = r2, p = r2, o = r2
>
> Step 2 - Execute the RHS of the left join which finds all triples in graph
> <http://b> and thus returns the following:
>
> s0 = r1, p0 = r1, o0 = r1
> s0 = r2, p0 = r2, o0 = r2
>
>
> Step 3 - Calculate the possible join
>
> s = r1, p = r1, o = r1, s0 = r1, p0 = r1, o0 = r1
> s = r1, p = r1, o = r1, s0 = r2, p0 = r2, o0 = r2
> s = r2, p = r2, o = r2, s0 = r1, p0 = r1, o0 = r1
> s = r2, p = r2, o = r2, s0 = r2, p0 = r2, o0 = r2
>
>
> Step 4 - Apply the filter on the left join
>
>
> s = r1, p = r1, o = r1
> s = r1, p = r1, o = r1, s0 = r1, p0 = r1, o0 = r1
> s = r2, p = r2, o = r2
> s = r2, p = r2, o = r2, s0 = r2, p0 = r2, o0 = r2
>
> This I think is where ARQ and dotNetRDF differ in behaviour and where I
> suspect my implementation is wrong.  Where the FILTER fails for some (but
> not all) of the joined rows for a LHS solution, I retain the LHS whereas
> ARQ does not.  I'm guessing that I'm missing some bit of the SPARQL
> specification for LeftJoin that says that where there is at least one
> valid joinable solution for a LHS solution then the LHS does not need to
> be preserved on its own?
>
>
Yes, it's this.  If at any point a solution on the LHS joins with something
on the RHS, you don't need to preserve the unjoined solution from the LHS.
That's only for when it completely fails to join.

So the output from that should be the 2nd & 4th row from Step 4,
neither of which pass the subsequent filter, yielding zero results.
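
To make the difference concrete, here is a minimal Python sketch of the two
behaviours. It is an illustration only, not code from ARQ or dotNetRDF:
solutions are modelled as plain dicts, sameTerm is approximated by string
equality, and the names left_join, per_row_left_join, and same_terms are
hypothetical. left_join keeps a bare LHS solution only when no compatible RHS
solution passes the filter expression; per_row_left_join is my reading of the
behaviour described in Step 4, where the LHS is kept once for every joined
row that fails the filter.

```python
def compatible(m1, m2):
    # Two solutions are compatible if they agree on every shared variable.
    return all(m2[v] == t for v, t in m1.items() if v in m2)

def left_join(lhs, rhs, expr):
    # Keep every merged solution that passes expr; keep the bare LHS
    # solution only if no compatible RHS solution passes expr.
    out = []
    for m1 in lhs:
        merged = [{**m1, **m2} for m2 in rhs if compatible(m1, m2)]
        passing = [m for m in merged if expr(m)]
        out.extend(passing if passing else [dict(m1)])
    return out

def per_row_left_join(lhs, rhs, expr):
    # Assumed reading of the Step 4 behaviour: fall back to the LHS
    # solution for each individual merged row that fails expr.
    out = []
    for m1 in lhs:
        for m2 in rhs:
            if compatible(m1, m2):
                m = {**m1, **m2}
                out.append(m if expr(m) else dict(m1))
    return out

def same_terms(m):
    # Stand-in for the SAMETERM(...) && SAMETERM(...) && SAMETERM(...) filter.
    return m["s"] == m["s0"] and m["p"] == m["p0"] and m["o"] == m["o0"]

def not_bound_s0(solutions):
    # Stand-in for the outer FILTER(!BOUND(?s0)).
    return [m for m in solutions if "s0" not in m]

# Graphs <http://a> and <http://b> hold the same two triples.
graph_a = [{"s": "r1", "p": "r1", "o": "r1"}, {"s": "r2", "p": "r2", "o": "r2"}]
graph_b = [{"s0": "r1", "p0": "r1", "o0": "r1"}, {"s0": "r2", "p0": "r2", "o0": "r2"}]

joined = left_join(graph_a, graph_b, same_terms)
result = not_bound_s0(joined)

naive_joined = per_row_left_join(graph_a, graph_b, same_terms)
naive_result = not_bound_s0(naive_joined)

print(len(joined), len(result))              # 2 0
print(len(naive_joined), len(naive_result))  # 4 2
```

With identical graphs the first version leaves nothing for the outer filter
to keep, while the per-row version leaks two bare LHS solutions through it.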

Cheers,

Mike


>
> If you could point me to this I would much appreciate it.
>
> Step 5 - Apply the outer filter
>
>
> s = r1, p = r1, o = r1
> s = r2, p = r2, o = r2
>
>
> So dotNetRDF returns 2 results but ARQ returns 0 results for this query.
> Am I correct in thinking I've got a bug in my LeftJoin implementation over
> in dotNetRDF?  Or is this actually a subtle bug in ARQ?
>
> Thanks,
>
> Rob
>
> p.s. code for this test case and variations on it is committed as
> TestGraphDeltas
>
> [1] http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=386
>
>
>
>
>
