Hi Rob,

Partial answer: I'm about to go into an RDF-WG telecon, but I'll work through the details later. I just wanted to check that we are talking about the same thing, because "OPTIONAL { ... FILTER ... }" is special.

You'll see that in the algebra there is no (filter) in this part:

>   (leftjoin
>     (graph <http://a>
>      (bgp (triple ?s ?p ?o)))
>     (graph <http://b>
>      (bgp (triple ?s0 ?p0 ?o0)))
>     (&& (&& (sameTerm ?s ?s0) (sameTerm ?p ?p0)) (sameTerm ?o ?o0)))))

The (&&...) is the third argument to the leftjoin operation and forms part of the join condition. It is not a filter over GRAPH <http://b> { ?s0 ?p0 ?o0 . }, nor is it applied after the LeftJoin; in SQL terms, it's the ON condition of a left join. Scope-wise, the (&&) can see ?s, which it could not otherwise.
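To make the scoping point concrete, here is a minimal Python sketch of the SPARQL 1.1 definition LeftJoin(Ω1, Ω2, expr) = Filter(expr, Join(Ω1, Ω2)) ∪ Diff(Ω1, Ω2, expr), where expr is evaluated over the merged solution and so can see ?s. Solutions are modelled as dicts, sameTerm as string equality, and the names `left_join`, `compatible` and `cond` are illustrative assumptions, not ARQ or dotNetRDF code. Evaluation errors are ignored here because every variable the condition reads is bound in the merged row.

```python
# Minimal sketch of SPARQL 1.1 LeftJoin semantics (not ARQ or dotNetRDF code).
# A solution mapping is modelled as a dict from variable name to RDF term.

def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def left_join(lhs, rhs, expr):
    """Filter(expr, Join(lhs, rhs)) plus Diff(lhs, rhs, expr)."""
    out = []
    for mu in lhs:
        matched = False
        for mu2 in rhs:
            merged = {**mu, **mu2}
            if compatible(mu, mu2) and expr(merged):
                out.append(merged)
                matched = True
        if not matched:
            # Diff: mu survives on its own only if NO compatible rhs row passes expr.
            out.append(mu)
    return out

r1, r2 = "<http://r1>", "<http://r2>"
graph_a = [{"s": r1, "p": r1, "o": r1}, {"s": r2, "p": r2, "o": r2}]
graph_b = [{"s0": r1, "p0": r1, "o0": r1}, {"s0": r2, "p0": r2, "o0": r2}]

# Third argument of (leftjoin ...): evaluated over the MERGED row, so ?s is visible.
def cond(m):
    return m["s"] == m["s0"] and m["p"] == m["p0"] and m["o"] == m["o0"]

joined = left_join(graph_a, graph_b, cond)
result = [m for m in joined if "s0" not in m]  # FILTER(!BOUND(?s0))
print(len(joined), len(result))  # 2 0
```

Each row of <http://a> finds its twin in <http://b>, so no bare LHS row is kept and the outer !BOUND filter removes everything.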

For example, this is a different query:

SELECT *
WHERE
{
  GRAPH <http://a>
  {
    ?s ?p ?o .
  }
  OPTIONAL
  {
    {
      GRAPH <http://b> { ?s0 ?p0 ?o0 . }
      FILTER (SAMETERM(?s, ?s0) && SAMETERM(?p, ?p0) && SAMETERM(?o, ?o0))
    }
  }
  FILTER(!BOUND(?s0))
}

There is an additional {} inside the OPTIONAL {}, which gives this algebra:

(filter (! (bound ?s0))
  (leftjoin
    (graph <http://a>
      (bgp (triple ?s ?p ?o)))
    (filter (&& (&& (sameTerm ?s ?s0) (sameTerm ?p ?p0))
                (sameTerm ?o ?o0))
      (graph <http://b>
        (bgp (triple ?s0 ?p0 ?o0))))))

This now has two (filter) operations.

ARQ then gives 2 rows:

----------------------------------------------------------
| s           | p           | o           | s0 | p0 | o0 |
==========================================================
| <http://r2> | <http://r2> | <http://r2> |    |    |    |
| <http://r1> | <http://r1> | <http://r1> |    |    |    |
----------------------------------------------------------

The && filter is always false (?s is not defined inside the inner {}, so evaluating it is an error, and FILTER rejects the row).
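A minimal Python sketch of this second query, under the same dict-based modelling: the inner FILTER runs over the <http://b> rows alone, where ?s is unbound. A missing key (KeyError) stands in for the SPARQL evaluation error, and `sparql_filter` drops any row whose expression errors; these helper names are hypothetical.

```python
# Sketch of the second query: the FILTER is inside its own group, so it is
# evaluated over the RHS rows alone, before the left join ever sees ?s.
r1, r2 = "<http://r1>", "<http://r2>"
graph_a = [{"s": r1, "p": r1, "o": r1}, {"s": r2, "p": r2, "o": r2}]
graph_b = [{"s0": r1, "p0": r1, "o0": r1}, {"s0": r2, "p0": r2, "o0": r2}]

def cond(m):
    return m["s"] == m["s0"] and m["p"] == m["p0"] and m["o"] == m["o0"]

def sparql_filter(expr, rows):
    kept = []
    for m in rows:
        try:
            if expr(m):
                kept.append(m)
        except KeyError:  # unbound variable -> evaluation error -> row eliminated
            pass
    return kept

def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def left_join(lhs, rhs):
    """LeftJoin with no join condition (expr is simply true)."""
    out = []
    for mu in lhs:
        matches = [{**mu, **mu2} for mu2 in rhs if compatible(mu, mu2)]
        out.extend(matches if matches else [mu])
    return out

inner = sparql_filter(cond, graph_b)           # (filter (&& ...) (graph <http://b> ...))
joined = left_join(graph_a, inner)
result = [m for m in joined if "s0" not in m]  # FILTER(!BOUND(?s0))
print(len(inner), len(result))  # 0 2
```

With the RHS empty, both <http://a> rows survive the left join unextended, which matches the 2 rows in the ARQ output above.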

Is that what dotNetRDF returns?

I get this with both the normal and the reference query engines in ARQ.

        Andy

<http://r1> <http://r1> <http://r1> <http://a> .
<http://r2> <http://r2> <http://r2> <http://a> .
<http://r1> <http://r1> <http://r1> <http://b> .
<http://r2> <http://r2> <http://r2> <http://b> .


On 27/11/13 11:07, Rob Vesse wrote:
Hey Andy

I'm writing about a bug originally reported for dotNetRDF (CORE-386 [1]).
I initially rejected it as Invalid based on my understanding of how
LeftJoin behaves, but then reopened it because the user reporting it gets
different behaviour in ARQ (which I have reproduced). So I am unclear
whether dotNetRDF or ARQ is doing things wrong according to the
specification.

The test data is the trivial Turtle document as follows:

<http://r1> <http://r1> <http://r1> .
<http://r2> <http://r2> <http://r2> .

And the query is as follows:

SELECT *
WHERE
{
   GRAPH <http://a>
   {
     ?s ?p ?o .
   }
   OPTIONAL
   {
     GRAPH <http://b> { ?s0 ?p0 ?o0 . }
     FILTER (SAMETERM(?s, ?s0) && SAMETERM(?p, ?p0) && SAMETERM(?o, ?o0))
   }
   FILTER(!BOUND(?s0))
}


And for reference the unoptimised algebra is as follows:

(base <http://example/base/>
  (filter (! (bound ?s0))
   (leftjoin
    (graph <http://a>
     (bgp (triple ?s ?p ?o)))
    (graph <http://b>
     (bgp (triple ?s0 ?p0 ?o0)))
    (&& (&& (sameTerm ?s ?s0) (sameTerm ?p ?p0)) (sameTerm ?o ?o0)))))


The intent of the query is to calculate the delta of the graphs, i.e. the
triples that are present in <http://a> but not present in <http://b>.
So, given two identical graphs, it is intended to return 0 results;
however, dotNetRDF returns 2 results whereas ARQ returns 0 results.

My belief was that dotNetRDF is correct, and I'll explain why; I may be
wrong, and if so I'd love to understand why. My understanding of the
flow of execution is as follows:

Step 1 - Execute the LHS of the left join which finds all triples in graph
<http://a> and thus returns the following:

s = r1, p = r1, o = r1
s = r2, p = r2, o = r2

Step 2 - Execute the RHS of the left join which finds all triples in graph
<http://b> and thus returns the following:

s0 = r1, p0 = r1, o0 = r1
s0 = r2, p0 = r2, o0 = r2
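Steps 1 and 2 can be sketched in Python by evaluating (graph <g> (bgp (triple ?x ?y ?z))) over the four quads Andy lists in N-Quads form. The helper `graph_triple` is hypothetical, handles only a single triple pattern with three distinct variables, and parses by whitespace splitting, which only works for IRI-only lines like these.

```python
# Sketch: evaluate (graph <g> (bgp (triple ?x ?y ?z))) over a tiny quad store.
NQUADS = """\
<http://r1> <http://r1> <http://r1> <http://a> .
<http://r2> <http://r2> <http://r2> <http://a> .
<http://r1> <http://r1> <http://r1> <http://b> .
<http://r2> <http://r2> <http://r2> <http://b> .
"""

# Whitespace split is enough here: every term is an IRI with no spaces.
quads = [tuple(line.split()[:4]) for line in NQUADS.splitlines() if line.strip()]

def graph_triple(quads, graph, svar, pvar, ovar):
    """Return one solution mapping per triple in the given named graph."""
    return [{svar: s, pvar: p, ovar: o} for s, p, o, g in quads if g == graph]

lhs = graph_triple(quads, "<http://a>", "s", "p", "o")     # Step 1
rhs = graph_triple(quads, "<http://b>", "s0", "p0", "o0")  # Step 2
print(lhs)
print(rhs)
```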


Step 3 - Calculate the possible join

s = r1, p = r1, o = r1, s0 = r1, p0 = r1, o0 = r1
s = r1, p = r1, o = r1, s0 = r2, p0 = r2, o0 = r2
s = r2, p = r2, o = r2, s0 = r1, p0 = r1, o0 = r1
s = r2, p = r2, o = r2, s0 = r2, p0 = r2, o0 = r2
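Because the two sides share no variables, every pair of rows is compatible, so the join in Step 3 is a cross product. A minimal Python sketch, again assuming solutions are modelled as dicts:

```python
from itertools import product

r1, r2 = "<http://r1>", "<http://r2>"
lhs = [{"s": r1, "p": r1, "o": r1}, {"s": r2, "p": r2, "o": r2}]
rhs = [{"s0": r1, "p0": r1, "o0": r1}, {"s0": r2, "p0": r2, "o0": r2}]

def compatible(m1, m2):
    # Agree on every shared variable; with no shared variables this is always True.
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

joined = [{**a, **b} for a, b in product(lhs, rhs) if compatible(a, b)]
for row in joined:
    print(row)
```

The four rows come out in the same order as the Step 3 listing.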


Step 4 - Apply the filter on the left join


s = r1, p = r1, o = r1
s = r1, p = r1, o = r1, s0 = r1, p0 = r1, o0 = r1
s = r2, p = r2, o = r2
s = r2, p = r2, o = r2, s0 = r2, p0 = r2, o0 = r2

This, I think, is where ARQ and dotNetRDF differ in behaviour, and where I
suspect my implementation is wrong. For an LHS row where the FILTER fails
for some (but not all) of its joined rows, I retain the LHS row on its
own, whereas ARQ does not. I'm guessing that I'm missing some part of the
SPARQL specification for LeftJoin which says that, where there is at
least one valid joinable solution for an LHS solution, the LHS solution
does not need to be preserved on its own?


If you could point me to this, I would much appreciate it.
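The two readings of Step 4 can be contrasted in a minimal Python sketch. `per_pair_left_join` is a hypothetical reconstruction of the behaviour described in Step 4 (each failing pair re-emits the bare LHS row); `spec_left_join` follows the SPARQL 1.1 Diff definition, where a bare LHS row is kept only if no compatible RHS row makes the condition true. Neither is taken from dotNetRDF or ARQ; solutions are dicts and sameTerm is string equality.

```python
# Contrast two readings of "apply the filter on the left join" (illustrative sketch).
r1, r2 = "<http://r1>", "<http://r2>"
lhs = [{"s": r1, "p": r1, "o": r1}, {"s": r2, "p": r2, "o": r2}]
rhs = [{"s0": r1, "p0": r1, "o0": r1}, {"s0": r2, "p0": r2, "o0": r2}]

def cond(m):
    return m["s"] == m["s0"] and m["p"] == m["p0"] and m["o"] == m["o0"]

def compatible(a, b):
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def per_pair_left_join(lhs, rhs, expr):
    """Step 4 as described: each failing compatible pair re-emits the bare LHS row."""
    out = []
    for a in lhs:
        for b in rhs:
            if compatible(a, b):
                m = {**a, **b}
                out.append(m if expr(m) else a)
    return out

def spec_left_join(lhs, rhs, expr):
    """SPARQL 1.1: Filter(expr, Join) plus Diff; a bare LHS row survives
    only if NO compatible RHS row makes expr true."""
    out = []
    for a in lhs:
        passing = [{**a, **b} for b in rhs if compatible(a, b) and expr({**a, **b})]
        out.extend(passing if passing else [a])
    return out

def not_bound_s0(rows):
    return [m for m in rows if "s0" not in m]  # FILTER(!BOUND(?s0))

print(len(not_bound_s0(per_pair_left_join(lhs, rhs, cond))))  # 2 (dotNetRDF's count)
print(len(not_bound_s0(spec_left_join(lhs, rhs, cond))))      # 0 (ARQ's count)
```

The only difference is whether "keep the bare LHS row" is decided per pair or per LHS row across all of its candidate RHS rows.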

Step 5 - Apply the outer filter


s = r1, p = r1, o = r1
s = r2, p = r2, o = r2


So dotNetRDF returns 2 results but ARQ returns 0 results for this query.
Am I correct in thinking I've got a bug in my LeftJoin implementation over
in dotNetRDF?  Or is this actually a subtle bug in ARQ?

Thanks,

Rob

P.S. Code for this test case and variations on it is committed as
TestGraphDeltas.

[1] http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=386
