[ 
https://issues.apache.org/jira/browse/JENA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457267#comment-16457267
 ] 

ASF GitHub Bot commented on JENA-1534:
--------------------------------------

Github user kinow commented on the issue:

    https://github.com/apache/jena/pull/409
  
    The weather is really bad here in the antipodes :-) so I picked this PR to 
review as a way to learn more about Jena internals.
    
    I had a look at the previous tickets. From JENA-1534, it looks like a 
variable inside the EXISTS was not being considered when choosing the join 
strategy, I think.
    
    So I created an empty persistent dataset in TDB, loaded `books.ttl` from 
the Jena code base, and loaded `yso.ttl` from SKOSMOS into the yso graph (I 
suspected it would have some blank nodes, etc).
    
    Tried the query 
    
    ```sparql
    SELECT * WHERE {
      ?s ?p ?V0
      GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } }
    }
    ```
    
    It returned a few results, quite quickly. So I went and tried `tdbquery` 
from TDB (not TDB2), with the following command line (actually run from 
Eclipse, but equivalent to):
    
    ```shell
    tdbquery --explain \
      --loc=/home/kinow/Development/java/jena/jena/jena-fuseki2/jena-fuseki-core/run/databases/p1/ \
      "SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }"
    ```
    
    I also added 
`-Dlog4j.configuration=file:///tmp/log4j.properties -Dlog4j.debug=true` to the 
JVM options to see the explain output.
    
    With the code from master, it took <2 seconds to execute the query, and 
produced the following algebra:
    
    ```text
    12:36:03 INFO  exec                      :: QUERY
      SELECT  *
      WHERE
        { ?s  ?p  ?V0
          GRAPH ?g
            { ?sx  ?p  ?ox
              FILTER EXISTS { _:b0  ?p  ?V0 }
            }
        }
    12:36:03 INFO  exec                      :: ALGEBRA
      (sequence
        (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
        (filter (exists
                   (quadpattern (quad ?g ??0 ?p ?V0)))
          (quadpattern (quad ?g ?sx ?p ?ox))))
    ```
    
    So that was a sequence (which corresponds to `OpSequence` in Jena; I'm 
searching for its occurrences in Eclipse to see how it's used). Then I checked 
out the PR branch.
    
    ```shell
    $ git fetch --all
    $ git fetch github refs/pull/409/head:pr-409
    $ git checkout pr-409
    $ mvn clean install -Pdev -DskipTests
    ```
    
    Ran the same `tdbquery` command again, and now the algebra became:
    
    ```text
    12:41:48 INFO  exec                      :: QUERY
      SELECT  *
      WHERE
        { ?s  ?p  ?V0
          GRAPH ?g
            { ?sx  ?p  ?ox
              FILTER EXISTS { _:b0  ?p  ?V0 }
            }
        }
    12:41:48 INFO  exec                      :: ALGEBRA
      (join
        (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
        (filter (exists
                   (quadpattern (quad ?g ??0 ?p ?V0)))
          (quadpattern (quad ?g ?sx ?p ?ox))))
    12:41:48 INFO  exec                      :: TDB
      (join
        (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
        (filter (exists
                   (quadpattern (quad ?g ??0 ?p ?V0)))
          (quadpattern (quad ?g ?sx ?p ?ox))))
    ```
    
    A join! If I understood the tickets correctly, that's exactly the intended 
behaviour: before, the variable in the EXISTS was not being taken into 
consideration, so a sequence was produced instead of a `JOIN` (`OpJoin`). The 
query also took much longer, >10 seconds (I guess SKOSMOS' YSO vocab has 
something like ~220K triples? The Harry Potter books dataset should have like 
5? Anyhow).
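    
    To check my understanding of the rule in the ticket, here is a toy Java 
sketch. It is my own simplification, not Jena's actual join classification 
code, and `SequenceCheck` / `canUseSequence` are made-up names. It assumes a 
join can become a sequence unless some variable is bound by the LHS and used in 
the EXISTS but is absent from the rest of the RHS. Blank nodes such as `_:b0` 
are ignored.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the rule described in JENA-1534; NOT Jena's real optimizer code. */
public class SequenceCheck {

    /**
     * Simplified (assumed) rule: the join can be rewritten as a sequence
     * unless a variable is bound by the LHS and mentioned in the EXISTS
     * while being absent from the rest of the RHS pattern.
     */
    public static boolean canUseSequence(Set<String> lhsVars,
                                         Set<String> rhsPatternVars,
                                         Set<String> existsVars) {
        Set<String> problem = new HashSet<>(existsVars);
        problem.retainAll(lhsVars);         // used in EXISTS and bound by the LHS
        problem.removeAll(rhsPatternVars);  // ...but not in the rest of the RHS
        return problem.isEmpty();
    }

    public static void main(String[] args) {
        // RHS pattern of both queries: GRAPH ?g { ?sx ?p ?ox ... }
        Set<String> rhs = Set.of("g", "sx", "p", "ox");

        // Query 1: LHS is ?s ?p ?V0, EXISTS mentions ?p and ?V0.
        boolean q1 = canUseSequence(Set.of("s", "p", "V0"), rhs, Set.of("p", "V0"));

        // Query 2: LHS is ?s ?p ?V1, EXISTS mentions ?p and ?V2.
        boolean q2 = canUseSequence(Set.of("s", "p", "V1"), rhs, Set.of("p", "V2"));

        System.out.println("query 1 can use sequence: " + q1);
        System.out.println("query 2 can use sequence: " + q2);
    }
}
```

    Under this simplified rule the first query has to stay a join because of 
`?V0`, while the second one (the "contrast" query in the ticket) can still be 
turned into a sequence, which matches what I read in JENA-1534.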
    
    So +1 ! LGTM :tada: 
    
    I will probably spend some time reading more of the code base to see if I 
can learn a bit more. I also found two posts 
([1](https://gregheartsfield.com/2012/08/26/jena-arq-query-performance.html), 
[2](https://www.slideshare.net/olafhartig/the-semantics-of-sparql)) with some 
interesting content. In case you have any other pointers to learn more about 
it, or if I said anything silly, feel free to correct/share, please :-)
    
    *ps: while reading the javadocs of the `Op*` classes, noticed some typos in 
OpSequence. Should I open a pull request for that, or just commit to master?*
    
    ```diff
    -/** A "sequence" is a join-like operation where it is know that the 
    - * the output of one step can be fed into the input of the next 
    +/** A "sequence" is a join-like operation where it is known that
    + * the output of one step can be fed into the input of the next
      * (that is, no scoping issues arise). */
     
     public class OpSequence extends OpN
    ```


> Variables in EXISTS must be considered for the join strategy
> ------------------------------------------------------------
>
>                 Key: JENA-1534
>                 URL: https://issues.apache.org/jira/browse/JENA-1534
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.7.0
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 3.8.0
>
>
> This query has a join between the GRAPH and the pattern before it.
> {noformat}
> SELECT  *
> WHERE
>   { ?s  ?p  ?V0
>     GRAPH ?g 
>       { ?sx  ?p  ?ox
>         FILTER EXISTS { _:b0  ?p  ?V0 }
>       }
>   }
> {noformat}
> The fact that {{?V0}} occurs in the LHS of the join in {{?s  ?p  ?V0}} and in 
> the FILTER but not in the rest of the RHS means the "sequence" transform 
> cannot be used.
> Contrast to:
> {noformat}
> SELECT  *
> WHERE
>   { ?s  ?p  ?V1
>     GRAPH ?g 
>       { ?sx  ?p  ?ox
>         FILTER EXISTS { _:b0  ?p  ?V2 }
>       }
>   }
> {noformat}
> Now {{?V2}} is only in the FILTER so it is safe to transform the join.
> Note that {{?p}} appears in the LHS, making it defined in the EXISTS, so the 
> sequence transform is possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
