[ https://issues.apache.org/jira/browse/JENA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457267#comment-16457267 ]
ASF GitHub Bot commented on JENA-1534:
--------------------------------------

Github user kinow commented on the issue:

https://github.com/apache/jena/pull/409

Learning a bit more of Jena internals (weather is really bad here in the antipodes :-) ), so I picked this PR to review.

Had a look at the previous tickets, and from JENA-1534 it looks like the query was not considering a variable for a join, I think.

So I created an empty persistent dataset in TDB, loaded `books.ttl` from the Jena code base, and `yso.ttl` from SKOSMOS into the yso graph (I suspected it would have some blank nodes, etc).

Tried the query

```sparql
SELECT *
WHERE
  { ?s ?p ?V0
    GRAPH ?g
      { ?sx ?p ?ox
        FILTER EXISTS { _:b0 ?p ?V0 }
      }
  }
```

And it returned a few results, quite quickly. So I went and tried `tdbquery` from TDB (not TDB2), with the following command line (actually run from Eclipse, but equivalent):

```shell
tdbquery --explain --loc=/home/kinow/Development/java/jena/jena/jena-fuseki2/jena-fuseki-core/run/databases/p1/ "SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }"
```

I also added `-Dlog4j.configuration=file:///tmp/log4j.properties -Dlog4j.debug=true` to see the explain output.

With the code from master, it took <2 seconds to execute the query, and produced the following algebra:

```text
12:36:03 INFO exec :: QUERY
  SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }
12:36:03 INFO exec :: ALGEBRA
  (sequence
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
```

So that was a sequence (`OpSequence` in Jena, which I'm searching for in Eclipse to see where and how it's used).

Checked out the branch.

```shell
$ git fetch --all
$ git fetch github refs/pull/409/head:pr-409
$ git checkout pr-409
$ mvn clean install -Pdev -DskipTests
```

Ran the `tdbquery` command again, and now the algebra became:

```text
12:41:48 INFO exec :: QUERY
  SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }
12:41:48 INFO exec :: ALGEBRA
  (join
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
12:41:48 INFO exec :: TDB
  (join
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
```

A join! If I understood the tickets, that's exactly the intended behaviour: before, the variable in the EXISTS was not being taken into consideration, so a `JOIN` (`OpJoin`) was not produced.

The query also took much longer, >10 seconds (I guess SKOSMOS' YSO vocab has something like ~220K triples? The Harry Potter books dataset should have like 5? anywho).

So +1! LGTM :tada:

Will probably spend some time reading more of the code base to see if I can learn a bit more. I also found two posts ([1](https://gregheartsfield.com/2012/08/26/jena-arq-query-performance.html), [2](https://www.slideshare.net/olafhartig/the-semantics-of-sparql)) with some interesting content. In case you have any other pointers to learn more about it, or if I said anything silly, feel free to correct/share, please :-)

*ps: while reading the javadocs of the `Op*` classes, I noticed some typos in `OpSequence`. Should I open a pull request for that, or just commit to master?*

```diff
-/** A "sequence" is a join-like operation where it is know that the
- * the output of one step can be fed into the input of the next
+/** A "sequence" is a join-like operation where it is known that
+ * the output of one step can be fed into the input of the next
  * (that is, no scoping issues arise).
  */
 public class OpSequence extends OpN
```

> Variables in EXISTS must be considered for the join strategy
> ------------------------------------------------------------
>
>                 Key: JENA-1534
>                 URL: https://issues.apache.org/jira/browse/JENA-1534
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.7.0
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 3.8.0
>
>
> This query has a join between the GRAPH and the pattern before it.
> {noformat}
> SELECT *
> WHERE
>   { ?s ?p ?V0
>     GRAPH ?g
>       { ?sx ?p ?ox
>         FILTER EXISTS { _:b0 ?p ?V0 }
>       }
>   }
> {noformat}
> The fact {{?V0}} occurs in the LHS of the join in {{?s ?p ?V0}} and in the
> FILTER but not in the rest of the RHS means the "sequence" transform can not
> be used.
> Contrast to:
> {noformat}
> SELECT *
> WHERE
>   { ?s ?p ?V1
>     GRAPH ?g
>       { ?sx ?p ?ox
>         FILTER EXISTS { _:b0 ?p ?V2 }
>       }
>   }
> {noformat}
> Now {{?V2}} is only in the FILTER so it is safe to transform the join.
> Note that {{?p}} appears in the LHS and in the RHS pattern, so it is defined
> in the EXISTS and the sequence transform is possible.
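
To make the rule in the issue description concrete, here is a toy sketch in plain Java. It is not Jena's implementation and uses no Jena classes: the class name `SequenceSafetySketch`, the method `sequenceIsSafe`, and the representation of variables as plain strings are all made up for illustration. It only encodes the one condition stated above (a variable used in the RHS filter that is bound by the LHS but not by the rest of the RHS blocks the "sequence" transform). The real check in ARQ most likely covers more cases than this.

```java
import java.util.Set;

/** Toy model of the condition described in JENA-1534. NOT Jena code. */
class SequenceSafetySketch {

    /**
     * Returns false if some variable used in the RHS filter is bound by the LHS
     * but not by the RHS pattern itself. In that case feeding LHS bindings into
     * the RHS (a "sequence") would change what the filter sees compared with
     * evaluating the RHS on its own (a plain join).
     */
    static boolean sequenceIsSafe(Set<String> lhsVars,
                                  Set<String> rhsPatternVars,
                                  Set<String> rhsFilterVars) {
        for (String v : rhsFilterVars) {
            boolean boundOnLeft = lhsVars.contains(v);
            boolean boundInRhsPattern = rhsPatternVars.contains(v);
            if (boundOnLeft && !boundInRhsPattern) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // RHS pattern of both queries: GRAPH ?g { ?sx ?p ?ox ... }
        Set<String> rhsPattern = Set.of("g", "sx", "p", "ox");

        // First query: LHS is ?s ?p ?V0, filter is EXISTS { _:b0 ?p ?V0 }
        boolean first = sequenceIsSafe(Set.of("s", "p", "V0"), rhsPattern, Set.of("p", "V0"));
        System.out.println("first query, sequence safe: " + first);   // false

        // Second query: LHS is ?s ?p ?V1, filter is EXISTS { _:b0 ?p ?V2 }
        boolean second = sequenceIsSafe(Set.of("s", "p", "V1"), rhsPattern, Set.of("p", "V2"));
        System.out.println("second query, sequence safe: " + second); // true
    }
}
```

In the first query `?V0` trips the check, which matches the `join` seen in the explain output from the PR branch. In the second query `?V2` is unknown to the LHS and `?p` is bound on both sides, so nothing blocks the transform.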