[ https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346475#comment-17346475 ]
Claus Stadler edited comment on JENA-2107 at 5/17/21, 11:17 PM:
----------------------------------------------------------------

For the Dataset-based implementation, we could override the find methods of DatasetGraphWrapper in a subclass to keep track of the internal iterator sizes. After running a query on such a dataset instance, one could then check whether only the expected number of tuples was touched.

Alternatively, one could track the arguments passed to find and check whether they match an expected sequence (or set) of reference arguments, which would be more traceable than mere counts.

Sketch:
{code:java}
import java.util.*;
import org.apache.jena.atlas.iterator.Iter;
import org.apache.jena.graph.Node;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphWrapper;
import org.apache.jena.sparql.core.Quad;

// Records the arguments of every find(g, s, p, o) call and how many quads these calls returned.
// Other find/findNG overloads are not intercepted here and may need the same treatment.
class TrackingDatasetGraph extends DatasetGraphWrapper {
    protected long numSeenTuples = 0;
    // One (g, s, p, o) list per call; a LinkedHashSet would instead collapse repeated calls.
    protected final Collection<List<Node>> seenArgs = new ArrayList<>();

    TrackingDatasetGraph(DatasetGraph dsg) { super(dsg); }
    @Override
    public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        // Materialize the result so the number of quads touched can be counted.
        List<Quad> materialized = Iter.toList(getR().find(g, s, p, o));
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
}
{code}

It's just somewhat cumbersome to repeat the same pattern for NodeTupleTable(Wrapper), and I am not sure how accessible the engine's internals are for testing. Having even a single test case would already help to detect regressions in this regard while work on RDF star progresses.

> RDF Star performance issue with non-concrete node triples
> ---------------------------------------------------------
>
>                 Key: JENA-2107
>                 URL: https://issues.apache.org/jira/browse/JENA-2107
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.17.0, Jena 4.0.0
>            Reporter: Lorenz Bühmann
>            Priority: Critical
>             Fix For: Jena 4.1.0
>
>
> The following graph pattern is not evaluated efficiently (it results in a full scan per binding) because the second triple pattern doesn't take advantage of the bindings generated by evaluating the first one:
> {code:java}
> ?s <p> ?o .
> << ?s <p> ?o >> <p2> ?v .
> {code}
> A possible fix would be to adapt the method {{rdfStarTripleSub()}} in class [SolverRX3.java|https://github.com/apache/jena/blob/2efff8a00b4ffa82751cf46c8a3fed84b6ff3090/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/solver/SolverRX3.java#L63-L71] by changing its beginning to:
> {code:java}
> private static Iterator<Binding> rdfStarTripleSub(Binding input, Triple xPattern, ExecutionContext execCxt) {
>     Triple tPattern = Substitute.substitute(xPattern, input);
>     // ... rest of the method as before
> {code}
> We went from 75s to near-instant response times on a very small dataset (50k triples).
> If this fix is correct and doesn't break anything, the same fix might apply to its quad counterpart in the {{SolverRX4}} class.
>
> Note: for tdbquery, is this evaluated in a different place? At least, we couldn't find any performance improvement; it's still horribly slow.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
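The comment notes that the tracking pattern would have to be repeated for NodeTupleTable(Wrapper). One way to reduce that repetition might be a small reusable helper that every wrapper delegates to. The sketch below is hypothetical ({{FindTracker}}, {{track}} and {{demo}} are not Jena APIs), assumes Java 9 or later, and uses a plain string list as a stand-in for a tuple table. Unlike the sketch in the comment, it counts elements lazily as they are consumed instead of materializing each result, and it records call arguments so they can be compared against an expected reference sequence.

```java
import java.util.*;

// Hypothetical helper (not a Jena class): records the arguments of each find-style call and
// counts the elements actually pulled from the returned iterators, without materializing them.
public class FindTracker<A> {
    private final List<A> seenArgs = new ArrayList<>();
    private long numSeenTuples = 0;

    // Record the arguments and wrap the result so that each consumed element is counted.
    public <T> Iterator<T> track(A args, Iterator<T> result) {
        seenArgs.add(args);
        return new Iterator<T>() {
            @Override public boolean hasNext() { return result.hasNext(); }
            @Override public T next() {
                T item = result.next(); // propagates NoSuchElementException when exhausted
                numSeenTuples++;
                return item;
            }
        };
    }

    public List<A> seenArgs() { return Collections.unmodifiableList(seenArgs); }
    public long numSeenTuples() { return numSeenTuples; }

    // Stand-in for a tuple table: each "find(prefix)" yields the stored keys starting with prefix.
    static FindTracker<List<String>> demo() {
        List<String> data = List.of("a1", "a2", "b1", "b2", "b3");
        FindTracker<List<String>> tracker = new FindTracker<>();
        for (String prefix : List.of("a", "b")) {
            Iterator<String> it = tracker.track(List.of(prefix),
                data.stream().filter(d -> d.startsWith(prefix)).iterator());
            it.forEachRemaining(x -> {}); // simulate the engine consuming all matches
        }
        return tracker;
    }

    public static void main(String[] args) {
        FindTracker<List<String>> t = demo();
        // Compare the recorded calls against an expected reference sequence.
        boolean argsAsExpected = t.seenArgs().equals(List.of(List.of("a"), List.of("b")));
        System.out.println(argsAsExpected + " " + t.numSeenTuples());
    }
}
```

With such a helper, an override in a DatasetGraphWrapper subclass could in principle shrink to {{return tracker.track(Arrays.asList(g, s, p, o), getR().find(g, s, p, o));}}, and a NodeTupleTable wrapper could share the same tracker. Counting lazily reports only the tuples the engine actually pulled, whereas materializing counts every tuple the underlying find returned, even if evaluation stops early.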
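The fix proposed in the issue substitutes the input binding into the pattern before matching, so a quoted subject such as {{<< ?s <p> ?o >>}} becomes a concrete term like {{<< :s7 <p> :o7 >>}} that can be looked up directly instead of being matched against every stored triple. The toy model below is self-contained and does not use Jena; {{SubstituteFirstDemo}}, {{Triple}}, {{solve}} and {{compare}} are hypothetical names, the store has only a subject index, and the code assumes Java 16 or later (records and pattern matching for {{instanceof}}). It counts how many stored triples a single input binding touches with and without that substitution.

```java
import java.util.*;

// Toy in-memory triple store (not Jena code) with a single index on the subject.
// A term is a String (names starting with "?" are variables) or a nested Triple (a quoted triple).
public class SubstituteFirstDemo {
    record Triple(Object s, Object p, Object o) {}

    final List<Triple> all = new ArrayList<>();
    final Map<Object, List<Triple>> bySubject = new HashMap<>();
    long touched = 0; // stored triples examined by solve()

    static boolean isVar(Object t) { return t instanceof String str && str.startsWith("?"); }

    // Concrete = contains no variables, including inside quoted triples.
    static boolean isConcrete(Object t) {
        if (t instanceof Triple tr) return isConcrete(tr.s()) && isConcrete(tr.p()) && isConcrete(tr.o());
        return !isVar(t);
    }

    // Replace bound variables (also inside quoted triples); unbound variables are kept.
    static Object substitute(Object t, Map<String, Object> b) {
        if (t instanceof Triple tr)
            return new Triple(substitute(tr.s(), b), substitute(tr.p(), b), substitute(tr.o(), b));
        return isVar(t) ? b.getOrDefault((String) t, t) : t;
    }

    // Match a pattern term against a data term, extending binding b; false on any conflict.
    static boolean match(Object pat, Object data, Map<String, Object> b) {
        if (pat instanceof Triple pt)
            return data instanceof Triple dt
                && match(pt.s(), dt.s(), b) && match(pt.p(), dt.p(), b) && match(pt.o(), dt.o(), b);
        if (isVar(pat)) {
            Object prev = b.putIfAbsent((String) pat, data);
            return prev == null || prev.equals(data);
        }
        return pat.equals(data);
    }

    void add(Triple t) {
        all.add(t);
        bySubject.computeIfAbsent(t.s(), k -> new ArrayList<>()).add(t);
    }

    // Index lookup only if the subject is fully concrete; otherwise a full scan.
    List<Map<String, Object>> solve(Triple pattern, Map<String, Object> input) {
        List<Triple> candidates =
            isConcrete(pattern.s()) ? bySubject.getOrDefault(pattern.s(), List.of()) : all;
        List<Map<String, Object>> out = new ArrayList<>();
        for (Triple t : candidates) {
            touched++;
            Map<String, Object> b = new HashMap<>(input);
            if (match(pattern, t, b)) out.add(b);
        }
        return out;
    }

    // Returns {touched, results} without substitution followed by {touched, results} with it.
    static long[] compare(int n) {
        SubstituteFirstDemo store = new SubstituteFirstDemo();
        for (int i = 0; i < n; i++) {
            Triple base = new Triple(":s" + i, "<p>", ":o" + i);   // matches ?s <p> ?o
            store.add(base);
            store.add(new Triple(base, "<p2>", "\"v" + i + "\"")); // matches << ?s <p> ?o >> <p2> ?v
        }
        Triple second = new Triple(new Triple("?s", "<p>", "?o"), "<p2>", "?v");
        Map<String, Object> input = Map.of("?s", ":s7", "?o", ":o7"); // one binding from the first pattern
        int r1 = store.solve(second, input).size();
        long t1 = store.touched;
        store.touched = 0;
        int r2 = store.solve((Triple) substitute(second, input), input).size();
        return new long[] {t1, r1, store.touched, r2};
    }

    public static void main(String[] args) {
        long[] c = compare(1000);
        System.out.printf("without substitution: %d touched, %d result(s)%n", c[0], c[1]);
        System.out.printf("with substitution:    %d touched, %d result(s)%n", c[2], c[3]);
    }
}
```

With 1,000 base triples plus 1,000 annotation triples, {{compare(1000)}} reports 2000 triples touched without substitution and 1 with it, each yielding a single result; across all bindings of the first pattern, the unsubstituted variant therefore scans the whole store once per binding. Jena's storage and solvers are considerably more involved (and the issue suggests tdbquery may evaluate such patterns elsewhere), so the model only illustrates why the per-binding full scan disappears.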