[jira] [Comment Edited] (JENA-2107) RDF Star performance issue with non-concrete node triples

2021-05-17 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346211#comment-17346211
 ] 

Andy Seaborne edited comment on JENA-2107 at 5/17/21, 3:04 PM:
---

{{tdbquery}} processing will be in SolverRX (one each for TDB1, TDB2).




was (Author: andy.seaborne):
{{tdbquery}} with be in SolverRX (one each for TDB1, TDB2).



> RDF Star performance issue with non-concrete node triples
> -
>
> Key: JENA-2107
> URL: https://issues.apache.org/jira/browse/JENA-2107
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: ARQ
>Affects Versions: Jena 3.17.0, Jena 4.0.0
>Reporter: Lorenz Bühmann
>Priority: Critical
> Fix For: Jena 4.1.0
>
>
> The following graph pattern is not evaluated efficiently (it results in a 
> full scan per binding) because the second triple pattern doesn't take 
> advantage of the bindings generated by evaluating the first one:
> {code:java}
> ?s  ?o .  
> << ?s  ?o >>  ?v .
> {code}
> A possible fix would be to adapt the method {{rdfStarTripleSub()}} in class
>  
> [SolverRX3.java|https://github.com/apache/jena/blob/2efff8a00b4ffa82751cf46c8a3fed84b6ff3090/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/solver/SolverRX3.java#L63-L71]
>  by changing the beginning to
> {code:java}
> private static Iterator rdfStarTripleSub(Binding input, Triple xPattern, ExecutionContext execCxt) {
>     Triple tPattern = Substitute.substitute(xPattern, input);
> {code}
> We went from 75s for a very small dataset (50k triples) to near-instant 
> response times.
> If this fix is correct and doesn't break anything, the same approach might 
> also fix the quads counterpart in the {{SolverRX4}} class.
>  
> Note: for tdbquery, this seems to be evaluated in a different place? At 
> least, we couldn't find any performance improvement there; it's still 
> horribly slow.
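
To make the effect of substituting first concrete, here is a minimal, 
self-contained sketch. It is a toy model and not Jena code: the class 
{{SubstituteSketch}}, its string-based {{Triple}}, the subject-only index and 
the {{touched}} counter are all assumptions made for illustration. Both paths 
return the same rows; only the number of stored triples they look at differs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model, not Jena code: terms are plain strings and a leading '?' marks a variable. */
final class SubstituteSketch {
    /** Immutable triple of terms; acts as a pattern when a term is a variable. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    private final List<Triple> all = new ArrayList<>();
    private final Map<String, List<Triple>> bySubject = new HashMap<>();
    long touched = 0;   // number of stored triples compared against a pattern

    void add(String s, String p, String o) {
        Triple t = new Triple(s, p, o);
        all.add(t);
        bySubject.computeIfAbsent(s, k -> new ArrayList<>()).add(t);
    }

    static boolean isVar(String term) { return term.startsWith("?"); }

    /** Replace every variable that the binding has a value for. */
    static Triple substitute(Triple pattern, Map<String, String> binding) {
        return new Triple(resolve(pattern.s, binding),
                          resolve(pattern.p, binding),
                          resolve(pattern.o, binding));
    }

    private static String resolve(String term, Map<String, String> binding) {
        return isVar(term) ? binding.getOrDefault(term, term) : term;
    }

    private static boolean termMatches(String patternTerm, String dataTerm) {
        return isVar(patternTerm) || patternTerm.equals(dataTerm);
    }

    private static boolean matches(Triple pattern, Triple data) {
        return termMatches(pattern.s, data.s)
            && termMatches(pattern.p, data.p)
            && termMatches(pattern.o, data.o);
    }

    /** Uses the subject index when the subject is concrete, otherwise scans everything. */
    List<Triple> find(Triple pattern) {
        List<Triple> candidates = isVar(pattern.s)
                ? all
                : bySubject.getOrDefault(pattern.s, Collections.<Triple>emptyList());
        List<Triple> out = new ArrayList<>();
        for (Triple t : candidates) {
            touched++;
            if (matches(pattern, t)) out.add(t);
        }
        return out;
    }

    /** Slow path: match the unsubstituted pattern, then drop rows that conflict with the binding. */
    List<Triple> matchThenFilter(Map<String, String> binding, Triple pattern) {
        Triple wanted = substitute(pattern, binding);
        List<Triple> out = new ArrayList<>();
        for (Triple t : find(pattern)) {
            if (matches(wanted, t)) out.add(t);
        }
        return out;
    }

    /** Fast path: substitute first so that the lookup can use the index. */
    List<Triple> substituteThenMatch(Map<String, String> binding, Triple pattern) {
        return find(substitute(pattern, binding));
    }
}
```

With a store of N triples and an input binding for {{?s}}, the slow path 
compares all N stored triples per binding, while the fast path only visits the 
bucket for the bound subject. This mirrors the full-scan-per-binding behaviour 
described in the issue, under the simplifications stated above.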



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (JENA-2107) RDF Star performance issue with non-concrete node triples

2021-05-17 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346371#comment-17346371
 ] 

Andy Seaborne edited comment on JENA-2107 at 5/17/21, 10:32 PM:


All the solvers need this fix. 

I've done (locally) RX4 and the TDB SolverRX's following the pattern from the 
PR.

[~LorenzB] if you adjust the commit message on the PR, I'll do the other three. 
Would you be able to test the TDB (either one, ideally both)? I can't think of 
a test for the situation other than timing, because the core of the RDF-star 
triple solver code is, by design, more general and does not assume that 
substitution has been done, which is why the problem showed up only as a 
performance effect.


was (Author: andy.seaborne):
All the solvers need this fix. 

I've don RX4 and the TDB SolverRX's following the pattern from the PR.

[~LorenzB] if you adjust he commit message on the PR, I'll do the other three. 
Would you be able to test the TDB (either one, ideally both)? I can't think of 
a test for the situation other than timing because the core of the RDF-star 
triple solver code is, by design, more general than assuming substitution has 
been done, which is why it didn't show except from a performance effect.



[jira] [Comment Edited] (JENA-2107) RDF Star performance issue with non-concrete node triples

2021-05-17 Thread Claus Stadler (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346475#comment-17346475
 ] 

Claus Stadler edited comment on JENA-2107 at 5/17/21, 11:16 PM:


For the Dataset-based implementation we could override the find methods in a 
subclass of DatasetGraphWrapper to keep track of the internal iterator sizes. 
After running a query on such a dataset instance, one could then check that 
only a specific number of tuples have been touched.

Alternatively, one could track the arguments passed to find and check whether 
they match an expected sequence (or set) of reference arguments, which would 
be more traceable than mere counts.


Sketch:
{code:java}
import java.util.*;
import org.apache.jena.atlas.iterator.Iter;
import org.apache.jena.graph.Node;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphWrapper;
import org.apache.jena.sparql.core.Quad;

// Wraps a DatasetGraph and records what each find() call asked for and returned.
class TrackingDatasetGraph extends DatasetGraphWrapper {
    protected long numSeenTuples = 0;
    protected Collection<List<Node>> seenArgs = new LinkedHashSet<>();  // or ArrayList

    TrackingDatasetGraph(DatasetGraph dsg) { super(dsg); }

    @Override
    public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        // Materialize so the result size can be counted before it is handed back.
        List<Quad> materialized = Iter.toList(getR().find(g, s, p, o));
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
}
{code}

It's just somewhat cumbersome having to repeat the same pattern for 
NodeTupleTable(Wrapper); and I am not sure how accessible this engine is for 
testing these internals.

Having at least a single test case would already be beneficial for detecting 
regressions in this regard while work on RDF-star progresses.







was (Author: aklakan):
For the Dataset-based implementation we could subclass the find methods of 
DatasetGraphWrapper to keep track of the internal iterator sizes. After running 
a query on such an dataset instance one could then check whether only a 
specific number of tuples have been touched

Alternatively, one could track the arguments passed to find and check whether 
those match an expected sequence (or set) of reference arguments - which would 
be more traceable than mere counts.


Sketch:
{code:java}
class TrackingDatasetGraph extends DatasetGraphWrapper {
protected long numSeenTuples = 0;
protected Collection seenArgs = new LinkedHashSet<>();  // or ArrayList

@Override
public Iterator find(Node g, Node s, Node p, Node o) {
  seenArgs.add(Arrays.asList(g, s, p, o));
  try {
  Iterator it = getR().find()
  List materialized = Iter.toList(it);
  numSeenTuples += materialized.size();
  return materialized.iterator();
  }
}
{code}

It's just somewhat cumbersome having to repeat the same pattern for 
NodeTupleTable(Wrapper).

Having at least a  single test case would already be beneficial for detecting 
regressions in this regard while work on RDF star progresses.








[jira] [Comment Edited] (JENA-2107) RDF Star performance issue with non-concrete node triples

2021-05-17 Thread Claus Stadler (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346475#comment-17346475
 ] 

Claus Stadler edited comment on JENA-2107 at 5/17/21, 11:17 PM:


For the Dataset-based implementation we could override the find methods in a 
subclass of DatasetGraphWrapper to keep track of the internal iterator sizes. 
After running a query on such a dataset instance, one could then check that 
only a specific number of tuples have been touched.

Alternatively, one could track the arguments passed to find and check whether 
they match an expected sequence (or set) of reference arguments, which would 
be more traceable than mere counts.


Sketch:
{code:java}
import java.util.*;
import org.apache.jena.atlas.iterator.Iter;
import org.apache.jena.graph.Node;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphWrapper;
import org.apache.jena.sparql.core.Quad;

// Wraps a DatasetGraph and records what each find() call asked for and returned.
class TrackingDatasetGraph extends DatasetGraphWrapper {
    protected long numSeenTuples = 0;
    protected Collection<List<Node>> seenArgs = new LinkedHashSet<>();  // or ArrayList

    TrackingDatasetGraph(DatasetGraph dsg) { super(dsg); }

    @Override
    public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        // Materialize so the result size can be counted before it is handed back.
        List<Quad> materialized = Iter.toList(getR().find(g, s, p, o));
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
}
{code}

It's just somewhat cumbersome having to repeat the same pattern for 
NodeTupleTable(Wrapper); and I am not sure how accessible this engine is for 
testing these internals.

Having at least a single test case would already be beneficial for detecting 
regressions in this regard while work on RDF-star progresses.







was (Author: aklakan):
For the Dataset-based implementation we could subclass the find methods of 
DatasetGraphWrapper to keep track of the internal iterator sizes. After running 
a query on such an dataset instance one could then check whether only a 
specific number of tuples have been touched

Alternatively, one could track the arguments passed to find and check whether 
those match an expected sequence (or set) of reference arguments - which would 
be more traceable than mere counts.


Sketch:
{code:java}
class TrackingDatasetGraph extends DatasetGraphWrapper {
protected long numSeenTuples = 0;
protected Collection seenArgs = new LinkedHashSet<>();  // or ArrayList

@Override
public Iterator find(Node g, Node s, Node p, Node o) {
  seenArgs.add(Arrays.asList(g, s, p, o));
  try {
  Iterator it = getR().find()
  List materialized = Iter.toList(it);
  numSeenTuples += materialized.size();
  return materialized.iterator();
  }
}
{code}

It's just somewhat cumbersome having to repeat the same pattern for 
NodeTupleTable(Wrapper) ; and I am not sure about how accessible this engine is 
for testing these internals.

Having at least a  single test case would already be beneficial for detecting 
regressions in this regard while work on RDF star progresses.








[jira] [Comment Edited] (JENA-2107) RDF Star performance issue with non-concrete node triples

2021-05-17 Thread Claus Stadler (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346475#comment-17346475
 ] 

Claus Stadler edited comment on JENA-2107 at 5/17/21, 11:17 PM:


For the Dataset-based implementation we could override the find methods in a 
subclass of DatasetGraphWrapper to keep track of the internal iterator sizes. 
After running a query on such a dataset instance, one could then check that 
only a specific number of tuples have been touched.

Alternatively, one could track the arguments passed to find and check whether 
they match an expected sequence (or set) of reference arguments, which would 
be more traceable than mere counts.


Sketch:
{code:java}
import java.util.*;
import org.apache.jena.atlas.iterator.Iter;
import org.apache.jena.graph.Node;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphWrapper;
import org.apache.jena.sparql.core.Quad;

// Wraps a DatasetGraph and records what each find() call asked for and returned.
class TrackingDatasetGraph extends DatasetGraphWrapper {
    protected long numSeenTuples = 0;
    protected Collection<List<Node>> seenArgs = new LinkedHashSet<>();  // or ArrayList

    TrackingDatasetGraph(DatasetGraph dsg) { super(dsg); }

    @Override
    public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        // Materialize so the result size can be counted before it is handed back.
        List<Quad> materialized = Iter.toList(getR().find(g, s, p, o));
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
}
{code}

It's just somewhat cumbersome having to repeat the same pattern for 
NodeTupleTable(Wrapper); and I am not sure how accessible this engine is for 
testing these internals.

Having at least a single test case would already be beneficial for detecting 
regressions in this regard while work on RDF-star progresses.







was (Author: aklakan):
For the Dataset-based implementation we could subclass the find methods of 
DatasetGraphWrapper to keep track of the internal iterator sizes. After running 
a query on such an dataset instance one could then check whether only a 
specific number of tuples have been touched

Alternatively, one could track the arguments passed to find and check whether 
those match an expected sequence (or set) of reference arguments - which would 
be more traceable than mere counts.


Sketch:
{code:java}
class TrackingDatasetGraph extends DatasetGraphWrapper {
protected long numSeenTuples = 0;
protected Collection seenArgs = new LinkedHashSet<>();  // or ArrayList

@Override
public Iterator find(Node g, Node s, Node p, Node o) {
  seenArgs.add(Arrays.asList(g, s, p, o));
  Iterator it = getR().find()
  List materialized = Iter.toList(it);
  numSeenTuples += materialized.size();
  return materialized.iterator();
  }
}
{code}

It's just somewhat cumbersome having to repeat the same pattern for 
NodeTupleTable(Wrapper) ; and I am not sure about how accessible this engine is 
for testing these internals.

Having at least a  single test case would already be beneficial for detecting 
regressions in this regard while work on RDF star progresses.








[jira] [Comment Edited] (JENA-2107) RDF Star performance issue with non-concrete node triples

2021-05-17 Thread Claus Stadler (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346475#comment-17346475
 ] 

Claus Stadler edited comment on JENA-2107 at 5/17/21, 11:22 PM:


For the Dataset-based implementation we could override the find methods in a 
subclass of DatasetGraphWrapper to keep track of the internal iterator sizes. 
After running a query on such a dataset instance, one could then check that 
only a specific number of tuples have been touched.

Alternatively, one could track the arguments passed to find and check whether 
they match an expected sequence (or set) of reference arguments, which would 
be more traceable than mere counts.


Sketch:
{code:java}
import java.util.*;
import org.apache.jena.atlas.iterator.Iter;
import org.apache.jena.graph.Node;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphWrapper;
import org.apache.jena.sparql.core.Quad;

// Wraps a DatasetGraph and records what each find() call asked for and returned.
class TrackingDatasetGraph extends DatasetGraphWrapper {
    protected long numSeenTuples = 0;
    protected Collection<List<Node>> seenArgs = new LinkedHashSet<>();  // or ArrayList

    TrackingDatasetGraph(DatasetGraph dsg) { super(dsg); }

    @Override
    public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        // Materialize so the result size can be counted before it is handed back.
        List<Quad> materialized = Iter.toList(getR().find(g, s, p, o));
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
}
{code}

It would just be somewhat cumbersome if we had to repeat the same pattern for 
NodeTupleTable(Wrapper); and I am not sure how accessible this engine is for 
testing these internals.

I know that TDB2 has a DatasetGraph abstraction - and I am assuming TDB1 has 
one too - so the above sketch might already be sufficient to test all RDF-star 
implementations.

Having at least a single test case would already be beneficial for detecting 
regressions in this regard while work on RDF-star progresses.







was (Author: aklakan):
For the Dataset-based implementation we could subclass the find methods of 
DatasetGraphWrapper to keep track of the internal iterator sizes. After running 
a query on such a dataset instance one could then check whether only a specific 
number of tuples have been touched

Alternatively, one could track the arguments passed to find and check whether 
those match an expected sequence (or set) of reference arguments - which would 
be more traceable than mere counts.


Sketch:
{code:java}
class TrackingDatasetGraph extends DatasetGraphWrapper {
protected long numSeenTuples = 0;
protected Collection seenArgs = new LinkedHashSet<>();  // or ArrayList

@Override
public Iterator find(Node g, Node s, Node p, Node o) {
  seenArgs.add(Arrays.asList(g, s, p, o));
  Iterator it = getR().find()
  List materialized = Iter.toList(it);
  numSeenTuples += materialized.size();
  return materialized.iterator();
  }
}
{code}

It's just somewhat cumbersome having to repeat the same pattern for 
NodeTupleTable(Wrapper) ; and I am not sure about how accessible this engine is 
for testing these internals.

Having at least a  single test case would already be beneficial for detecting 
regressions in this regard while work on RDF star progresses.








[jira] [Comment Edited] (JENA-2107) RDF Star performance issue with non-concrete node triples

2021-05-17 Thread Claus Stadler (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346475#comment-17346475
 ] 

Claus Stadler edited comment on JENA-2107 at 5/17/21, 11:25 PM:


For the Dataset-based implementation we could override the find methods in a 
subclass of DatasetGraphWrapper to keep track of the internal iterator sizes. 
After running a query on such a dataset instance, one could then check that 
only a specific number of tuples have been touched.

Alternatively, one could track the arguments passed to find and check whether 
they match an expected sequence (or set) of reference arguments, which would 
be more traceable than mere counts.


Sketch:
{code:java}
import java.util.*;
import org.apache.jena.atlas.iterator.Iter;
import org.apache.jena.graph.Node;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphWrapper;
import org.apache.jena.sparql.core.Quad;

// Wraps a DatasetGraph and records what each find() call asked for and returned.
class TrackingDatasetGraph extends DatasetGraphWrapper {
    protected long numSeenTuples = 0;
    protected Collection<List<Node>> seenArgs = new LinkedHashSet<>();  // or ArrayList

    TrackingDatasetGraph(DatasetGraph dsg) { super(dsg); }

    @Override
    public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        // Materialize so the result size can be counted before it is handed back.
        List<Quad> materialized = Iter.toList(getR().find(g, s, p, o));
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
}
{code}

It's just somewhat cumbersome having to repeat the same pattern for 
NodeTupleTable(Wrapper); and I am not sure how accessible this part of the 
engine is for testing these internals.

Having at least a single test case would already be beneficial for detecting 
regressions in this regard while work on RDF-star progresses.
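
As a rough illustration of how such a regression check could look, here is a 
self-contained sketch that needs no Jena on the classpath. Everything in it is 
a stand-in invented for illustration: {{QuadSource}} plays the role of 
DatasetGraph, {{ListQuadSource}} is a trivial in-memory store, 
{{TrackingQuadSource}} is the counting wrapper, and {{null}} is used as the 
wildcard. It is only meant to show the shape of the assertions, not how the 
real engine would be wired up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for a dataset graph (not a Jena type); null acts as a wildcard. */
interface QuadSource {
    Iterator<List<String>> find(String g, String s, String p, String o);
}

/** In-memory source that answers every call with a linear scan. */
final class ListQuadSource implements QuadSource {
    private final List<List<String>> quads = new ArrayList<>();

    void add(String g, String s, String p, String o) {
        quads.add(Arrays.asList(g, s, p, o));
    }

    @Override
    public Iterator<List<String>> find(String g, String s, String p, String o) {
        List<String> pattern = Arrays.asList(g, s, p, o);
        List<List<String>> out = new ArrayList<>();
        for (List<String> quad : quads) {
            boolean ok = true;
            for (int i = 0; i < 4; i++) {
                String want = pattern.get(i);
                if (want != null && !want.equals(quad.get(i))) ok = false;
            }
            if (ok) out.add(quad);
        }
        return out.iterator();
    }
}

/** Wrapper that records each find() call and counts the quads handed back. */
final class TrackingQuadSource implements QuadSource {
    private final QuadSource base;
    final List<List<String>> seenArgs = new ArrayList<>();
    long numSeenTuples = 0;

    TrackingQuadSource(QuadSource base) { this.base = base; }

    @Override
    public Iterator<List<String>> find(String g, String s, String p, String o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        // Materialize so the size is known before the caller consumes anything.
        List<List<String>> materialized = new ArrayList<>();
        base.find(g, s, p, o).forEachRemaining(materialized::add);
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
}
```

A test would wrap the store, run the query through the wrapper, and then 
assert on {{seenArgs}} (for example, that no call after the first one has a 
wildcard subject) or on an upper bound for {{numSeenTuples}}. Checking the 
recorded arguments is likely the more robust option, since counts change 
whenever the test data does.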







was (Author: aklakan):
For the Dataset-based implementation we could subclass the find methods of 
DatasetGraphWrapper to keep track of the internal iterator sizes. After running 
a query on such a dataset instance one could then check whether only a specific 
number of tuples have been touched

Alternatively, one could track the arguments passed to find and check whether 
those match an expected sequence (or set) of reference arguments - which would 
be more traceable than mere counts.


Sketch:
{code:java}
class TrackingDatasetGraph extends DatasetGraphWrapper {
protected long numSeenTuples = 0;
protected Collection seenArgs = new LinkedHashSet<>();  // or ArrayList

@Override
public Iterator find(Node g, Node s, Node p, Node o) {
  seenArgs.add(Arrays.asList(g, s, p, o));
  Iterator it = getR().find()
  List materialized = Iter.toList(it);
  numSeenTuples += materialized.size();
  return materialized.iterator();
  }
}
{code}

It's just somewhat cumbersome if we had to repeat the same pattern for 
NodeTupleTable(Wrapper) ; and I am not sure about how accessible this engine is 
for testing these internals.

I know that TDB2 has a DatasetGraph abstraction - and I am assuming TDB1 has 
one too - so above sketch might already be sufficient to test all RDF star 
implementations.


Having at least a  single test case would already be beneficial for detecting 
regressions in this regard while work on RDF star progresses.








[jira] [Comment Edited] (JENA-2107) RDF Star performance issue with non-concrete node triples

2021-05-18 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346874#comment-17346874
 ] 

Andy Seaborne edited comment on JENA-2107 at 5/18/21, 1:16 PM:
---

Indexing: certainly it can be added. I've kept away from changes that alter 
the disk layout. Disk changes are a more permanent commitment. Adding is a 
data reload; withdrawing/changing the feature is a reload and disruption.

The first {{RDF*}} Jena implementation was a bit PG mode and a bit SA mode (PG 
mode - the triple is always also an asserted triple, as with annotation 
syntax). It exploited the existing indexes to look up {{<<...>>}} patterns.

The current {{RDF-star}}-compliant implementation is code only, with no disk 
changes. With the fix for this JIRA, use of annotation syntax should be 
reasonable (the asserted triple will come first).

From the current state (4.1.0 onwards), functionally correct and complete, we 
can see what the user uptake is. 

One thing to avoid is making a change, then needing to make another change, ... 
and another ... for a feature that not everyone is going to use.

The index setup is currently fixed - we can change it to look for additional 
indexes and have a tool for adding indexes.

The bulk loaders need adjusting - TDB2 bulk loaders do work incrementally and 
make a difference when adding a lot of data (comparable to the size of the 
existing data - if smaller, there is little point using them).

 


was (Author: andy.seaborne):
Indexing: certainly it can be added. I've kept away from changes that 
change-of-disk-layout. Disk changes are a more permanent commitment. Adding is 
a data reload; withdrawing/changing the feature is a reload and disruption.

The first {{RDF*}} Jena implementation was a bit PG mode and a bit SA mode (PG 
mode - the triple is always also an asserted triple like annotation syntax). It 
exploited the existing indexes to look up {{<<...>>}} patterns up.

The current \{{RDF-star} compliant is code, no disk-changes. With the fix for 
this JIRA, use of annotation syntax should be reasonable (the asserted triple 
will come first)

>From the current state (4.1.0 onwards), functionally correct and complete, we 
>can see what the user-uptake is. 

One thing to avoid is making a change, then needing to make another change, ... 
and another ... for a feature that not everyone is going to use.

The index setup is currently fixed - we can change it to look for additional 
indexes and have a tool to adding indexes.

The bulk loaders need adjusting - TDB2 bulk loaders do work incrementally and 
make a difference adding a lot of data (comparable to the size of the existing 
data - if smaller, little point using them).

 
