[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715539#comment-17715539 ] Oleksandr Porunov commented on TINKERPOP-2490: -- Maybe I'm missing something, but I doubt that it's `CountStrategy. Even without `count` step at all (let's replace it with `has` step for example) the behavior will be the same. `has` step will be executed for 2 vertices even so, `limit` step says that `1` is enough. Could be I'm missing some logic behind those step usages, but in my understanding the filter query should be executed for a single vertex if that vertex matches the filter and the limit is reached. I.e. `V(v1, v2, v3) .where(${some_filter_which_is_always_true}).limit(1)` in my testing `some_filter_which_is_always_true` is executed 2 times, but I would expect it to be executed only once because limit says that we don't need more elements then 1. As for count query above, notice that I use `.gte` and not `gt`. Thus it should be grater or equal and thus `1` should satisfy the requirements. I don't see a point of checking a second element to find out that the amount of elements is more if equal to 1 satisfies the requirements as well. Nevertheless, my issue isn't in the Count query, but in the filter step which is executed more than needed (from my point of view). Here is a simplified version of the test: {code:java} @Test public void testLimitedFilterNotChecksElementsOverTheLimit(){ TinkerGraph tinkerGraph = TinkerGraph.open(); List vertices = new ArrayList<>(); for(int i=0; i<3; i++){ Vertex v1 = tinkerGraph.addVertex(); Vertex v2 = tinkerGraph.addVertex(); v1.addEdge("connects", v2); vertices.add(v1); } TraversalMetrics traversalMetrics = tinkerGraph.traversal() .V(vertices.get(0), vertices.get(1), vertices.get(2)) .where(__.inject(true)) .limit(1) .profile().next(); Long filterTraversalCount = traversalMetrics.getMetrics().stream().filter(metrics -> metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName())) .findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID); Assertions.assertEquals(1, filterTraversalCount); } {code} > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715508#comment-17715508 ] Valentyn Kahamlyk commented on TINKERPOP-2490: -- this behavior is part of `CountStrategy` optimization and it looks correct. how can you read only 1 vertex to know that there are less than 2 of them? > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715490#comment-17715490 ] Oleksandr Porunov commented on TINKERPOP-2490: -- This bug still exists in TinkerPop (checked version 3.6.2). An example test which shows when this behavior is not expected is below: {code:java} @Test public void testLimitedFilterNotChecksElementsOverTheLimit(){ TinkerGraph tinkerGraph = TinkerGraph.open(); List vertices = new ArrayList<>(); for(int i=0; i<3; i++){ Vertex v1 = tinkerGraph.addVertex(); Vertex v2 = tinkerGraph.addVertex(); v1.addEdge("connects", v2); vertices.add(v1); } tinkerGraph.traversal().V(vertices.get(0), vertices.get(1), vertices.get(2)) .where(__.out("connects").count().is(P.gte(1))).limit(1).toList(); TraversalMetrics traversalMetrics = tinkerGraph.traversal() .V(vertices.get(0), vertices.get(1), vertices.get(2)) .where(__.out("connects").count().is(P.gte(1))) .limit(1) .profile().next(); Long filterTraversalCount = traversalMetrics.getMetrics().stream().filter(metrics -> metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName())) .findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID); Assertions.assertEquals(1, filterTraversalCount); } {code} In the above test the filter first checks the first provided vertex (which runs some potentially expensive check) then when limit is satisfied the check is executed again for the second provided vertex (even so it's unnecessary). The 3rd vertex is not evaluated. Thus, basically, we execute this computation always for 1 more unnecessary vertex. > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266006#comment-17266006 ] ASF GitHub Bot commented on TINKERPOP-2490: --- spmallette commented on pull request #1370: URL: https://github.com/apache/tinkerpop/pull/1370#issuecomment-760915848 Unfortunately, I've had to revert this change and re-open the issue in JIRA: 46fc7f627c75e016688727551f211c432af5a0d3 I've commented in JIRA on the reasoning for why. I sense this is not a big deal to fix, but didn't want to risk including the change given our short window for release. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Assignee: Stephen Mallette >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252794#comment-17252794 ] ASF GitHub Bot commented on TINKERPOP-2490: --- spmallette merged pull request #1370: URL: https://github.com/apache/tinkerpop/pull/1370 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250529#comment-17250529 ] ASF GitHub Bot commented on TINKERPOP-2490: --- spmallette commented on pull request #1370: URL: https://github.com/apache/tinkerpop/pull/1370#issuecomment-746798499 Thank you for this contribution. VOTE +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247744#comment-17247744 ] ASF GitHub Bot commented on TINKERPOP-2490: --- junshiguo opened a new pull request #1370: URL: https://github.com/apache/tinkerpop/pull/1370 https://issues.apache.org/jira/browse/TINKERPOP-2490 In FilterStep, the processNextStart() method will first retrieve next traverser and then apply filtering logic. But for RangleGlobalStep, if high limit is already hit, there will be no need to get next traverser. e.g. g.V().limit(1). This query will touch 2 vertices although only 1 vertex will be returned. This PR added high limit check before retrieving next traverser for filtering. Functionality is not affected, but we can expect better performance if getting next traverser is heavy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247743#comment-17247743 ] ASF GitHub Bot commented on TINKERPOP-2490: --- junshiguo closed pull request #1370: URL: https://github.com/apache/tinkerpop/pull/1370 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit
[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247711#comment-17247711 ] ASF GitHub Bot commented on TINKERPOP-2490: --- junshiguo opened a new pull request #1370: URL: https://github.com/apache/tinkerpop/pull/1370 https://issues.apache.org/jira/browse/TINKERPOP-2490 In FilterStep, the processNextStart() method will first retrieve next traverser and then apply filtering logic. But for RangleGlobalStep, if high limit is already hit, there will be no need to get next traverser. e.g. g.V().limit(1). This query will touch 2 vertices although only 1 vertex will be returned. This PR added high limit check before retrieving next traverser for filtering. Functionality is not affected, but we can expect better performance if getting next traverser is heavy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > RangeGlobalStep touches next traverser when high limit is already hit > - > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.4.8 >Reporter: Guo Junshi >Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin processNextStart() { > while (true) { > final Traverser.Admin traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.3.4#803005)