[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2023-04-23 Thread Oleksandr Porunov (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715539#comment-17715539
 ] 

Oleksandr Porunov commented on TINKERPOP-2490:
--

Maybe I'm missing something, but I doubt that it's `CountStrategy. Even without 
`count` step at all (let's replace it with `has` step for example) the behavior 
will be the same. `has` step will be executed for 2 vertices even so, `limit` 
step says that `1` is enough. Could be I'm missing some logic behind those step 
usages, but in my understanding the filter query should be executed for a 
single vertex if that vertex matches the filter and the limit is reached.

I.e. `V(v1, v2, v3) .where(${some_filter_which_is_always_true}).limit(1)` in my 
testing `some_filter_which_is_always_true` is executed 2 times, but I would 
expect it to be executed only once because limit says that we don't need more 
elements then 1.

As for count query above, notice that I use `.gte` and not `gt`. Thus it should 
be grater or equal and thus `1` should satisfy the requirements. I don't see a 
point of checking a second element to find out that the amount of elements is 
more if equal to 1 satisfies the requirements as well. Nevertheless, my issue 
isn't in the Count query, but in the filter step which is executed more than 
needed (from my point of view).

Here is a simplified version of the test:
{code:java}
@Test
public void testLimitedFilterNotChecksElementsOverTheLimit(){
TinkerGraph tinkerGraph = TinkerGraph.open();
List vertices = new ArrayList<>();
for(int i=0; i<3; i++){
Vertex v1 = tinkerGraph.addVertex();
Vertex v2 = tinkerGraph.addVertex();
v1.addEdge("connects", v2);
vertices.add(v1);
}

TraversalMetrics traversalMetrics = tinkerGraph.traversal()
.V(vertices.get(0), vertices.get(1), vertices.get(2))
.where(__.inject(true))
.limit(1)
.profile().next();

Long filterTraversalCount = 
traversalMetrics.getMetrics().stream().filter(metrics -> 
metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName()))
.findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID);

Assertions.assertEquals(1, filterTraversalCount);
} {code}
 

> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2023-04-23 Thread Valentyn Kahamlyk (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715508#comment-17715508
 ] 

Valentyn Kahamlyk commented on TINKERPOP-2490:
--

this behavior is part of `CountStrategy` optimization and it looks correct. 
how can you read only 1 vertex to know that there are less than 2 of them?

> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2023-04-23 Thread Oleksandr Porunov (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715490#comment-17715490
 ] 

Oleksandr Porunov commented on TINKERPOP-2490:
--

This bug still exists in TinkerPop (checked version 3.6.2). An example test 
which shows when this behavior is not expected is below:
{code:java}
@Test
public void testLimitedFilterNotChecksElementsOverTheLimit(){
TinkerGraph tinkerGraph = TinkerGraph.open();
List vertices = new ArrayList<>();
for(int i=0; i<3; i++){
Vertex v1 = tinkerGraph.addVertex();
Vertex v2 = tinkerGraph.addVertex();
v1.addEdge("connects", v2);
vertices.add(v1);
}
tinkerGraph.traversal().V(vertices.get(0), vertices.get(1), vertices.get(2))
.where(__.out("connects").count().is(P.gte(1))).limit(1).toList();

TraversalMetrics traversalMetrics = tinkerGraph.traversal()
.V(vertices.get(0), vertices.get(1), vertices.get(2))
.where(__.out("connects").count().is(P.gte(1)))
.limit(1)
.profile().next();

Long filterTraversalCount = 
traversalMetrics.getMetrics().stream().filter(metrics -> 
metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName()))
.findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID);

Assertions.assertEquals(1, filterTraversalCount);
} {code}
In the above test the filter first checks the first provided vertex (which runs 
some potentially expensive check) then when limit is satisfied the check is 
executed again for the second provided vertex (even so it's unnecessary). The 
3rd vertex is not evaluated.

Thus, basically, we execute this computation always for 1 more unnecessary 
vertex.

> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2021-01-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266006#comment-17266006
 ] 

ASF GitHub Bot commented on TINKERPOP-2490:
---

spmallette commented on pull request #1370:
URL: https://github.com/apache/tinkerpop/pull/1370#issuecomment-760915848


   Unfortunately, I've had to revert this change and re-open the issue in JIRA: 
46fc7f627c75e016688727551f211c432af5a0d3 
   
   I've commented in JIRA on the reasoning for why. I sense this is not a big 
deal to fix, but didn't want to risk including the change given our short 
window for release.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Assignee: Stephen Mallette
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2020-12-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252794#comment-17252794
 ] 

ASF GitHub Bot commented on TINKERPOP-2490:
---

spmallette merged pull request #1370:
URL: https://github.com/apache/tinkerpop/pull/1370


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2020-12-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250529#comment-17250529
 ] 

ASF GitHub Bot commented on TINKERPOP-2490:
---

spmallette commented on pull request #1370:
URL: https://github.com/apache/tinkerpop/pull/1370#issuecomment-746798499


   Thank you for this contribution. VOTE +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2020-12-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247744#comment-17247744
 ] 

ASF GitHub Bot commented on TINKERPOP-2490:
---

junshiguo opened a new pull request #1370:
URL: https://github.com/apache/tinkerpop/pull/1370


   https://issues.apache.org/jira/browse/TINKERPOP-2490
   
   In FilterStep, the processNextStart() method will first retrieve next 
traverser and then apply filtering logic. But for RangleGlobalStep, if high 
limit is already hit, there will be no need to get next traverser.
   
   e.g. g.V().limit(1). This query will touch 2 vertices although only 1 vertex 
will be returned.
   
   This PR added high limit check before retrieving next traverser for 
filtering. Functionality is not affected, but we can expect better performance 
if getting next traverser is heavy.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2020-12-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247743#comment-17247743
 ] 

ASF GitHub Bot commented on TINKERPOP-2490:
---

junshiguo closed pull request #1370:
URL: https://github.com/apache/tinkerpop/pull/1370


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2020-12-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247711#comment-17247711
 ] 

ASF GitHub Bot commented on TINKERPOP-2490:
---

junshiguo opened a new pull request #1370:
URL: https://github.com/apache/tinkerpop/pull/1370


   https://issues.apache.org/jira/browse/TINKERPOP-2490
   
   In FilterStep, the processNextStart() method will first retrieve next 
traverser and then apply filtering logic. But for RangleGlobalStep, if high 
limit is already hit, there will be no need to get next traverser.
   
   e.g. g.V().limit(1). This query will touch 2 vertices although only 1 vertex 
will be returned.
   
   This PR added high limit check before retrieving next traverser for 
filtering. Functionality is not affected, but we can expect better performance 
if getting next traverser is heavy.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method will first retrieve next 
> traverser and then apply filtering logic. But for RangleGlobalStep, if high 
> limit is already hit, there will be no need to get next traverser.
> {code:java}
> @Override
> protected Traverser.Admin processNextStart() {
> while (true) {
> final Traverser.Admin traverser = this.starts.next();
> if (this.filter(traverser))
> return traverser;
> }
> }
> {code}
> An example would be limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading will cause performance defects if DB data loading is 
> involved. It is not a functionality bug, but for better performance, we'd 
> better check high range limit first before touching next traversal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)