[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements

2023-05-21 Thread Miracy Cavendish (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724653#comment-17724653
 ] 

Miracy Cavendish commented on TINKERPOP-2878:
-

Sorry for the late reply; we only just noticed this. Many thanks for your 
explanation and for updating the documentation.



> Incorrect handling of local operations when there are duplicate elements
> 
>
> Key: TINKERPOP-2878
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2878
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.6.2
>Reporter: Miracy Cavendish
>Assignee: Stephen Mallette
>Priority: Critical
> Fix For: 3.7.0, 3.6.3, 3.5.6
>
>
> When using “local” to query the vertex with maximum out-degree among 
> vertices, there is a different result between using “dedup()” and without 
> “dedup()”.
> {code:java}
> Gremlin1: g.V().both().local(__.out().count()).max()
> Result1: 280
> Gremlin2: g.V().both().dedup().local(__.out().count()).max()
> Result2: 14{code}
> _Result1_ should equal _Result2_ according to the [gremlin 
> document|https://tinkerpop.apache.org/javadocs/3.4.1/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#local-org.apache.tinkerpop.gremlin.process.traversal.Traversal-]
>  “Local provides a execute a specified traversal on a single element within a 
> stream.”, whereas 280 ≠ 14.
> The possible reason is that the database does not handle the bulked data 
> correctly when there are duplicate elements: for the reduced data  
> (There are x vertices with ID v), the database will map it to x * 
> out(v).count() (a number) instead of , which results in 
> inconsistency.
> We noticed that there is an example of “local” provided by the ["Tinkerpop 
> Documents”|https://tinkerpop.apache.org/docs/current/reference/#local-step], 
> which shows the difference between _“local”_ and _“flatMap”_, and we cannot 
> obtain the correct result in the provided case since _“local”_ propagates 
> the traverser through the internal traversal as is without splitting/cloning 
> it.
> Nevertheless, the results of the current execution of the above statement 
> also contradict the [gremlin 
> document|https://tinkerpop.apache.org/javadocs/3.4.1/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#local-org.apache.tinkerpop.gremlin.process.traversal.Traversal-]
>  but could be alleviated. Therefore, we suggest using a second way 
>  of handling reduced data to alleviate this situation.
> The graph is created by the following statements:
> {code:java}
> g.addV("Vlabel1").property("prop11", true).property("prop26", 
> -1.3054643785208727e+18).property("prop3", 
> 5955883311802481410).property("PersonalId", 1)
> g.addV("Vlabel2").property("prop23", 1013597808).property("prop14", 
> Double.POSITIVE_INFINITY).property("prop29", 
> -8.088511244487521e+18).property("prop1", 
> -791166414100353228).property("prop10", Double.NaN).property("prop20", 
> false).property("prop12", -1.611044197269977e+18).property("prop8", 
> Double.POSITIVE_INFINITY).property("prop28", 
> "r8OwmXN0z4xVA32DuW").property("prop7", true).property("prop18", 
> 122416389).property("prop4", -133008224708918302).property("prop16", 
> Double.POSITIVE_INFINITY).property("prop5", 
> 2.199870305073074e+18).property("prop30", 1951661449).property("PersonalId", 
> 2)
> g.addV("Vlabel3").property("prop13", -1833987394).property("prop11", 
> false).property("prop20", true).property("prop28", "Eb").property("prop26", 
> Double.POSITIVE_INFINITY).property("prop19", 
> "fkOMPiHGK4Qh9AEt").property("prop4", 7223784666736222475).property("prop21", 
> "emdyKI4gibcntwr9xr1R").property("prop8", 
> 1.6766837870245322e+18).property("prop6", 
> "KPvJU8zUZkDujXO5").property("prop5", 
> Double.POSITIVE_INFINITY).property("prop16", 
> Double.NEGATIVE_INFINITY).property("prop29", 
> -6.379213156782167e+16).property("prop9", 
> -2639063587618099127).property("prop2", 
> -4223871862589164789).property("prop7", true).property("prop22", 
> 3.3866441258784246e+18).property("prop12", 
> Double.NEGATIVE_INFINITY).property("prop15", 
> Double.POSITIVE_INFINITY).property("prop27", -811138702).property("prop18", 
> -823086061).property("prop30", 1766879986).property("prop10", 
> Double.NEGATIVE_INFINITY).property("prop25", true).property("prop17", 
> -7.221182960918364e+17).property("prop3", 
> 3709150069759562136).property("prop24", true).property("prop23", 
> 2089722858).property("prop1", 4952669033574350283).property("PersonalId", 3)
> g.addV("Vlabel4").property("prop18", 1921954359).property("prop9", 
> 3679390972557414017).property("prop28", "zea").property("prop5", 
> -5.37655340395617e+18).property("prop23", 873631855).property("prop29", 
> 

[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements

2023-03-02 Thread Miracy Cavendish (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695713#comment-17695713
 ] 

Miracy Cavendish commented on TINKERPOP-2878:
-

Thank you very much for considering improvements to the documentation in the future.

Your example is nice and clear, so I think I have understood your explanation.

The contradiction concerns the Javadoc for {{local()}}:
{code:java}
local
default <E2> GraphTraversal<S,E2> local(Traversal<?,E2> localTraversal)
Provides a execute a specified traversal on a single element within a stream. 
{code}
I would kindly suggest that the term 'single' needs further explanation, 
because it causes some confusion.

> Incorrect handling of local operations when there are duplicate elements
> 
>
> Key: TINKERPOP-2878
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2878
> Project: TinkerPop
>  Issue Type: Bug
>Affects Versions: 3.6.2
>Reporter: Miracy Cavendish
>Priority: Major
>

[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements

2023-03-02 Thread Stephen Mallette (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695703#comment-17695703
 ] 

Stephen Mallette commented on TINKERPOP-2878:
-

I'm not sure I understand the contradiction you are referring to. I'll try to 
explain with another example from the "modern" graph:

{code}
gremlin> g.V().both().flatMap(outE().limit(1))
==>e[10][4-created->5]
==>e[10][4-created->5]
==>e[10][4-created->5]
==>e[9][1-created->3]
==>e[9][1-created->3]
==>e[9][1-created->3]
==>e[12][6-created->3]
gremlin> g.V().both().filter(outE()).count()
==>7
{code}

{{g.V().both()}} returns 12 traversers that cover the same 6 vertices in the 
graph, so several vertices appear more than once. The {{flatMap()}} traversal 
above chooses one edge for each traverser that has outgoing edges, and you can 
see that the count of 7 matches the number of edges shown. If we use 
{{local()}} instead, we get different behavior: the selection of a single edge 
is local to each element (not to each traverser, i.e. no splitting), so we get 
only one edge from each of the three vertices that have outgoing edges:

{code}
gremlin> g.V().both().local(outE().limit(1))
==>e[10][4-created->5]
==>e[9][1-created->3]
==>e[12][6-created->3]
{code}

I think we will keep this open to try to clarify the documentation a bit, but 
I'm not sure I see that it's wrong as it is...probably just needs improvements 
and additional examples.
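
The difference between the two console outputs above can be reproduced with a small Python model of the "modern" graph. This is only a sketch of the semantics described in this comment, not TinkerPop's implementation: the names `EDGES`, `out_edges`, and `both` are made up for illustration, and which edge comes "first" in the model is arbitrary, so only the result counts are meaningful.

```python
from collections import Counter

# Edges of the "modern" toy graph: edge id -> (out vertex, in vertex).
EDGES = {7: (1, 2), 8: (1, 4), 9: (1, 3), 10: (4, 5), 11: (4, 3), 12: (6, 3)}


def out_edges(v):
    """Ids of the edges leaving vertex v."""
    return [eid for eid, (src, _) in EDGES.items() if src == v]


def both(v):
    """Vertices adjacent to v in either direction."""
    outgoing = [dst for src, dst in EDGES.values() if src == v]
    incoming = [src for src, dst in EDGES.values() if dst == v]
    return outgoing + incoming


# g.V().both(): one traverser per adjacent vertex, duplicates included.
traversers = [w for v in range(1, 7) for w in both(v)]

# Model of flatMap(outE().limit(1)): limit(1) is applied once per traverser.
flat_map_result = [out_edges(v)[0] for v in traversers if out_edges(v)]

# Model of local(outE().limit(1)): duplicates are merged into one bulked
# traverser per vertex, and limit(1) is applied once to that merged traverser.
local_result = []
for v, bulk in Counter(traversers).items():
    emitted = [eid for eid in out_edges(v) for _ in range(bulk)]
    local_result.extend(emitted[:1])

print(len(traversers), len(flat_map_result), len(local_result))
```

Under these assumptions the model yields 7 edges for the {{flatMap()}} case and 3 for the {{local()}} case, in line with the console output.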

> Incorrect handling of local operations when there are duplicate elements
> 
>
> Key: TINKERPOP-2878
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2878
> Project: TinkerPop
>  Issue Type: Bug
>Affects Versions: 3.6.2
>Reporter: Miracy Cavendish
>Priority: Major
>

[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements

2023-03-02 Thread Miracy Cavendish (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695663#comment-17695663
 ] 

Miracy Cavendish commented on TINKERPOP-2878:
-

Thank you for your response. I agree that using "map" or "flatMap" would be a 
better choice, and the result of the "local" may be the expected behavior. 
However, in some contexts, we can execute the query more efficiently if we have 
similar operations as in the previous case (using  instead 
of x * out(v).count() ). For instance, in my current scenario, I would like to 
filter out vertices that have an out-degree less than a constant but retain the 
duplicate vertices.
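
The scenario can be sketched with a small Python model of the "modern" graph; this is an illustration of the intended result, not TinkerPop code. The threshold `K` and the helper names are hypothetical. In Gremlin, a per-traverser filter such as {{filter(__.out().count().is(P.gte(k)))}} might express the same idea, but that form is an assumption and is not taken from this thread.

```python
# Edges of the "modern" toy graph as (out vertex, in vertex) pairs.
EDGES = [(1, 2), (1, 4), (1, 3), (4, 5), (4, 3), (6, 3)]


def out_degree(v):
    """Number of edges leaving vertex v."""
    return sum(1 for src, _ in EDGES if src == v)


def both(v):
    """Vertices adjacent to v in either direction."""
    return [dst for src, dst in EDGES if src == v] + [src for src, dst in EDGES if dst == v]


K = 2  # hypothetical out-degree threshold

# g.V().both(): one traverser per adjacent vertex, duplicates included.
traversers = [w for v in range(1, 7) for w in both(v)]

# Keep every traverser whose vertex has out-degree >= K. The test is made per
# traverser, so duplicates of a qualifying vertex are all retained.
kept = [v for v in traversers if out_degree(v) >= K]

print(sorted(kept))  # [1, 1, 1, 4, 4, 4]
```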

 

Earlier, I thought I could use "local" to achieve this, because the [gremlin 
document|https://tinkerpop.apache.org/javadocs/3.4.1/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#local-org.apache.tinkerpop.gremlin.process.traversal.Traversal-]
 mentions that "Local" allows executing a specified traversal on a single 
element within a stream. However, I would appreciate it if you could provide 
a more detailed explanation of bulked traversers in the documentation, since 
the documentation currently contradicts this behavior.

 

> Incorrect handling of local operations when there are duplicate elements
> 
>
> Key: TINKERPOP-2878
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2878
> Project: TinkerPop
>  Issue Type: Bug
>Affects Versions: 3.6.2
>Reporter: Miracy Cavendish
>Priority: Major
>

[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements

2023-03-01 Thread Stephen Mallette (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695181#comment-17695181
 ] 

Stephen Mallette commented on TINKERPOP-2878:
-

I believe this is expected behavior. I wouldn't expect those to provide the 
same result. Consider the "modern" graph and the first part of your traversals:

{code}
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().both().local(__.out().count()).max()
==>9
gremlin> g.V().both().dedup().local(__.out().count()).max()
==>3
{code}

Taking away the {{max()}} and doing a {{profile()}} yields some insights:

{code}
gremlin> g.V().both().local(__.out().count()).profile()
==>Traversal Metrics
Step                                                Count  Traversers  Time (ms)  % Dur
=======================================================================================
TinkerGraphStep(vertex,[])                              6           6      0.107  19.48
VertexStep(BOTH,vertex)                                12          12      0.094  17.15
NoOpBarrierStep(2500)                                  12           6      0.091  16.58
LocalStep([VertexStep(OUT,edge), CountGlobalStep])      6           6      0.259  46.79
  VertexStep(OUT,edge)                                 16           6      0.041
  CountGlobalStep                                       6           6      0.103
                                            >TOTAL      -           -      0.553      -
gremlin> g.V().both().dedup().local(__.out().count()).profile()
==>Traversal Metrics
Step                                                Count  Traversers  Time (ms)  % Dur
=======================================================================================
TinkerGraphStep(vertex,[])                              6           6      0.094  22.82
VertexStep(BOTH,vertex)                                12          12      0.075  18.24
DedupGlobalStep(null,null)                              6           6      0.046  11.28
LocalStep([VertexStep(OUT,edge), CountGlobalStep])      6           6      0.196  47.67
  VertexStep(OUT,edge)                                  6           6      0.039
  CountGlobalStep                                       6           6      0.055
                                            >TOTAL      -           -      0.412      -
{code}

When you {{dedup()}}, the {{local()}} step only does a {{count()}} on the edges 
of each of the 6 unique vertices in the graph. When you don't {{dedup()}}, 
{{local()}} still receives 6 traversers, but they are bulked: the 12 results of 
{{both()}} are merged per vertex. The inner {{out()}} step then reports 16 
elements, because the edges of each vertex are counted once per unit of bulk 
and all of them are counted to the same traverser. So you basically get the 
edge count multiplied by the bulk of the traverser.

I don't think I'd use {{local()}} in this case. I'd probably prefer {{map()}}, 
or possibly {{dedup().map()}}:

{code}
gremlin> g.V().both().map(__.out().count()).profile()
==>Traversal Metrics
Step                                                Count  Traversers  Time (ms)  % Dur
=======================================================================================
TinkerGraphStep(vertex,[])                              6           6      0.084  22.28
VertexStep(BOTH,vertex)                                12          12      0.074  19.60
NoOpBarrierStep(2500)                                  12           6      0.072  19.14
TraversalMapStep([VertexStep(OUT,edge), CountGl...     12           6      0.147  38.98
  VertexStep(OUT,edge)                                  6           6      0.029
  CountGlobalStep                                       6           6      0.036
                                            >TOTAL      -           -      0.379      -
gremlin> g.V().both().dedup().local(__.out().count()).max()
==>3
gremlin> g.V().both().map(__.out().count()).max()
==>3
gremlin> g.V().both().dedup().map(__.out().count()).max()
==>3
{code}
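
The bulk arithmetic described above can be checked with a small Python model of the "modern" graph. It is a sketch of the behavior as explained in this comment, not a reproduction of TinkerPop internals; the names `EDGES`, `out`, and `both` are illustrative assumptions.

```python
from collections import Counter

# Edges of the "modern" toy graph as (out vertex, in vertex) pairs.
EDGES = [(1, 2), (1, 4), (1, 3), (4, 5), (4, 3), (6, 3)]


def out(v):
    """Vertices reached by following the outgoing edges of v."""
    return [dst for src, dst in EDGES if src == v]


def both(v):
    """Vertices adjacent to v in either direction."""
    return out(v) + [src for src, dst in EDGES if dst == v]


# g.V().both() yields 12 traversers; a barrier merges them into 6 bulked
# traversers, one per distinct vertex, each carrying a bulk (multiplicity).
bulks = Counter(w for v in range(1, 7) for w in both(v))

# Model of local(out().count()): the bulk is carried into the inner traversal,
# so each vertex contributes its out-degree multiplied by its bulk.
local_counts = [len(out(v)) * bulk for v, bulk in bulks.items()]

# Model of map(out().count()): one plain out-degree per original traverser.
map_counts = [len(out(v)) for v, bulk in bulks.items() for _ in range(bulk)]

# Model of dedup().local(out().count()): every traverser has bulk 1.
dedup_counts = [len(out(v)) for v in bulks]

print(max(local_counts), max(map_counts), max(dedup_counts))  # 9 3 3
print(sum(local_counts))  # 16
```

In this model the sum of the bulked counts is 16, which corresponds to the count of 16 reported for the inner {{VertexStep(OUT,edge)}} in the first profile.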

> Incorrect handling of local operations when there are duplicate elements
> 
>
> Key: TINKERPOP-2878
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2878
> Project: TinkerPop
>  Issue Type: Bug
>Affects Versions: 3.6.2
>