[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements
[ https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724653#comment-17724653 ]

Miracy Cavendish commented on TINKERPOP-2878:

Sorry, we only just noticed this and are responding now. Many thanks for your explanation and for updating the documentation.

> Incorrect handling of local operations when there are duplicate elements
>
> Key: TINKERPOP-2878
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2878
> Project: TinkerPop
> Issue Type: Bug
> Components: process
> Affects Versions: 3.6.2
> Reporter: Miracy Cavendish
> Assignee: Stephen Mallette
> Priority: Critical
> Fix For: 3.7.0, 3.6.3, 3.5.6
>
> When using “local” to query the maximum out-degree among vertices, the result differs depending on whether “dedup()” is used:
> {code:java}
> Gremlin1: g.V().both().local(__.out().count()).max()
> Result1: 280
> Gremlin2: g.V().both().dedup().local(__.out().count()).max()
> Result2: 14
> {code}
> According to the [Gremlin documentation|https://tinkerpop.apache.org/javadocs/3.4.1/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#local-org.apache.tinkerpop.gremlin.process.traversal.Traversal-] (“Local provides a execute a specified traversal on a single element within a stream.”), _Result1_ should equal _Result2_, whereas 280 ≠ 14.
> A possible reason is that the database does not handle bulked data correctly when there are duplicate elements: for the reduced data (there are x vertices with ID v), the database maps it to x * out(v).count() (a number) instead of , which results in the inconsistency.
> We noticed that the [TinkerPop documentation|https://tinkerpop.apache.org/docs/current/reference/#local-step] provides an example of “local” that shows the difference between _“local”_ and _“flatMap”_. We cannot obtain the correct result in that example because _“local”_ propagates the traverser through the internal traversal as is, without splitting/cloning it.
> Nevertheless, the current results of the statements above still contradict the [Gremlin documentation|https://tinkerpop.apache.org/javadocs/3.4.1/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#local-org.apache.tinkerpop.gremlin.process.traversal.Traversal-], and this could be alleviated. We therefore suggest the second way of handling reduced data described above.
> The graph is created by the following statements:
> {code:java}
> g.addV("Vlabel1").property("prop11", true).property("prop26", -1.3054643785208727e+18).property("prop3", 5955883311802481410).property("PersonalId", 1)
> g.addV("Vlabel2").property("prop23", 1013597808).property("prop14", Double.POSITIVE_INFINITY).property("prop29", -8.088511244487521e+18).property("prop1", -791166414100353228).property("prop10", Double.NaN).property("prop20", false).property("prop12", -1.611044197269977e+18).property("prop8", Double.POSITIVE_INFINITY).property("prop28", "r8OwmXN0z4xVA32DuW").property("prop7", true).property("prop18", 122416389).property("prop4", -133008224708918302).property("prop16", Double.POSITIVE_INFINITY).property("prop5", 2.199870305073074e+18).property("prop30", 1951661449).property("PersonalId", 2)
> g.addV("Vlabel3").property("prop13", -1833987394).property("prop11", false).property("prop20", true).property("prop28", "Eb").property("prop26", Double.POSITIVE_INFINITY).property("prop19", "fkOMPiHGK4Qh9AEt").property("prop4", 7223784666736222475).property("prop21", "emdyKI4gibcntwr9xr1R").property("prop8", 1.6766837870245322e+18).property("prop6", "KPvJU8zUZkDujXO5").property("prop5", Double.POSITIVE_INFINITY).property("prop16", Double.NEGATIVE_INFINITY).property("prop29", -6.379213156782167e+16).property("prop9", -2639063587618099127).property("prop2", -4223871862589164789).property("prop7", true).property("prop22", 3.3866441258784246e+18).property("prop12", Double.NEGATIVE_INFINITY).property("prop15", Double.POSITIVE_INFINITY).property("prop27", -811138702).property("prop18", -823086061).property("prop30", 1766879986).property("prop10", Double.NEGATIVE_INFINITY).property("prop25", true).property("prop17", -7.221182960918364e+17).property("prop3", 3709150069759562136).property("prop24", true).property("prop23", 2089722858).property("prop1", 4952669033574350283).property("PersonalId", 3)
> g.addV("Vlabel4").property("prop18", 1921954359).property("prop9", 3679390972557414017).property("prop28", "zea").property("prop5", -5.37655340395617e+18).property("prop23", 873631855).property("prop29",
[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements
[ https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695713#comment-17695713 ]

Miracy Cavendish commented on TINKERPOP-2878:

Thank you very much for considering future improvements to the documentation. Your example is nice and clear, and I think I have understood your explanation. The contradiction I meant is with this part of the Javadoc:
{code:java}
local
default GraphTraversal local(Traversal localTraversal)
Provides a execute a specified traversal on a single element within a stream.
{code}
I would kindly suggest explaining the term 'single' further, as it causes some confusion.
[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements
[ https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695703#comment-17695703 ]

Stephen Mallette commented on TINKERPOP-2878:

I'm not sure I understand the contradiction you are referring to. I'll try to explain with another example from the "modern" graph:
{code}
gremlin> g.V().both().flatMap(outE().limit(1))
==>e[10][4-created->5]
==>e[10][4-created->5]
==>e[10][4-created->5]
==>e[9][1-created->3]
==>e[9][1-created->3]
==>e[9][1-created->3]
==>e[12][6-created->3]
gremlin> g.V().both().filter(outE()).count()
==>7
{code}
{{g.V().both()}} returns duplicates of the same 6 vertices in the graph. The {{flatMap()}} above chooses one edge for each traverser that has outgoing edges, and you can see that the count matches the number of edges shown. If we use {{local()}} instead, we get different behavior. The single edge is selected locally to each element (not to each traverser, i.e. there is no splitting), so we get only one edge from each of the three vertices that have outgoing edges:
{code}
gremlin> g.V().both().local(outE().limit(1))
==>e[10][4-created->5]
==>e[9][1-created->3]
==>e[12][6-created->3]
{code}
I think we will keep this open to try to clarify the documentation a bit. I'm not sure it's wrong as it is, though; it probably just needs improvements and additional examples.
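The flatMap()/local() difference above can be made concrete with a toy model in plain Java. This is not TinkerPop code, and the class and method names are hypothetical. The model treats a traverser as an object plus a bulk. It also makes four assumptions:

* flatMap() runs its inner traversal on a bulk-1 copy and re-applies the outer bulk to each result.
* local() hands the bulked traverser to the inner traversal unchanged.
* The inner traversal is reset for every incoming traverser.
* limit(n) counts units of bulk.

Under these assumptions the model reproduces the 7 vs. 3 results from the console session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of bulked traversers (not TinkerPop code): a traverser is an
// object plus a bulk, i.e. how many identical traversers it stands for.
final class FlatMapVsLocalModel {

    record Traverser(String object, long bulk) {}

    // Out-edges of the "modern" graph. The first edge per vertex is ordered to
    // match the console output (an assumption about iteration order).
    static final Map<String, List<String>> OUT_E = Map.of(
            "v[1]", List.of("e[9][1-created->3]", "e[7][1-knows->2]", "e[8][1-knows->4]"),
            "v[4]", List.of("e[10][4-created->5]", "e[11][4-created->3]"),
            "v[6]", List.of("e[12][6-created->3]"));

    // g.V().both() after duplicates are bulked: each vertex appears once per incident edge.
    static final List<Traverser> BOTH = List.of(
            new Traverser("v[1]", 3), new Traverser("v[2]", 1), new Traverser("v[3]", 3),
            new Traverser("v[4]", 3), new Traverser("v[5]", 1), new Traverser("v[6]", 1));

    // outE(): every emitted edge inherits the bulk of the incoming traverser.
    static List<Traverser> outE(Traverser t) {
        List<Traverser> result = new ArrayList<>();
        for (String edge : OUT_E.getOrDefault(t.object(), List.of())) {
            result.add(new Traverser(edge, t.bulk()));
        }
        return result;
    }

    // limit(n) counts units of bulk: once n units have passed, the rest is
    // dropped and the last traverser's bulk is trimmed.
    static List<Traverser> limit(List<Traverser> in, long n) {
        List<Traverser> result = new ArrayList<>();
        long remaining = n;
        for (Traverser t : in) {
            if (remaining <= 0) break;
            long taken = Math.min(remaining, t.bulk());
            result.add(new Traverser(t.object(), taken));
            remaining -= taken;
        }
        return result;
    }

    // flatMap(outE().limit(1)): the inner traversal starts from a bulk-1 copy,
    // then each result is re-bulked with the outer traverser's bulk.
    static List<Traverser> flatMapFirstEdge(List<Traverser> in) {
        List<Traverser> result = new ArrayList<>();
        for (Traverser t : in) {
            for (Traverser r : limit(outE(new Traverser(t.object(), 1)), 1)) {
                result.add(new Traverser(r.object(), r.bulk() * t.bulk()));
            }
        }
        return result;
    }

    // local(outE().limit(1)): the bulked traverser enters the inner traversal
    // as is, and the limit starts afresh for every incoming traverser.
    static List<Traverser> localFirstEdge(List<Traverser> in) {
        List<Traverser> result = new ArrayList<>();
        for (Traverser t : in) {
            result.addAll(limit(outE(t), 1));
        }
        return result;
    }

    // Number of results the console would print: the sum of the bulks.
    static long printedResults(List<Traverser> ts) {
        return ts.stream().mapToLong(Traverser::bulk).sum();
    }

    public static void main(String[] args) {
        System.out.println("flatMap: " + printedResults(flatMapFirstEdge(BOTH))); // 7
        System.out.println("local:   " + printedResults(localFirstEdge(BOTH)));   // 3
    }
}
```

In this model, the only difference between the two steps is whether the bulk is reset to 1 before limit(1) sees it. That choice alone is enough to turn 7 printed edges into 3.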
[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements
[ https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695663#comment-17695663 ]

Miracy Cavendish commented on TINKERPOP-2878:

Thank you for your response. I agree that using "map" or "flatMap" would be a better choice, and that the result of "local" may be the expected behavior. However, in some contexts we could execute the query more efficiently if we had an operation like the one in the previous case (using instead of x * out(v).count()). For instance, in my current scenario I would like to filter out vertices whose out-degree is less than a constant but retain the duplicate vertices. Earlier, I thought I could use "local" to achieve this, because the [Gremlin documentation|https://tinkerpop.apache.org/javadocs/3.4.1/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#local-org.apache.tinkerpop.gremlin.process.traversal.Traversal-] says that "local" executes a specified traversal on a single element within a stream. I would appreciate a more detailed explanation of bulked traversal in the documentation, since the documentation currently contradicts the behavior.
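For the scenario described here (dropping vertices whose out-degree is below a constant while keeping duplicates), a step that evaluates its inner traversal once per element, such as {{filter()}}, seems a better fit than {{local()}}. One candidate is {{g.V().both().filter(__.out().count().is(P.gte(3)))}}, but it is untested against the graph in this issue. The {{filter(outE()).count()}} result of 7 shown elsewhere in this thread suggests that filter() keeps duplicates.

The toy Java model below is not TinkerPop code, and its names and the threshold of 3 are hypothetical. It only contrasts two counting semantics on the "modern" graph:

* Testing the plain out-degree keeps vertex 1 with its full bulk of 3.
* Testing the bulk-scaled count that {{local(out().count())}} produced would also let vertex 4 through (bulk 3 × out-degree 2 = 6).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (not TinkerPop code): filter bulked traversers by out-degree.
final class DegreeFilterModel {

    // Out-degree of each vertex in the "modern" graph.
    static final Map<Integer, Long> OUT_DEGREE = Map.of(1, 3L, 2, 0L, 3, 0L, 4, 2L, 5, 0L, 6, 1L);

    // Bulk of each vertex after g.V().both(), i.e. how often it is reached.
    static final Map<Integer, Long> BULK = Map.of(1, 3L, 2, 1L, 3, 3L, 4, 3L, 5, 1L, 6, 1L);

    // Per-element test (filter()-style): the out-degree is checked once per
    // element and a passing traverser keeps its whole bulk, so duplicates survive.
    static Map<Integer, Long> keepIfOutDegreeAtLeast(long threshold) {
        Map<Integer, Long> kept = new LinkedHashMap<>();
        BULK.forEach((vertex, bulk) -> {
            if (OUT_DEGREE.get(vertex) >= threshold) kept.put(vertex, bulk);
        });
        return kept;
    }

    // The same test applied to the bulk-scaled count (bulk * out-degree)
    // that local(out().count()) produced in this issue.
    static Map<Integer, Long> keepIfScaledCountAtLeast(long threshold) {
        Map<Integer, Long> kept = new LinkedHashMap<>();
        BULK.forEach((vertex, bulk) -> {
            if (bulk * OUT_DEGREE.get(vertex) >= threshold) kept.put(vertex, bulk);
        });
        return kept;
    }

    // Number of (possibly duplicate) vertices that survive: the sum of bulks.
    static long survivors(Map<Integer, Long> kept) {
        return kept.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(survivors(keepIfOutDegreeAtLeast(3)));   // 3 (vertex 1, bulk 3)
        System.out.println(survivors(keepIfScaledCountAtLeast(3))); // 6 (vertices 1 and 4)
    }
}
```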
[jira] [Commented] (TINKERPOP-2878) Incorrect handling of local operations when there are duplicate elements
[ https://issues.apache.org/jira/browse/TINKERPOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695181#comment-17695181 ]

Stephen Mallette commented on TINKERPOP-2878:

I believe this is expected behavior; I wouldn't expect those two traversals to produce the same result. Consider the "modern" graph and the first part of your traversals:
{code}
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().both().local(__.out().count()).max()
==>9
gremlin> g.V().both().dedup().local(__.out().count()).max()
==>3
{code}
Removing the {{max()}} and adding {{profile()}} gives some insight:
{code}
gremlin> g.V().both().local(__.out().count()).profile()
==>Traversal Metrics
Step                                                  Count  Traversers   Time (ms)    % Dur
============================================================================================
TinkerGraphStep(vertex,[])                                6           6       0.107    19.48
VertexStep(BOTH,vertex)                                  12          12       0.094    17.15
NoOpBarrierStep(2500)                                    12           6       0.091    16.58
LocalStep([VertexStep(OUT,edge), CountGlobalStep])        6           6       0.259    46.79
  VertexStep(OUT,edge)                                   16           6       0.041
  CountGlobalStep                                         6           6       0.103
                                              >TOTAL      -           -       0.553        -
gremlin> g.V().both().dedup().local(__.out().count()).profile()
==>Traversal Metrics
Step                                                  Count  Traversers   Time (ms)    % Dur
============================================================================================
TinkerGraphStep(vertex,[])                                6           6       0.094    22.82
VertexStep(BOTH,vertex)                                  12          12       0.075    18.24
DedupGlobalStep(null,null)                                6           6       0.046    11.28
LocalStep([VertexStep(OUT,edge), CountGlobalStep])        6           6       0.196    47.67
  VertexStep(OUT,edge)                                    6           6       0.039
  CountGlobalStep                                         6           6       0.055
                                              >TOTAL      -           -       0.412        -
{code}
When you {{dedup()}}, the {{local()}} step only does a {{count()}} on the edges of each of the 6 unique vertices in the graph. When you don't {{dedup()}}, the 12 traversers from {{both()}} are bulked into 6 (see {{NoOpBarrierStep}}). {{local()}} then passes each bulked traverser through {{out()}} as is, so 16 edges are counted instead of 6. Basically, you get each vertex's edge count multiplied by the bulk of its traverser. I don't think I'd use {{local()}} in this case.
I'd probably prefer {{map()}}, or possibly {{dedup().map()}}:
{code}
gremlin> g.V().both().map(__.out().count()).profile()
==>Traversal Metrics
Step                                                  Count  Traversers   Time (ms)    % Dur
============================================================================================
TinkerGraphStep(vertex,[])                                6           6       0.084    22.28
VertexStep(BOTH,vertex)                                  12          12       0.074    19.60
NoOpBarrierStep(2500)                                    12           6       0.072    19.14
TraversalMapStep([VertexStep(OUT,edge), CountGl...       12           6       0.147    38.98
  VertexStep(OUT,edge)                                    6           6       0.029
  CountGlobalStep                                         6           6       0.036
                                              >TOTAL      -           -       0.379        -
gremlin> g.V().both().dedup().local(__.out().count()).max()
==>3
gremlin> g.V().both().map(__.out().count()).max()
==>3
gremlin> g.V().both().dedup().map(__.out().count()).max()
==>3
{code}
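The bulk arithmetic described in this comment can be sketched with a toy model in plain Java. It is not TinkerPop's implementation, and the names are hypothetical. The model assumes:

* the out-adjacency of the "modern" graph;
* {{both()}} reaches each vertex once per incident edge;
* {{local()}} passes the bulked traverser into {{out().count()}}, so the count is multiplied by the bulk;
* {{map()}} evaluates {{out().count()}} from a bulk-1 copy.

Under these assumptions the model yields:

* 9 for {{local()}};
* 3 for {{map()}} and for {{dedup().local()}};
* 16 edges counted inside {{local()}}.

These values match the console and profile output above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not TinkerPop code) of out().count() inside local() versus map()
// when the incoming traversers are bulked.
final class BulkCountModel {

    // Out-adjacency of the "modern" graph: vertex id -> out-neighbour ids.
    static final Map<Integer, List<Integer>> OUT = Map.of(
            1, List.of(2, 4, 3),
            2, List.of(),
            3, List.of(),
            4, List.of(5, 3),
            5, List.of(),
            6, List.of(3));

    // g.V().both() with duplicates merged: every edge reaches both of its
    // endpoints once, so a vertex's bulk equals its (undirected) degree.
    static Map<Integer, Long> bothWithBulk() {
        Map<Integer, Long> bulks = new LinkedHashMap<>();
        OUT.forEach((source, targets) -> {
            for (int target : targets) {
                bulks.merge(target, 1L, Long::sum);
                bulks.merge(source, 1L, Long::sum);
            }
        });
        return bulks;
    }

    // local(out().count()): the bulked traverser enters the inner traversal,
    // so each out-edge is counted once per unit of bulk.
    static long localCount(int vertex, long bulk) {
        return bulk * OUT.get(vertex).size();
    }

    // map(out().count()): the inner traversal starts from a bulk-1 copy,
    // so the result is the plain out-degree.
    static long mapCount(int vertex) {
        return OUT.get(vertex).size();
    }

    static long maxLocal() {
        return bothWithBulk().entrySet().stream()
                .mapToLong(e -> localCount(e.getKey(), e.getValue()))
                .max().orElse(0L);
    }

    // After dedup() every vertex arrives exactly once, i.e. with bulk 1.
    static long maxLocalAfterDedup() {
        return bothWithBulk().keySet().stream()
                .mapToLong(v -> localCount(v, 1L))
                .max().orElse(0L);
    }

    static long maxMap() {
        return bothWithBulk().keySet().stream()
                .mapToLong(BulkCountModel::mapCount)
                .max().orElse(0L);
    }

    // Edges counted inside local(), summed over all traversers
    // (the Count column of VertexStep(OUT,edge) in profile()).
    static long edgesCountedInLocal() {
        return bothWithBulk().entrySet().stream()
                .mapToLong(e -> localCount(e.getKey(), e.getValue()))
                .sum();
    }

    public static void main(String[] args) {
        System.out.println("local max:         " + maxLocal());            // 9
        System.out.println("dedup().local max: " + maxLocalAfterDedup());  // 3
        System.out.println("map max:           " + maxMap());              // 3
        System.out.println("edges in local():  " + edgesCountedInLocal()); // 16
    }
}
```

Vertex 1 has bulk 3 and out-degree 3, so local() reports 9 for it. Vertex 4 (bulk 3, out-degree 2) and vertex 6 (bulk 1, out-degree 1) add 6 and 1, which gives the 16 edges in the first profile.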