[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844203#comment-16844203 ]
Daniel Choi commented on TINKERPOP-2220:
----------------------------------------

Yes, if that is how dedup() is defined to work inside a repeat(), then that is the correct behavior. And I don't deny that it's useful to have it behave this way. I was just pointing out that it didn't seem consistent with how repeat() works in general (repeating the inner traversal verbatim, as if unrolled), but perhaps I'm introducing my own bias about how repeat() should work.

As a counterpoint, imagine you wanted to do a BFS traversal from a node to all sink nodes and print all distinct nodes at each frontier depth. In other words, all pairs (d, v), where d=depth and v=vertex.

{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]],[d:2,v:v[3]]]
{code}

Now with dedup:

{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out().dedup()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]]]
{code}

Notice how the pair *[d:2, v:v[3]]* is gone in the dedup version, even though this is the first time it appears at _depth=2_.

You could argue that you could instead run the repeat traversal without any dedup and then dedup the aggregated pairs afterwards. But that introduces unnecessary computation at each depth. For example, if *v[5]* and *v[3]* both had edges going out to *v[6]*, we'd be processing *v[6]* twice at _depth=3_, only to later dedup the duplicate pairs created by the double traversal.
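To make the two readings of dedup() concrete, here is a minimal plain-Java sketch that models the traversal level by level. It is not TinkerPop code and makes no claim about how TinkerPop implements either step. The class name, the Scope enum, and the PER_DEPTH mode (the "unrolled" reading) are assumptions made for illustration, and vertex 7 in EXTENDED_OUT is an invented stand-in for the shared child in the hypothetical above. The out-edge order is hard-coded to match the console output.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of dedup() scoping inside repeat(); not TinkerPop code.
class DedupScopeSketch {

    // NONE: no dedup. GLOBAL: one seen-set shared by all iterations.
    // PER_DEPTH: hypothetical "unrolled" reading, fresh seen-set per iteration.
    enum Scope { NONE, GLOBAL, PER_DEPTH }

    // Out-edges of the TinkerPop "modern" toy graph, keyed by vertex id.
    static final Map<Integer, List<Integer>> MODERN_OUT = Map.of(
            1, List.of(3, 2, 4),
            2, List.of(),
            3, List.of(),
            4, List.of(5, 3),
            5, List.of(),
            6, List.of(3));

    // Hypothetical variant: v[3] and v[5] both point at an invented vertex 7.
    static final Map<Integer, List<Integer>> EXTENDED_OUT = Map.of(
            1, List.of(3, 2, 4),
            2, List.of(),
            3, List.of(7),
            4, List.of(5, 3),
            5, List.of(7),
            7, List.of());

    // Expands level by level from start and records a "depth:vertex" pair
    // for every traverser that enters an iteration.
    // Assumes an acyclic graph; NONE and PER_DEPTH would not terminate on a cycle.
    static List<String> pairs(Map<Integer, List<Integer>> out, int start, Scope scope) {
        List<String> result = new ArrayList<>();
        Set<Integer> seenGlobally = new HashSet<>();
        List<Integer> frontier = List.of(start);
        for (int depth = 0; !frontier.isEmpty(); depth++) {
            Set<Integer> seenAtDepth = new HashSet<>();
            List<Integer> next = new ArrayList<>();
            for (int v : frontier) {
                result.add(depth + ":" + v);
                for (int w : out.getOrDefault(v, List.of())) {
                    boolean keep = true;
                    if (scope == Scope.GLOBAL) {
                        keep = seenGlobally.add(w);  // false if seen in any iteration
                    } else if (scope == Scope.PER_DEPTH) {
                        keep = seenAtDepth.add(w);   // false only if seen in this iteration
                    }
                    if (keep) {
                        next.add(w);
                    }
                }
            }
            frontier = next;
        }
        return result;
    }

    public static void main(String[] args) {
        for (Scope scope : Scope.values()) {
            System.out.println(scope + " modern:   " + pairs(MODERN_OUT, 1, scope));
            System.out.println(scope + " extended: " + pairs(EXTENDED_OUT, 1, scope));
        }
    }
}
```

In this model, GLOBAL on the modern graph drops the (2, v[3]) pair, matching the second console output, while PER_DEPTH keeps it. On the extended graph, NONE records the invented vertex 7 twice at depth 3, PER_DEPTH records it once, and GLOBAL never gets there through v[3] at all. A single shared seen-set would also be consistent with the reported repeat(__.dedup()).times(2) returning 0: every vertex has already been seen by the second iteration.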
The problem gets worse as your traversal tree deepens.

> Dedup inside Repeat Produces 0 results
> --------------------------------------
>
>                 Key: TINKERPOP-2220
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2220
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 3.3.0
>            Reporter: Rahul Chander
>            Priority: Major
>
> Testing against the TinkerPop Modern graph dataset, I ran this query:
> {code:java}
> g.V().repeat(__.dedup()).times(2).count()
> {code}
> which should essentially be the same as running dedup twice. It produced 0
> results, while dedup twice produced the correct 6.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)