[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844203#comment-16844203 ]

Daniel Choi commented on TINKERPOP-2220:
----------------------------------------

Yes, if you define that as how dedup() should work inside a repeat(), then that 
is the correct behavior, and I don't deny it's useful for it to behave this 
way. I was just pointing out that it doesn't seem consistent with how 
repeat() works in general (the inner traversal is repeated verbatim, as if 
unrolled), but perhaps I'm introducing my own bias about how repeat() should work.
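To make the two readings concrete, here is a simplified Python model. It assumes (this is not taken from TinkerPop's source) that a dedup() step inside repeat() keeps one "seen" set that survives every loop iteration, while the "unrolled" reading gives each copy of the body its own fresh state. DedupStep, repeat_shared and repeat_unrolled are hypothetical names. Under that assumption the model reproduces the quoted report: with one shared step the second pass filters out every vertex already emitted, while dedup().dedup() keeps all six.

{code:python}
class DedupStep:
    """Hypothetical stand-in for dedup(): remembers every value it has emitted."""

    def __init__(self):
        self.seen = set()

    def apply(self, traversers):
        out = []
        for t in traversers:
            if t not in self.seen:
                self.seen.add(t)
                out.append(t)
        return out


def repeat_shared(start, times):
    """repeat(dedup()).times(n): one step instance, so its state persists across loops."""
    step = DedupStep()
    current = list(start)
    for _ in range(times):
        current = step.apply(current)
    return current


def repeat_unrolled(start, times):
    """dedup().dedup()...: the body is copied n times, each copy with fresh state."""
    current = list(start)
    for _ in range(times):
        current = DedupStep().apply(current)
    return current


vertices = [1, 2, 3, 4, 5, 6]  # the six vertex ids of the Modern graph
print(len(repeat_shared(vertices, 2)))    # 0
print(len(repeat_unrolled(vertices, 2)))  # 6
{code}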

As a counterpoint, imagine you wanted to do a BFS traversal from a node to 
all sink nodes and print out every distinct node at each frontier depth; in 
other words, all pairs (d, v), where d = depth and v = vertex.

{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]],[d:2,v:v[3]]]
{code}
Now with dedup() after out():

{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out().dedup()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]]]
{code}

Notice that the pair *[d:2, v:v[3]]* is gone in the dedup version, even though 
this is the first time v[3] appears at _depth=2_. You could argue that we should 
instead run the repeat traversal without any dedup and then dedup the 
aggregated pairs afterwards. But that introduces unnecessary computation at 
each depth: for example, if *v[5]* and *v[3]* both had edges going out to 
*v[6]*, we'd process *v[6]* twice at _depth=3_, only to dedup the duplicate 
pairs created by the double traversal later. The problem gets worse as the 
traversal tree deepens.
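The three strategies can be compared with a minimal Python sketch. It assumes a simplified level-by-level model of repeat() rather than TinkerPop's actual step implementation. MODERN_OUT holds the out-edges reachable from v[1] in the Modern graph, in the order the console output lists them; WITH_V6 adds the two hypothetical edges 3->6 and 5->6 from the v[6] example, treating v[6] as a sink (its real Modern-graph edge is left out). bfs_pairs and its mode names are made up for illustration.

{code:python}
# Hypothetical adjacency lists (out-edges only). MODERN_OUT is the part of the
# TinkerPop "modern" graph reachable from v[1]; WITH_V6 adds two made-up edges,
# 3->6 and 5->6, and treats v[6] as a sink.
MODERN_OUT = {1: [3, 2, 4], 2: [], 3: [], 4: [5, 3], 5: []}
WITH_V6 = {1: [3, 2, 4], 2: [], 3: [6], 4: [5, 3], 5: [6], 6: []}


def bfs_pairs(adj, start, mode):
    """Return (list of (depth, vertex) pairs, number of out() expansions).

    mode = "global":    one dedup state shared by all iterations
           "per_depth": dedup state reset at every depth
           "none":      no dedup while traversing; pairs deduped at the end
    """
    pairs = []
    expansions = 0
    seen = set()  # only consulted in "global" mode
    frontier, depth = [start], 0
    while frontier:
        # Record (d, v) for every traverser, like project(...).aggregate("pairs").
        pairs.extend((depth, v) for v in frontier)
        nxt = []
        for v in frontier:
            expansions += 1  # one out() per traverser
            nxt.extend(adj[v])
        if mode == "global":
            kept = []
            for v in nxt:
                if v not in seen:
                    seen.add(v)
                    kept.append(v)
            nxt = kept
        elif mode == "per_depth":
            nxt = list(dict.fromkeys(nxt))  # order-preserving dedup of this frontier only
        frontier, depth = nxt, depth + 1
    if mode == "none":
        pairs = list(dict.fromkeys(pairs))  # dedup the aggregated pairs afterwards
    return pairs, expansions
{code}

On MODERN_OUT, "global" yields the five pairs of the dedup() output and "per_depth" yields the six pairs of the undeduplicated output, including (2, 3). On WITH_V6, "per_depth" and "none" return the same eight pairs, but "none" performs nine expansions instead of eight because v[6] is expanded twice at depth 3.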

> Dedup inside Repeat Produces 0 results
> --------------------------------------
>
>                 Key: TINKERPOP-2220
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2220
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 3.3.0
>            Reporter: Rahul Chander
>            Priority: Major
>
> Testing against the TinkerPop Modern graph dataset, I ran this query:
> {code:java}
> g.V().repeat(__.dedup()).times(2).count()
> {code}
> which should essentially be the same as running dedup() twice. It returned a 
> count of 0, while running dedup() twice returned the correct count of 6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
