[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844203#comment-16844203 ]
Daniel Choi commented on TINKERPOP-2220:
----------------------------------------
Yes, if you define that as how dedup() should work inside a repeat(), then that
is the correct behavior, and I don't deny it's useful to have it behave this
way. I was just pointing out that it doesn't seem consistent with how
repeat() works in general (repeat the inner traversal verbatim, as if unrolled),
but perhaps I'm introducing my own bias about how repeat() should work.
As a counterpoint, imagine you wanted to do a BFS traversal from a start
node to all sink nodes, and wanted to print all distinct nodes at each
frontier depth. In other words, all pairs (d, v), where d=depth and v=vertex.
{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]],[d:2,v:v[3]]]
{code}
Now with dedup:
{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out().dedup()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]]]
{code}
Notice how the pair *[d:2, v:v[3]]* is gone in the dedup version, even though this
is the first time v[3] appears at _depth=2_. You could argue that you could
instead run the repeat traversal without any dedup, then dedup the
aggregated pairs afterwards. But then you're introducing unnecessary computation at
each depth. For example, if *v[5]* and *v[3]* both had edges going out to
*v[6]*, we'd process *v[6]* twice at _depth=3_, only to later dedup the
duplicate pairs created by the double traversal. The problem gets worse as
your traversal tree deepens.
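
To make the difference concrete, here is a rough plain-Python simulation of the two traversals above. It is only a sketch, not Gremlin and not how TinkerPop is implemented. The function name bfs_pairs, the "per_depth" mode (a fresh seen-set at every iteration, which is the behavior I was expecting) and the small diamond graph are made up for illustration. The Modern graph adjacency lists are written in the order the console output above visits them.

```python
# Out-edges of the TinkerPop Modern graph, in the order the console output visits them.
MODERN = {1: [3, 2, 4], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}

# Hypothetical diamond graph: two vertices of one frontier share a successor.
DIAMOND = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def bfs_pairs(graph, start, dedup="none", max_depth=10):
    """Collect (depth, vertex) pairs for every frontier of a BFS.

    dedup="none":      no filtering after expanding the frontier
    dedup="global":    one seen-set shared by all iterations
    dedup="per_depth": a fresh seen-set at each iteration (assumed alternative)
    """
    pairs = []
    frontier = [start]
    seen = set()
    depth = 0
    # max_depth guards against cycles when no global dedup is applied.
    while frontier and depth <= max_depth:
        pairs.extend((depth, v) for v in frontier)
        expanded = [w for v in frontier for w in graph[v]]
        if dedup == "none":
            frontier = expanded
        else:
            if dedup == "per_depth":
                seen = set()  # forget what earlier depths saw
            frontier = []
            for w in expanded:
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
        depth += 1
    return pairs


print(bfs_pairs(MODERN, 1, "none"))       # includes (2, 3)
print(bfs_pairs(MODERN, 1, "global"))     # (2, 3) is filtered out
print(bfs_pairs(MODERN, 1, "per_depth"))  # includes (2, 3) again
print(bfs_pairs(DIAMOND, "a", "none"))       # "d" is processed twice at depth 2
print(bfs_pairs(DIAMOND, "a", "per_depth"))  # "d" is processed once at depth 2
```

On the Modern graph the "global" mode reproduces the second console result, while "per_depth" keeps (2, 3). On the diamond graph, "per_depth" also avoids processing the shared successor twice, which is the saving I was describing.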
> Dedup inside Repeat Produces 0 results
> --------------------------------------
>
> Key: TINKERPOP-2220
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2220
> Project: TinkerPop
> Issue Type: Bug
> Components: process
> Affects Versions: 3.3.0
> Reporter: Rahul Chander
> Priority: Major
>
> Testing against the Tinkerpop Modern graph dataset, I ran this query:
> {code:java}
> g.V().repeat(__.dedup()).times(2).count()
> {code}
> which should essentially be the same as running dedup twice. It produced 0
> results, while dedup twice produced the correct 6.
>
>
>
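
One plausible reading of the reported result, consistent with the comment above, is that the dedup() step inside repeat() keeps what it has seen across iterations, so the second pass filters out every vertex the first pass let through. The sketch below models that in plain Python. It is an assumption about the mechanism, not TinkerPop's code, and the Dedup class is hypothetical.

```python
class Dedup:
    """Stateful filter: passes each item only the first time it is seen."""

    def __init__(self):
        self.seen = set()

    def __call__(self, items):
        passed = []
        for item in items:
            if item not in self.seen:
                self.seen.add(item)
                passed.append(item)
        return passed


vertices = [1, 2, 3, 4, 5, 6]  # the six vertices of the Modern graph

# Assumed model of repeat(dedup()).times(2): the same step instance runs twice.
shared = Dedup()
looped = shared(shared(vertices))

# Model of dedup().dedup(): two independent step instances.
chained = Dedup()(Dedup()(vertices))

print(len(looped))   # 0
print(len(chained))  # 6
```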