[ https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805596#comment-15805596 ]

ASF GitHub Bot commented on TINKERPOP-1585:
-------------------------------------------

Github user okram commented on a diff in the pull request:

    https://github.com/apache/tinkerpop/pull/524#discussion_r95010996

    --- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/traversal/strategy/optimization/SparkInterceptorStrategyTest.java ---
    @@ -142,7 +142,7 @@ public void shouldSuccessfullyEvaluateInterceptedTraversals() throws Exception {
             test(6l, g.V().out().values("name").count());
             test(2l, g.V().out("knows").values("name").count());
             test(3l, g.V().in().has("name", "marko").count());
    -        test(6l, g.V().repeat(__.dedup()).times(2).count());
    +        test(0l, g.V().repeat(__.dedup()).times(2).count());
    --- End diff --

    As discussed in IM. This is actually a bug in `RepeatUnrollStrategy` that was introduced in 3.2, but doesn't exist in 3.1. Added a test case to `DedupTest` and both OLTP and OLAP now behave as expected.


> OLAP dedup over non elements
> ----------------------------
>
>                 Key: TINKERPOP-1585
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1585
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop, process
>    Affects Versions: 3.2.3
>            Reporter: Daniel Kuppitz
>            Assignee: Marko A. Rodriguez
>
> OLAP {{dedup()}} is highly inefficient when it's fed with non elements.
> In a customer project a query similar to the following returned a result in slightly more than 6 seconds:
> {noformat}
> persistedRDD.
>   V().hasLabel("label1","label2").
>   inE("edgeLabel1","edgeLabel2").outV().
>   id().count()
> {noformat}
> The same query with {{dedup()}} added:
> {noformat}
> persistedRDD.
>   V().hasLabel("label1","label2").
>   inE("edgeLabel1","edgeLabel2").outV().
>   id().dedup().count()
> {noformat}
> ...took more than 120 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)