Hi all,

RepeatUnrollStrategy is designed as an optimization strategy to improve
performance by unrolling repeat().times(x) traversals into repeating
sequences. However, the current implementation is too aggressive and
changes the output semantics of traversals, violating the fundamental
principle that optimization strategies should preserve traversal behavior
while only improving performance.

The strategy currently transforms `g.inject(5, 3,
7).repeat(order()).times(1)` into `g.inject(5, 3, 7).order()`. But,

gremlin> g.inject(5, 3, 7).repeat(order()).times(1)
==>3
==>5
==>7

gremlin> g.withoutStrategies(RepeatUnrollStrategy).inject(5, 3,
7).repeat(order()).times(1)
==>5
==>3
==>7

shows the original and unrolled queries aren't equivalent. An example
(using modern graph) with a barrier step that shows different results is:

gremlin> g.V(1).repeat(both().order().tail(2)).times(2)
==>v[5]
==>v[6]

gremlin>
g.withoutStrategies(RepeatUnrollStrategy).V(1).repeat(both().order().tail(2)).times(2)
==>v[4]
==>v[6]

The current approach tries to apply this strategy as broadly as possible.
But it is also reactive - we discover semantic violations and add more
invalidations like for dedup() and emit(), rather than being conservative
from the start.

Starting in 3.8, I'm suggesting we reduce RepeatUnrollStrategy to only
handle the most basic, provably safe cases (likely some in(), out()
queries) which provides the following benefits:

1. Guaranteed semantic preservation: conservative approach ensures the
strategy never changes traversal output
2. Reduced maintenance burden: Fewer edge cases and semantic violations to
discover and fix
3. Foundation for growth: We can gradually add back optimizations with
confidence as we prove their safety

Does anyone have any concerns with this change?

Thanks,
Ken

Reply via email to