Re: [DISCUSS] Depth First Repeat step

Daniel Kuppitz Tue, 17 Apr 2018 18:28:13 -0700

TinkerPop makes no guarantees about the order of elements unless you
specify an explicit order. This also goes back to the fact that certain
strategies (LazyBarrier-, RepeatUnroll- and PathRetractionStrategy) add
NoOpBarrierSteps to your traversal, which ultimately turns it into a
DFS/BFS mix. Check the .explain() output of your traversal to see which
strategy adds which steps.


Cheers,
Daniel


On Tue, Apr 17, 2018 at 4:45 PM, Michael Pollmeier <
[email protected]> wrote:

> > Also it seems to me that DFS only really applies to repeat() with an
> > emit().
> > g.V().hasLabel("A").repeat().times(2) gets rewritten as
> > g.V().hasLabel("A").out().out(). Are their subtleties that I am not
> > aware of or does DFV vs BFS not matter in this case?
>
> When I read this I thought: clearly `.out().out()` is DFS for OLTP,
> that's also what the documentation says, e.g. in this nice infographic
> http://tinkerpop.apache.org/docs/current/images/oltp-vs-olap.png
>
> However, looks like that's not the case. Has my life been a lie? Setting
> up a simple flat graph to make things more obvious:
> v3 <- v1 <- v0 -> v2 -> v4
>
> ```
> graph = TinkerGraph.open()
> v0 = graph.addVertex("l0")
> v1 = graph.addVertex("l1")
> v2 = graph.addVertex("l1")
> v3 = graph.addVertex("l2")
> v4 = graph.addVertex("l2")
> v0.addEdge("e", v2)
> v2.addEdge("e", v4)
> v0.addEdge("e", v1)
> v1.addEdge("e", v3)
> g = graph.traversal()
> g.V(v0).out().sideEffect{println(it)}.out().sideEffect{println(it)}
> ```
>
> Prints:
> v[2]
> v[1]
> v[4]
> ==>v[4]
> v[3]
> ==>v[3]
>
> If this was OLTP the output would be:
> v[2]
> v[4]
> ==>v[4]
> v[1]
> v[3]
> ==>v[3]
>
> Cheers
> Michael
>
> On 18/04/18 02:58, pieter gmail wrote:
> > Hi,
> >
> > I agree with the question about whether this will affect more than just
> > repeat()?
> >
> > I prefer that the semantics of the traversal be specified in the
> > traversal as a first class citizen. i.e. with order(SearchAlgo).
> > Strategies are to my mind internal to an implementation. In Robert's
> > example LazyBarrierStrategy may be replaced/removed by an implementation
> > for whatever internal reason they have.
> >
> > Regarding the default, I am fine with any default but am wondering
> > whether it would be worthwhile for the default to be overridden at a
> > global level at not just per traversal? That way the impact can also be
> > alleviated when folks upgrade.
> >
> > Also it seems to me that DFS only really applies to repeat() with an
> > emit().
> > g.V().hasLabel("A").repeat().times(2) gets rewritten as
> > g.V().hasLabel("A").out().out(). Are their subtleties that I am not
> > aware of or does DFV vs BFS not matter in this case?
> >
> > Cheers
> > Pieter
> >
> > On 17/04/2018 14:58, Robert Dale wrote:
> >> +1 on DFS
> >> -1 on order(SearchAlgo)
> >>
> >> It seems like a strategy may be more appropriate.  It could affect more
> >> than just repeat().  And how would this interact with
> >> LazyBarrierStrategy?
> >>
> >> Maybe the default should be DFS with LazyBarrierStrategy. Then
> >> LazyBarrierStrategy
> >> can be removed with 'withoutStrategies()' and then it works just like
> >> everything else.  I'd prefer consistency with the way everything else
> >> works.
> >>
> >>
> >>
> >> Robert Dale
> >>
> >> On Tue, Apr 17, 2018 at 8:11 AM, Stephen Mallette <[email protected]
> >
> >> wrote:
> >>
> >>> Thanks for the summary. Just a quick note - I'd not worry about the GLV
> >>> tests for now. That part should be easy to sort out. Let's first make
> >>> sure
> >>> that we get clear on the other items first before digging too deeply
> >>> there.
> >>>
> >>> On an administrative front, I think that this change should just go to
> >>> 3.4.0/master (so it's currently targeted to the right branch
> >>> actually) as
> >>> it sounds like we want DFS to be the default and that could be a
> >>> breaking
> >>> change as the semantics of the traversal shift a bit. So that's one
> >>> issue
> >>> to make sure we're all happy with.
> >>>
> >>> A next issue would be the API from the user perspective - thanks for
> the
> >>> link to that JIRA btw - I see I was involved in that conversation a
> >>> little
> >>> bit but then dropped off. You'd commented that you preferred a per loop
> >>> configuration for search approach which is what you stuck with for the
> >>> implementation. I guess I can see that there could be some value in
> >>> having
> >>> that ability. That said, I'm not sure that I like the overload of
> >>> order()
> >>> as the method for doing that:
> >>>
> >>> g.V().has("name", "marko").
> >>>     repeat(__.outE().order().by("weight", decr).inV()).
> >>>       emit().
> >>>       order(SearchAlgo.DFS)
> >>>
> >>> Still thinking about what else we might do there, but perhaps that
> >>> can wait
> >>> a bit as I still need to study your changes in detail. Regarding:
> >>>
> >>>>   The barrier step that Daniel described doesn’t currently work since
> >>> there’s basically booleans in the RepeatStep on whether or not to
> >>> stash the
> >>> starts to make the RepeatStep depth first.
> >>>
> >>> I assume you are referring to these booleans:
> >>>
> >>> https://github.com/apache/tinkerpop/blob/
> fe8ee98f6967c7b0f0ee7f2cf2d105
> >>> 31b68fab8b/gremlin-core/src/main/java/org/apache/
> >>> tinkerpop/gremlin/process/traversal/step/branch/
> RepeatStep.java#L46-L47
> >>>
> >>> ?
> >>>
> >>>
> >>>
> >>> On Tue, Apr 17, 2018 at 7:37 AM, Keith Lohnes <[email protected]>
> wrote:
> >>>
> >>>> Stephen, That’s a fair summary. I had an immediate need for it, so I
> >>>> developed something based on Michel Pollmeier’s work and a
> modification
> >>> to
> >>>> the syntax Pieter Martin suggested in the Jira
> >>>> <https://issues.apache.org/jira/browse/TINKERPOP-1822?
> >>>> focusedCommentId=16233681&page=com.atlassian.jira.
> >>>> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16233681>
> >>>> with DFS being explicit.
> >>>>
> >>>> The semantics I used were g.V().repeat(out()).order(DFS), putting the
> >>>> order
> >>>> clause outside of the repeat. It made it simpler for me to develop and
> >>>> seemed nice to make that DFS vs BFS explicit. The barrier step that
> >>> Daniel
> >>>> described doesn’t currently work since there’s basically booleans in
> >>>> the
> >>>> RepeatStep on whether or not to stash the starts to make the
> RepeatStep
> >>>> depth first.
> >>>>
> >>>> I added a test for the DFS, though I’ve had some issues getting
> >>>> things to
> >>>> run re: .net tests and some other tests timing out. I have some more
> >>> tests
> >>>> I'd like to write based off issues that I ran into testing this in the
> >>>> console (k-ary trees, until/emit before the repeat vs after), but I
> >>> really
> >>>> wanted to get this out there for people to take a look at and see if
> it
> >>>> could work out.
> >>>> 
> >>>>
> >>>> On Mon, Apr 16, 2018 at 7:29 PM Stephen Mallette <
> [email protected]>
> >>>> wrote:
> >>>>
> >>>>>   Keith, I have to admit that I didn't really follow this general
> body
> >>> of
> >>>>> work closely, which include this pull request, the earlier one from
> >>>> Michael
> >>>>> Pollmeier, the JIRA and I think some discussion somewhere on one of
> >>>>> the
> >>>>> mailing lists. As best I have gathered from what I did follow, I feel
> >>>> like
> >>>>> the focus of this work so far was mostly related to "can someone
> >>>>> get it
> >>>> to
> >>>>> actually work", but not a lot about other aspects like testing, api,
> >>>>> release branch to apply it to, etc. Is that a fair depiction of how
> >>> this
> >>>>> work has developed so far? if so, let's use this thread to make sure
> >>>> we're
> >>>>> all on the same page as to how this change will get in on all those
> >>> sorts
> >>>>> of issues.
> >>>>>
> >>>>> btw, thanks to you and michael for sticking at this to come to
> >>> something
> >>>>> that seems to work. looking forward to the continued discussion on
> >>>>> this
> >>>>> thread.
> >>>>>
> >>>>>
> >>>>> On Mon, Apr 16, 2018 at 6:54 PM, Michael Pollmeier <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>>> Unsurprisingly I'm also +1 for defaulting to DFS in OLTP. My feeling
> >>> is
> >>>>>> that most users currently expect it to be DFS since that's what the
> >>>> docs
> >>>>>> say.
> >>>>>>
> >>>>>> And yes, it's easy to verify the default in the test suite, once we
> >>>>>> agreed on what the default should be.
> >>>>>>
> >>>>>> Cheers
> >>>>>> Michael
> >>>>>>
> >>>>>> On 17/04/18 04:40, pieter gmail wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have not properly followed the previous thread. However I thought
> >>>> is
> >>>>>>> going to be a way to set the default behavior as apposed to needing
> >>>> to
> >>>>>>> use barrier()
> >>>>>>> Is this the case or not?
> >>>>>>>
> >>>>>>> For Sqlg at least it is possible to optimize BFS much more
> >>>> effectively
> >>>>>>> than DFS so it will be nice to have a way to set the strategy
> >>> rather
> >>>>>>> than having to manually inject barriers.
> >>>>>>>
> >>>>>>> Is the test suite going to enforce the BFS vs DFS?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Pieter
> >>>>>>>
> >>>>>>> On 16/04/2018 16:56, Daniel Kuppitz wrote:
> >>>>>>>> +1 for DFS. If the query relies on BFS, you can always do
> >>>>>>>> .repeat(....barrier())...
> >>>>>>>>
> >>>>>>>> ^ This holds true as long as there's no significant difference in
> >>>> the
> >>>>>>>> cpu+memory consumption and overall performance of the two
> >>>> approaches.
> >>>>>> BFS
> >>>>>>>> has its advantages when it comes to bulking; an arbitrary number
> >>> of
> >>>>>>>> traversers on the same element is processed at the same pace as a
> >>>>> single
> >>>>>>>> traverser. I don't think we can benefit from bulking in DFS.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Daniel
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Apr 16, 2018 at 5:44 AM, Keith Lohnes <[email protected]>
> >>>>>> wrote:
> >>>>>>>>> As part of #838 <https://github.com/apache/tinkerpop/pull/838>
> >>>>> there’s
> >>>>>>>>> some
> >>>>>>>>> discussion around whether or not to make DFS the default for the
> >>>>> repeat
> >>>>>>>>> step. On the one hand, everything else in OLTP is depth first. On
> >>>> the
> >>>>>>>>> other
> >>>>>>>>> hand, there’s likely existing traversals that depend on the
> >>> breadth
> >>>>>>>>> first
> >>>>>>>>> nature of repeat. My general preference is to make DFS optional
> >>> at
> >>>>>>>>> first,
> >>>>>>>>> and at some later date, change the default and have that be a
> >>>>> separate
> >>>>>>>>> change from implementing DFS for repeat
> >>>>>>>>> 
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >
> >
>
>

Re: [DISCUSS] Depth First Repeat step

Reply via email to