Re: [DISCUSS] Depth First Repeat step

Keith Lohnes Thu, 19 Apr 2018 05:49:47 -0700

> whether this will affect more than just
repeat()?

For the PR that I start and the intent of this thread was to only affect
repeat.


> I prefer that the semantics of the traversal be specified in the traversal
as a first class citizen.

+1

>  I am fine with any default but am wondering whether it would be
worthwhile for the default to be overridden at a global level at not just
per traversal

It might be nice, but I guess I'm not sure what `global` means in this
context. Configured on the graph object?

On Tue, Apr 17, 2018 at 9:28 PM Daniel Kuppitz <[email protected]> wrote:

> TinkerPop makes no guarantees about the order of elements unless you
> specify an explicit order. This also goes back to the fact that certain
> strategies (LazyBarrier-, RepeatUnroll- and PathRetractionStrategy) add
> NoOpBarrierSteps to your traversal, which ultimately turns it into a
> DFS/BFS mix. Check the .explain() output of your traversal to see which
> strategy adds which steps.
>
> Cheers,
> Daniel
>
>
> On Tue, Apr 17, 2018 at 4:45 PM, Michael Pollmeier <
> [email protected]> wrote:
>
> > > Also it seems to me that DFS only really applies to repeat() with an
> > > emit().
> > > g.V().hasLabel("A").repeat().times(2) gets rewritten as
> > > g.V().hasLabel("A").out().out(). Are their subtleties that I am not
> > > aware of or does DFV vs BFS not matter in this case?
> >
> > When I read this I thought: clearly `.out().out()` is DFS for OLTP,
> > that's also what the documentation says, e.g. in this nice infographic
> > http://tinkerpop.apache.org/docs/current/images/oltp-vs-olap.png
> >
> > However, looks like that's not the case. Has my life been a lie? Setting
> > up a simple flat graph to make things more obvious:
> > v3 <- v1 <- v0 -> v2 -> v4
> >
> > ```
> > graph = TinkerGraph.open()
> > v0 = graph.addVertex("l0")
> > v1 = graph.addVertex("l1")
> > v2 = graph.addVertex("l1")
> > v3 = graph.addVertex("l2")
> > v4 = graph.addVertex("l2")
> > v0.addEdge("e", v2)
> > v2.addEdge("e", v4)
> > v0.addEdge("e", v1)
> > v1.addEdge("e", v3)
> > g = graph.traversal()
> > g.V(v0).out().sideEffect{println(it)}.out().sideEffect{println(it)}
> > ```
> >
> > Prints:
> > v[2]
> > v[1]
> > v[4]
> > ==>v[4]
> > v[3]
> > ==>v[3]
> >
> > If this was OLTP the output would be:
> > v[2]
> > v[4]
> > ==>v[4]
> > v[1]
> > v[3]
> > ==>v[3]
> >
> > Cheers
> > Michael
> >
> > On 18/04/18 02:58, pieter gmail wrote:
> > > Hi,
> > >
> > > I agree with the question about whether this will affect more than just
> > > repeat()?
> > >
> > > I prefer that the semantics of the traversal be specified in the
> > > traversal as a first class citizen. i.e. with order(SearchAlgo).
> > > Strategies are to my mind internal to an implementation. In Robert's
> > > example LazyBarrierStrategy may be replaced/removed by an
> implementation
> > > for whatever internal reason they have.
> > >
> > > Regarding the default, I am fine with any default but am wondering
> > > whether it would be worthwhile for the default to be overridden at a
> > > global level at not just per traversal? That way the impact can also be
> > > alleviated when folks upgrade.
> > >
> > > Also it seems to me that DFS only really applies to repeat() with an
> > > emit().
> > > g.V().hasLabel("A").repeat().times(2) gets rewritten as
> > > g.V().hasLabel("A").out().out(). Are their subtleties that I am not
> > > aware of or does DFV vs BFS not matter in this case?
> > >
> > > Cheers
> > > Pieter
> > >
> > > On 17/04/2018 14:58, Robert Dale wrote:
> > >> +1 on DFS
> > >> -1 on order(SearchAlgo)
> > >>
> > >> It seems like a strategy may be more appropriate.  It could affect
> more
> > >> than just repeat().  And how would this interact with
> > >> LazyBarrierStrategy?
> > >>
> > >> Maybe the default should be DFS with LazyBarrierStrategy. Then
> > >> LazyBarrierStrategy
> > >> can be removed with 'withoutStrategies()' and then it works just like
> > >> everything else.  I'd prefer consistency with the way everything else
> > >> works.
> > >>
> > >>
> > >>
> > >> Robert Dale
> > >>
> > >> On Tue, Apr 17, 2018 at 8:11 AM, Stephen Mallette <
> [email protected]
> > >
> > >> wrote:
> > >>
> > >>> Thanks for the summary. Just a quick note - I'd not worry about the
> GLV
> > >>> tests for now. That part should be easy to sort out. Let's first make
> > >>> sure
> > >>> that we get clear on the other items first before digging too deeply
> > >>> there.
> > >>>
> > >>> On an administrative front, I think that this change should just go
> to
> > >>> 3.4.0/master (so it's currently targeted to the right branch
> > >>> actually) as
> > >>> it sounds like we want DFS to be the default and that could be a
> > >>> breaking
> > >>> change as the semantics of the traversal shift a bit. So that's one
> > >>> issue
> > >>> to make sure we're all happy with.
> > >>>
> > >>> A next issue would be the API from the user perspective - thanks for
> > the
> > >>> link to that JIRA btw - I see I was involved in that conversation a
> > >>> little
> > >>> bit but then dropped off. You'd commented that you preferred a per
> loop
> > >>> configuration for search approach which is what you stuck with for
> the
> > >>> implementation. I guess I can see that there could be some value in
> > >>> having
> > >>> that ability. That said, I'm not sure that I like the overload of
> > >>> order()
> > >>> as the method for doing that:
> > >>>
> > >>> g.V().has("name", "marko").
> > >>>     repeat(__.outE().order().by("weight", decr).inV()).
> > >>>       emit().
> > >>>       order(SearchAlgo.DFS)
> > >>>
> > >>> Still thinking about what else we might do there, but perhaps that
> > >>> can wait
> > >>> a bit as I still need to study your changes in detail. Regarding:
> > >>>
> > >>>>   The barrier step that Daniel described doesn’t currently work
> since
> > >>> there’s basically booleans in the RepeatStep on whether or not to
> > >>> stash the
> > >>> starts to make the RepeatStep depth first.
> > >>>
> > >>> I assume you are referring to these booleans:
> > >>>
> > >>> https://github.com/apache/tinkerpop/blob/
> > fe8ee98f6967c7b0f0ee7f2cf2d105
> > >>> 31b68fab8b/gremlin-core/src/main/java/org/apache/
> > >>> tinkerpop/gremlin/process/traversal/step/branch/
> > RepeatStep.java#L46-L47
> > >>>
> > >>> ?
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Apr 17, 2018 at 7:37 AM, Keith Lohnes <[email protected]>
> > wrote:
> > >>>
> > >>>> Stephen, That’s a fair summary. I had an immediate need for it, so I
> > >>>> developed something based on Michel Pollmeier’s work and a
> > modification
> > >>> to
> > >>>> the syntax Pieter Martin suggested in the Jira
> > >>>> <https://issues.apache.org/jira/browse/TINKERPOP-1822?
> > >>>> focusedCommentId=16233681&page=com.atlassian.jira.
> > >>>> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16233681>
> > >>>> with DFS being explicit.
> > >>>>
> > >>>> The semantics I used were g.V().repeat(out()).order(DFS), putting
> the
> > >>>> order
> > >>>> clause outside of the repeat. It made it simpler for me to develop
> and
> > >>>> seemed nice to make that DFS vs BFS explicit. The barrier step that
> > >>> Daniel
> > >>>> described doesn’t currently work since there’s basically booleans in
> > >>>> the
> > >>>> RepeatStep on whether or not to stash the starts to make the
> > RepeatStep
> > >>>> depth first.
> > >>>>
> > >>>> I added a test for the DFS, though I’ve had some issues getting
> > >>>> things to
> > >>>> run re: .net tests and some other tests timing out. I have some more
> > >>> tests
> > >>>> I'd like to write based off issues that I ran into testing this in
> the
> > >>>> console (k-ary trees, until/emit before the repeat vs after), but I
> > >>> really
> > >>>> wanted to get this out there for people to take a look at and see if
> > it
> > >>>> could work out.
> > >>>> 
> > >>>>
> > >>>> On Mon, Apr 16, 2018 at 7:29 PM Stephen Mallette <
> > [email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>>   Keith, I have to admit that I didn't really follow this general
> > body
> > >>> of
> > >>>>> work closely, which include this pull request, the earlier one from
> > >>>> Michael
> > >>>>> Pollmeier, the JIRA and I think some discussion somewhere on one of
> > >>>>> the
> > >>>>> mailing lists. As best I have gathered from what I did follow, I
> feel
> > >>>> like
> > >>>>> the focus of this work so far was mostly related to "can someone
> > >>>>> get it
> > >>>> to
> > >>>>> actually work", but not a lot about other aspects like testing,
> api,
> > >>>>> release branch to apply it to, etc. Is that a fair depiction of how
> > >>> this
> > >>>>> work has developed so far? if so, let's use this thread to make
> sure
> > >>>> we're
> > >>>>> all on the same page as to how this change will get in on all those
> > >>> sorts
> > >>>>> of issues.
> > >>>>>
> > >>>>> btw, thanks to you and michael for sticking at this to come to
> > >>> something
> > >>>>> that seems to work. looking forward to the continued discussion on
> > >>>>> this
> > >>>>> thread.
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Apr 16, 2018 at 6:54 PM, Michael Pollmeier <
> > >>>>> [email protected]> wrote:
> > >>>>>
> > >>>>>> Unsurprisingly I'm also +1 for defaulting to DFS in OLTP. My
> feeling
> > >>> is
> > >>>>>> that most users currently expect it to be DFS since that's what
> the
> > >>>> docs
> > >>>>>> say.
> > >>>>>>
> > >>>>>> And yes, it's easy to verify the default in the test suite, once
> we
> > >>>>>> agreed on what the default should be.
> > >>>>>>
> > >>>>>> Cheers
> > >>>>>> Michael
> > >>>>>>
> > >>>>>> On 17/04/18 04:40, pieter gmail wrote:
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> I have not properly followed the previous thread. However I
> thought
> > >>>> is
> > >>>>>>> going to be a way to set the default behavior as apposed to
> needing
> > >>>> to
> > >>>>>>> use barrier()
> > >>>>>>> Is this the case or not?
> > >>>>>>>
> > >>>>>>> For Sqlg at least it is possible to optimize BFS much more
> > >>>> effectively
> > >>>>>>> than DFS so it will be nice to have a way to set the strategy
> > >>> rather
> > >>>>>>> than having to manually inject barriers.
> > >>>>>>>
> > >>>>>>> Is the test suite going to enforce the BFS vs DFS?
> > >>>>>>>
> > >>>>>>> Thanks
> > >>>>>>> Pieter
> > >>>>>>>
> > >>>>>>> On 16/04/2018 16:56, Daniel Kuppitz wrote:
> > >>>>>>>> +1 for DFS. If the query relies on BFS, you can always do
> > >>>>>>>> .repeat(....barrier())...
> > >>>>>>>>
> > >>>>>>>> ^ This holds true as long as there's no significant difference
> in
> > >>>> the
> > >>>>>>>> cpu+memory consumption and overall performance of the two
> > >>>> approaches.
> > >>>>>> BFS
> > >>>>>>>> has its advantages when it comes to bulking; an arbitrary number
> > >>> of
> > >>>>>>>> traversers on the same element is processed at the same pace as
> a
> > >>>>> single
> > >>>>>>>> traverser. I don't think we can benefit from bulking in DFS.
> > >>>>>>>>
> > >>>>>>>> Cheers,
> > >>>>>>>> Daniel
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Mon, Apr 16, 2018 at 5:44 AM, Keith Lohnes <
> [email protected]>
> > >>>>>> wrote:
> > >>>>>>>>> As part of #838 <https://github.com/apache/tinkerpop/pull/838>
> > >>>>> there’s
> > >>>>>>>>> some
> > >>>>>>>>> discussion around whether or not to make DFS the default for
> the
> > >>>>> repeat
> > >>>>>>>>> step. On the one hand, everything else in OLTP is depth first.
> On
> > >>>> the
> > >>>>>>>>> other
> > >>>>>>>>> hand, there’s likely existing traversals that depend on the
> > >>> breadth
> > >>>>>>>>> first
> > >>>>>>>>> nature of repeat. My general preference is to make DFS optional
> > >>> at
> > >>>>>>>>> first,
> > >>>>>>>>> and at some later date, change the default and have that be a
> > >>>>> separate
> > >>>>>>>>> change from implementing DFS for repeat
> > >>>>>>>>> 
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >
> > >
> >
> >
>

Re: [DISCUSS] Depth First Repeat step

Reply via email to