It seems like we have general agreement on the easy things, that is: 1. this is a change for 3.4.0/master and 2. we're all for a DFS option
but we still have the hard part of having to come to consensus on how it's used/implemented. The quick summary of this thread in that regard goes something like this: We currently have this PR that introduces DFS, but does so as a configuration per repeat() step. From what I gather a good many of us seem to find that approach undesirable for one or more of the following reasons: 1. The use of order() seems misplaced purely from a usability/API perspective 2. The approach seems to be at odds with how everything else works given barrier() and strategies 3. The approach seems to be at odds with our current mixed mode of DFS/BFS I think that we can see those issues resolve themselves with something Kuppitz mentioned to me: repeat() should be DFS by default where barrier() will change that behavior as required. That change would yield the following approaches: Full BFS: manually add `barrier()`'s Mixed mode: Default, let strategies do their thing OR remove strategies and manually add your own barrier() Full DFS: execute `.withoutStrategies(Lazy...)` Futherrmore, we probably should have some form of verification strategy that ensures all BFS or all DFS so that users can't get tricked along the way. It's not enough to just remove LazyBarrierStrategy to get DFS if another strategy comes along and throws in a barrier(). So if all that sounds good from a usability perspective, then we get all three modes that we want using existing traversal semantics which removes the three concerns I've summarized from this thread. We also get Keith's desire to have control over which part of a traversal is BFS/DFS if users want that capability because they can do a manual Mixed Mode and add their own barrier() to control the flow. For Pieter (or any graph provider) nothing really changes and there is opportunity to control flow with strategies as usual. I haven't really focused much on what's involved in adapting the current work in the PR to this approach as I more wanted to find the common ground among all the people who commented on the thread. If we agree that this is a nice way to go, then we can think more about "how" it could happen. Keith, I saw you mention earlier that: > The barrier step that Daniel described doesn’t currently work since there’s basically booleans in the RepeatStep on whether or not to stash the starts to make the RepeatStep depth first. I presume that would be some source of technical derailment to this approach. On Tue, Apr 24, 2018 at 3:05 PM, Keith Lohnes <[email protected]> wrote: > Yeah, that's what I meant. The steps inside are replaced with some > JanusGraph stuff. > > Cheers, > Keith > > > On Tue, Apr 24, 2018 at 1:52 PM pieter gmail <[email protected]> > wrote: > > > Nah, that looks to me like the RepeatStep survived. Just the nested > > VertexStep that got replaced with JanusgraphVertexStep. > > Good for them, first prize is not replacing anything. > > > > Cheers > > Pieter > > > > On 24/04/2018 19:50, Keith Lohnes wrote: > > > It looks like it, > > > `g.V().has("foo", "bar").repeat(out()).emit().explain()` > > > > > > yields > > > > > > `[JanusGraphStep([],[foo.eq(bar)]), > > > RepeatStep([JanusGraphVertexStep(OUT,vertex), > > > RepeatEndStep],until(false),emit(true))]` > > > > > > > > > > > > On Tue, Apr 24, 2018 at 12:12 PM pieter gmail <[email protected] > > > > > wrote: > > > > > >> Hi, > > >> > > >> Sqlg completely replaces TinkerPop's RepeatStep. The idea being that > > >> with g.V().repeat(out()).times(x) only x round trips to the db is > needed > > >> regardless of the size of the graph. Each time it will go to the db > with > > >> the full set of the previous step's incoming starts. > > >> > > >> But yeah TinkerPop's implementation is always the starting point so > I'll > > >> definitely have a look at how you have implemented DFS. > > >> > > >> BTW, does Janus graph use TinkerPop's default RepeatStep as is with no > > >> optimization strategies? > > >> > > >> Cheers > > >> Pieter > > >> > > >> On 24/04/2018 16:33, Keith Lohnes wrote: > > >>> Pieter, > > >>> > > >>> If you take a look at https://github.com/apache/tinkerpop/pull/838 > DFS > > >> is > > >>> implemented as a modification to BFS. It's taking the starts that > come > > in > > >>> from a BFS and stashing them to be processed later. I haven't seen a > > big > > >>> performance difference on JanusGraph; At least for the queries that > > I've > > >>> been running with it. I'm not terribly familiar with Sqlg, but I > wonder > > >> if > > >>> in the context of how DFS is implemented there, it may be less of a > > >>> concern. > > >>> > > >>> Cheers, > > >>> Keith > > >>> > > >>> On Thu, Apr 19, 2018 at 12:46 PM pieter gmail < > [email protected] > > > > > >>> wrote: > > >>> > > >>>> Hi, > > >>>> > > >>>> Not really sure either what 'global' means technically with respect > to > > >>>> TinkerPop's current configuration support. > > >>>> Perhaps it can be the start of a global configuration registry that > > can > > >>>> be overridden per traversal. > > >>>> > > >>>> I get that DFS is the preferred default but for Sqlg the performance > > >>>> impact is so great that I'd rather, if possible have BFS as its > > default. > > >>>> > > >>>> I am not sure about this but I reckon that any graph where the > > TinkerPop > > >>>> vm is not running in the same process space as the actual graph/data > > >>>> that latency is a big issue. BFS alleviates the latency issue > > >>>> significantly. > > >>>> > > >>>> Cheers > > >>>> Pieter > > >>>> > > >>>> > > >>>> > > >>>> On 19/04/2018 14:49, Keith Lohnes wrote: > > >>>>>> whether this will affect more than just > > >>>>> repeat()? > > >>>>> > > >>>>> For the PR that I start and the intent of this thread was to only > > >> affect > > >>>>> repeat. > > >>>>> > > >>>>>> I prefer that the semantics of the traversal be specified in the > > >>>> traversal > > >>>>> as a first class citizen. > > >>>>> > > >>>>> +1 > > >>>>> > > >>>>>> I am fine with any default but am wondering whether it would > be > > >>>>> worthwhile for the default to be overridden at a global level at > not > > >> just > > >>>>> per traversal > > >>>>> > > >>>>> It might be nice, but I guess I'm not sure what `global` means in > > this > > >>>>> context. Configured on the graph object? > > >>>>> > > >>>>> On Tue, Apr 17, 2018 at 9:28 PM Daniel Kuppitz <[email protected]> > > >> wrote: > > >>>>>> TinkerPop makes no guarantees about the order of elements unless > you > > >>>>>> specify an explicit order. This also goes back to the fact that > > >> certain > > >>>>>> strategies (LazyBarrier-, RepeatUnroll- and > PathRetractionStrategy) > > >> add > > >>>>>> NoOpBarrierSteps to your traversal, which ultimately turns it > into a > > >>>>>> DFS/BFS mix. Check the .explain() output of your traversal to see > > >> which > > >>>>>> strategy adds which steps. > > >>>>>> > > >>>>>> Cheers, > > >>>>>> Daniel > > >>>>>> > > >>>>>> > > >>>>>> On Tue, Apr 17, 2018 at 4:45 PM, Michael Pollmeier < > > >>>>>> [email protected]> wrote: > > >>>>>> > > >>>>>>>> Also it seems to me that DFS only really applies to repeat() > with > > an > > >>>>>>>> emit(). > > >>>>>>>> g.V().hasLabel("A").repeat().times(2) gets rewritten as > > >>>>>>>> g.V().hasLabel("A").out().out(). Are their subtleties that I am > > not > > >>>>>>>> aware of or does DFV vs BFS not matter in this case? > > >>>>>>> When I read this I thought: clearly `.out().out()` is DFS for > OLTP, > > >>>>>>> that's also what the documentation says, e.g. in this nice > > >> infographic > > >>>>>>> http://tinkerpop.apache.org/docs/current/images/oltp-vs-olap.png > > >>>>>>> > > >>>>>>> However, looks like that's not the case. Has my life been a lie? > > >>>> Setting > > >>>>>>> up a simple flat graph to make things more obvious: > > >>>>>>> v3 <- v1 <- v0 -> v2 -> v4 > > >>>>>>> > > >>>>>>> ``` > > >>>>>>> graph = TinkerGraph.open() > > >>>>>>> v0 = graph.addVertex("l0") > > >>>>>>> v1 = graph.addVertex("l1") > > >>>>>>> v2 = graph.addVertex("l1") > > >>>>>>> v3 = graph.addVertex("l2") > > >>>>>>> v4 = graph.addVertex("l2") > > >>>>>>> v0.addEdge("e", v2) > > >>>>>>> v2.addEdge("e", v4) > > >>>>>>> v0.addEdge("e", v1) > > >>>>>>> v1.addEdge("e", v3) > > >>>>>>> g = graph.traversal() > > >>>>>>> g.V(v0).out().sideEffect{println(it)}.out().sideEffect{ > println(it)} > > >>>>>>> ``` > > >>>>>>> > > >>>>>>> Prints: > > >>>>>>> v[2] > > >>>>>>> v[1] > > >>>>>>> v[4] > > >>>>>>> ==>v[4] > > >>>>>>> v[3] > > >>>>>>> ==>v[3] > > >>>>>>> > > >>>>>>> If this was OLTP the output would be: > > >>>>>>> v[2] > > >>>>>>> v[4] > > >>>>>>> ==>v[4] > > >>>>>>> v[1] > > >>>>>>> v[3] > > >>>>>>> ==>v[3] > > >>>>>>> > > >>>>>>> Cheers > > >>>>>>> Michael > > >>>>>>> > > >>>>>>> On 18/04/18 02:58, pieter gmail wrote: > > >>>>>>>> Hi, > > >>>>>>>> > > >>>>>>>> I agree with the question about whether this will affect more > than > > >>>> just > > >>>>>>>> repeat()? > > >>>>>>>> > > >>>>>>>> I prefer that the semantics of the traversal be specified in the > > >>>>>>>> traversal as a first class citizen. i.e. with order(SearchAlgo). > > >>>>>>>> Strategies are to my mind internal to an implementation. In > > Robert's > > >>>>>>>> example LazyBarrierStrategy may be replaced/removed by an > > >>>>>> implementation > > >>>>>>>> for whatever internal reason they have. > > >>>>>>>> > > >>>>>>>> Regarding the default, I am fine with any default but am > wondering > > >>>>>>>> whether it would be worthwhile for the default to be overridden > > at a > > >>>>>>>> global level at not just per traversal? That way the impact can > > also > > >>>> be > > >>>>>>>> alleviated when folks upgrade. > > >>>>>>>> > > >>>>>>>> Also it seems to me that DFS only really applies to repeat() > with > > an > > >>>>>>>> emit(). > > >>>>>>>> g.V().hasLabel("A").repeat().times(2) gets rewritten as > > >>>>>>>> g.V().hasLabel("A").out().out(). Are their subtleties that I am > > not > > >>>>>>>> aware of or does DFV vs BFS not matter in this case? > > >>>>>>>> > > >>>>>>>> Cheers > > >>>>>>>> Pieter > > >>>>>>>> > > >>>>>>>> On 17/04/2018 14:58, Robert Dale wrote: > > >>>>>>>>> +1 on DFS > > >>>>>>>>> -1 on order(SearchAlgo) > > >>>>>>>>> > > >>>>>>>>> It seems like a strategy may be more appropriate. It could > > affect > > >>>>>> more > > >>>>>>>>> than just repeat(). And how would this interact with > > >>>>>>>>> LazyBarrierStrategy? > > >>>>>>>>> > > >>>>>>>>> Maybe the default should be DFS with LazyBarrierStrategy. Then > > >>>>>>>>> LazyBarrierStrategy > > >>>>>>>>> can be removed with 'withoutStrategies()' and then it works > just > > >> like > > >>>>>>>>> everything else. I'd prefer consistency with the way > everything > > >> else > > >>>>>>>>> works. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Robert Dale > > >>>>>>>>> > > >>>>>>>>> On Tue, Apr 17, 2018 at 8:11 AM, Stephen Mallette < > > >>>>>> [email protected] > > >>>>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> Thanks for the summary. Just a quick note - I'd not worry > about > > >> the > > >>>>>> GLV > > >>>>>>>>>> tests for now. That part should be easy to sort out. Let's > first > > >>>> make > > >>>>>>>>>> sure > > >>>>>>>>>> that we get clear on the other items first before digging too > > >> deeply > > >>>>>>>>>> there. > > >>>>>>>>>> > > >>>>>>>>>> On an administrative front, I think that this change should > just > > >> go > > >>>>>> to > > >>>>>>>>>> 3.4.0/master (so it's currently targeted to the right branch > > >>>>>>>>>> actually) as > > >>>>>>>>>> it sounds like we want DFS to be the default and that could > be a > > >>>>>>>>>> breaking > > >>>>>>>>>> change as the semantics of the traversal shift a bit. So > that's > > >> one > > >>>>>>>>>> issue > > >>>>>>>>>> to make sure we're all happy with. > > >>>>>>>>>> > > >>>>>>>>>> A next issue would be the API from the user perspective - > thanks > > >> for > > >>>>>>> the > > >>>>>>>>>> link to that JIRA btw - I see I was involved in that > > conversation > > >> a > > >>>>>>>>>> little > > >>>>>>>>>> bit but then dropped off. You'd commented that you preferred a > > per > > >>>>>> loop > > >>>>>>>>>> configuration for search approach which is what you stuck with > > for > > >>>>>> the > > >>>>>>>>>> implementation. I guess I can see that there could be some > value > > >> in > > >>>>>>>>>> having > > >>>>>>>>>> that ability. That said, I'm not sure that I like the overload > > of > > >>>>>>>>>> order() > > >>>>>>>>>> as the method for doing that: > > >>>>>>>>>> > > >>>>>>>>>> g.V().has("name", "marko"). > > >>>>>>>>>> repeat(__.outE().order().by("weight", decr).inV()). > > >>>>>>>>>> emit(). > > >>>>>>>>>> order(SearchAlgo.DFS) > > >>>>>>>>>> > > >>>>>>>>>> Still thinking about what else we might do there, but perhaps > > that > > >>>>>>>>>> can wait > > >>>>>>>>>> a bit as I still need to study your changes in detail. > > Regarding: > > >>>>>>>>>> > > >>>>>>>>>>> The barrier step that Daniel described doesn’t currently > > work > > >>>>>> since > > >>>>>>>>>> there’s basically booleans in the RepeatStep on whether or not > > to > > >>>>>>>>>> stash the > > >>>>>>>>>> starts to make the RepeatStep depth first. > > >>>>>>>>>> > > >>>>>>>>>> I assume you are referring to these booleans: > > >>>>>>>>>> > > >>>>>>>>>> https://github.com/apache/tinkerpop/blob/ > > >>>>>>> fe8ee98f6967c7b0f0ee7f2cf2d105 > > >>>>>>>>>> 31b68fab8b/gremlin-core/src/main/java/org/apache/ > > >>>>>>>>>> tinkerpop/gremlin/process/traversal/step/branch/ > > >>>>>>> RepeatStep.java#L46-L47 > > >>>>>>>>>> ? > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> On Tue, Apr 17, 2018 at 7:37 AM, Keith Lohnes < > > [email protected]> > > >>>>>>> wrote: > > >>>>>>>>>>> Stephen, That’s a fair summary. I had an immediate need for > it, > > >> so > > >>>> I > > >>>>>>>>>>> developed something based on Michel Pollmeier’s work and a > > >>>>>>> modification > > >>>>>>>>>> to > > >>>>>>>>>>> the syntax Pieter Martin suggested in the Jira > > >>>>>>>>>>> <https://issues.apache.org/jira/browse/TINKERPOP-1822? > > >>>>>>>>>>> focusedCommentId=16233681&page=com.atlassian.jira. > > >>>>>>>>>>> > > plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16233681> > > >>>>>>>>>>> with DFS being explicit. > > >>>>>>>>>>> > > >>>>>>>>>>> The semantics I used were g.V().repeat(out()).order(DFS), > > putting > > >>>>>> the > > >>>>>>>>>>> order > > >>>>>>>>>>> clause outside of the repeat. It made it simpler for me to > > >> develop > > >>>>>> and > > >>>>>>>>>>> seemed nice to make that DFS vs BFS explicit. The barrier > step > > >> that > > >>>>>>>>>> Daniel > > >>>>>>>>>>> described doesn’t currently work since there’s basically > > booleans > > >>>> in > > >>>>>>>>>>> the > > >>>>>>>>>>> RepeatStep on whether or not to stash the starts to make the > > >>>>>>> RepeatStep > > >>>>>>>>>>> depth first. > > >>>>>>>>>>> > > >>>>>>>>>>> I added a test for the DFS, though I’ve had some issues > getting > > >>>>>>>>>>> things to > > >>>>>>>>>>> run re: .net tests and some other tests timing out. I have > some > > >>>> more > > >>>>>>>>>> tests > > >>>>>>>>>>> I'd like to write based off issues that I ran into testing > this > > >> in > > >>>>>> the > > >>>>>>>>>>> console (k-ary trees, until/emit before the repeat vs after), > > >> but I > > >>>>>>>>>> really > > >>>>>>>>>>> wanted to get this out there for people to take a look at and > > see > > >>>> if > > >>>>>>> it > > >>>>>>>>>>> could work out. > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> On Mon, Apr 16, 2018 at 7:29 PM Stephen Mallette < > > >>>>>>> [email protected]> > > >>>>>>>>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> Keith, I have to admit that I didn't really follow this > > >> general > > >>>>>>> body > > >>>>>>>>>> of > > >>>>>>>>>>>> work closely, which include this pull request, the earlier > one > > >>>> from > > >>>>>>>>>>> Michael > > >>>>>>>>>>>> Pollmeier, the JIRA and I think some discussion somewhere on > > one > > >>>> of > > >>>>>>>>>>>> the > > >>>>>>>>>>>> mailing lists. As best I have gathered from what I did > > follow, I > > >>>>>> feel > > >>>>>>>>>>> like > > >>>>>>>>>>>> the focus of this work so far was mostly related to "can > > someone > > >>>>>>>>>>>> get it > > >>>>>>>>>>> to > > >>>>>>>>>>>> actually work", but not a lot about other aspects like > > testing, > > >>>>>> api, > > >>>>>>>>>>>> release branch to apply it to, etc. Is that a fair depiction > > of > > >>>> how > > >>>>>>>>>> this > > >>>>>>>>>>>> work has developed so far? if so, let's use this thread to > > make > > >>>>>> sure > > >>>>>>>>>>> we're > > >>>>>>>>>>>> all on the same page as to how this change will get in on > all > > >>>> those > > >>>>>>>>>> sorts > > >>>>>>>>>>>> of issues. > > >>>>>>>>>>>> > > >>>>>>>>>>>> btw, thanks to you and michael for sticking at this to come > to > > >>>>>>>>>> something > > >>>>>>>>>>>> that seems to work. looking forward to the continued > > discussion > > >> on > > >>>>>>>>>>>> this > > >>>>>>>>>>>> thread. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Mon, Apr 16, 2018 at 6:54 PM, Michael Pollmeier < > > >>>>>>>>>>>> [email protected]> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>>> Unsurprisingly I'm also +1 for defaulting to DFS in OLTP. > My > > >>>>>> feeling > > >>>>>>>>>> is > > >>>>>>>>>>>>> that most users currently expect it to be DFS since that's > > what > > >>>>>> the > > >>>>>>>>>>> docs > > >>>>>>>>>>>>> say. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> And yes, it's easy to verify the default in the test suite, > > >> once > > >>>>>> we > > >>>>>>>>>>>>> agreed on what the default should be. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Cheers > > >>>>>>>>>>>>> Michael > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On 17/04/18 04:40, pieter gmail wrote: > > >>>>>>>>>>>>>> Hi, > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I have not properly followed the previous thread. However > I > > >>>>>> thought > > >>>>>>>>>>> is > > >>>>>>>>>>>>>> going to be a way to set the default behavior as apposed > to > > >>>>>> needing > > >>>>>>>>>>> to > > >>>>>>>>>>>>>> use barrier() > > >>>>>>>>>>>>>> Is this the case or not? > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> For Sqlg at least it is possible to optimize BFS much more > > >>>>>>>>>>> effectively > > >>>>>>>>>>>>>> than DFS so it will be nice to have a way to set the > > strategy > > >>>>>>>>>> rather > > >>>>>>>>>>>>>> than having to manually inject barriers. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Is the test suite going to enforce the BFS vs DFS? > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Thanks > > >>>>>>>>>>>>>> Pieter > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On 16/04/2018 16:56, Daniel Kuppitz wrote: > > >>>>>>>>>>>>>>> +1 for DFS. If the query relies on BFS, you can always do > > >>>>>>>>>>>>>>> .repeat(....barrier())... > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> ^ This holds true as long as there's no significant > > >> difference > > >>>>>> in > > >>>>>>>>>>> the > > >>>>>>>>>>>>>>> cpu+memory consumption and overall performance of the two > > >>>>>>>>>>> approaches. > > >>>>>>>>>>>>> BFS > > >>>>>>>>>>>>>>> has its advantages when it comes to bulking; an arbitrary > > >>>> number > > >>>>>>>>>> of > > >>>>>>>>>>>>>>> traversers on the same element is processed at the same > > pace > > >> as > > >>>>>> a > > >>>>>>>>>>>> single > > >>>>>>>>>>>>>>> traverser. I don't think we can benefit from bulking in > > DFS. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>>> Daniel > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On Mon, Apr 16, 2018 at 5:44 AM, Keith Lohnes < > > >>>>>> [email protected]> > > >>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>> As part of #838 < > > >> https://github.com/apache/tinkerpop/pull/838 > > >>>>>>>>>>>> there’s > > >>>>>>>>>>>>>>>> some > > >>>>>>>>>>>>>>>> discussion around whether or not to make DFS the default > > for > > >>>>>> the > > >>>>>>>>>>>> repeat > > >>>>>>>>>>>>>>>> step. On the one hand, everything else in OLTP is depth > > >> first. > > >>>>>> On > > >>>>>>>>>>> the > > >>>>>>>>>>>>>>>> other > > >>>>>>>>>>>>>>>> hand, there’s likely existing traversals that depend on > > the > > >>>>>>>>>> breadth > > >>>>>>>>>>>>>>>> first > > >>>>>>>>>>>>>>>> nature of repeat. My general preference is to make DFS > > >>>> optional > > >>>>>>>>>> at > > >>>>>>>>>>>>>>>> first, > > >>>>>>>>>>>>>>>> and at some later date, change the default and have that > > be > > >> a > > >>>>>>>>>>>> separate > > >>>>>>>>>>>>>>>> change from implementing DFS for repeat > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >> > > > > >
