Pieter, If you take a look at https://github.com/apache/tinkerpop/pull/838 DFS is implemented as a modification to BFS. It's taking the starts that come in from a BFS and stashing them to be processed later. I haven't seen a big performance difference on JanusGraph; At least for the queries that I've been running with it. I'm not terribly familiar with Sqlg, but I wonder if in the context of how DFS is implemented there, it may be less of a concern.
Cheers, Keith On Thu, Apr 19, 2018 at 12:46 PM pieter gmail <pieter.mar...@gmail.com> wrote: > Hi, > > Not really sure either what 'global' means technically with respect to > TinkerPop's current configuration support. > Perhaps it can be the start of a global configuration registry that can > be overridden per traversal. > > I get that DFS is the preferred default but for Sqlg the performance > impact is so great that I'd rather, if possible have BFS as its default. > > I am not sure about this but I reckon that any graph where the TinkerPop > vm is not running in the same process space as the actual graph/data > that latency is a big issue. BFS alleviates the latency issue > significantly. > > Cheers > Pieter > > > > On 19/04/2018 14:49, Keith Lohnes wrote: > >> whether this will affect more than just > > repeat()? > > > > For the PR that I start and the intent of this thread was to only affect > > repeat. > > > >> I prefer that the semantics of the traversal be specified in the > traversal > > as a first class citizen. > > > > +1 > > > >> I am fine with any default but am wondering whether it would be > > worthwhile for the default to be overridden at a global level at not just > > per traversal > > > > It might be nice, but I guess I'm not sure what `global` means in this > > context. Configured on the graph object? > > > > On Tue, Apr 17, 2018 at 9:28 PM Daniel Kuppitz <m...@gremlin.guru> wrote: > > > >> TinkerPop makes no guarantees about the order of elements unless you > >> specify an explicit order. This also goes back to the fact that certain > >> strategies (LazyBarrier-, RepeatUnroll- and PathRetractionStrategy) add > >> NoOpBarrierSteps to your traversal, which ultimately turns it into a > >> DFS/BFS mix. Check the .explain() output of your traversal to see which > >> strategy adds which steps. > >> > >> Cheers, > >> Daniel > >> > >> > >> On Tue, Apr 17, 2018 at 4:45 PM, Michael Pollmeier < > >> mich...@michaelpollmeier.com> wrote: > >> > >>>> Also it seems to me that DFS only really applies to repeat() with an > >>>> emit(). > >>>> g.V().hasLabel("A").repeat().times(2) gets rewritten as > >>>> g.V().hasLabel("A").out().out(). Are their subtleties that I am not > >>>> aware of or does DFV vs BFS not matter in this case? > >>> When I read this I thought: clearly `.out().out()` is DFS for OLTP, > >>> that's also what the documentation says, e.g. in this nice infographic > >>> http://tinkerpop.apache.org/docs/current/images/oltp-vs-olap.png > >>> > >>> However, looks like that's not the case. Has my life been a lie? > Setting > >>> up a simple flat graph to make things more obvious: > >>> v3 <- v1 <- v0 -> v2 -> v4 > >>> > >>> ``` > >>> graph = TinkerGraph.open() > >>> v0 = graph.addVertex("l0") > >>> v1 = graph.addVertex("l1") > >>> v2 = graph.addVertex("l1") > >>> v3 = graph.addVertex("l2") > >>> v4 = graph.addVertex("l2") > >>> v0.addEdge("e", v2) > >>> v2.addEdge("e", v4) > >>> v0.addEdge("e", v1) > >>> v1.addEdge("e", v3) > >>> g = graph.traversal() > >>> g.V(v0).out().sideEffect{println(it)}.out().sideEffect{println(it)} > >>> ``` > >>> > >>> Prints: > >>> v[2] > >>> v[1] > >>> v[4] > >>> ==>v[4] > >>> v[3] > >>> ==>v[3] > >>> > >>> If this was OLTP the output would be: > >>> v[2] > >>> v[4] > >>> ==>v[4] > >>> v[1] > >>> v[3] > >>> ==>v[3] > >>> > >>> Cheers > >>> Michael > >>> > >>> On 18/04/18 02:58, pieter gmail wrote: > >>>> Hi, > >>>> > >>>> I agree with the question about whether this will affect more than > just > >>>> repeat()? > >>>> > >>>> I prefer that the semantics of the traversal be specified in the > >>>> traversal as a first class citizen. i.e. with order(SearchAlgo). > >>>> Strategies are to my mind internal to an implementation. In Robert's > >>>> example LazyBarrierStrategy may be replaced/removed by an > >> implementation > >>>> for whatever internal reason they have. > >>>> > >>>> Regarding the default, I am fine with any default but am wondering > >>>> whether it would be worthwhile for the default to be overridden at a > >>>> global level at not just per traversal? That way the impact can also > be > >>>> alleviated when folks upgrade. > >>>> > >>>> Also it seems to me that DFS only really applies to repeat() with an > >>>> emit(). > >>>> g.V().hasLabel("A").repeat().times(2) gets rewritten as > >>>> g.V().hasLabel("A").out().out(). Are their subtleties that I am not > >>>> aware of or does DFV vs BFS not matter in this case? > >>>> > >>>> Cheers > >>>> Pieter > >>>> > >>>> On 17/04/2018 14:58, Robert Dale wrote: > >>>>> +1 on DFS > >>>>> -1 on order(SearchAlgo) > >>>>> > >>>>> It seems like a strategy may be more appropriate. It could affect > >> more > >>>>> than just repeat(). And how would this interact with > >>>>> LazyBarrierStrategy? > >>>>> > >>>>> Maybe the default should be DFS with LazyBarrierStrategy. Then > >>>>> LazyBarrierStrategy > >>>>> can be removed with 'withoutStrategies()' and then it works just like > >>>>> everything else. I'd prefer consistency with the way everything else > >>>>> works. > >>>>> > >>>>> > >>>>> > >>>>> Robert Dale > >>>>> > >>>>> On Tue, Apr 17, 2018 at 8:11 AM, Stephen Mallette < > >> spmalle...@gmail.com > >>>>> wrote: > >>>>> > >>>>>> Thanks for the summary. Just a quick note - I'd not worry about the > >> GLV > >>>>>> tests for now. That part should be easy to sort out. Let's first > make > >>>>>> sure > >>>>>> that we get clear on the other items first before digging too deeply > >>>>>> there. > >>>>>> > >>>>>> On an administrative front, I think that this change should just go > >> to > >>>>>> 3.4.0/master (so it's currently targeted to the right branch > >>>>>> actually) as > >>>>>> it sounds like we want DFS to be the default and that could be a > >>>>>> breaking > >>>>>> change as the semantics of the traversal shift a bit. So that's one > >>>>>> issue > >>>>>> to make sure we're all happy with. > >>>>>> > >>>>>> A next issue would be the API from the user perspective - thanks for > >>> the > >>>>>> link to that JIRA btw - I see I was involved in that conversation a > >>>>>> little > >>>>>> bit but then dropped off. You'd commented that you preferred a per > >> loop > >>>>>> configuration for search approach which is what you stuck with for > >> the > >>>>>> implementation. I guess I can see that there could be some value in > >>>>>> having > >>>>>> that ability. That said, I'm not sure that I like the overload of > >>>>>> order() > >>>>>> as the method for doing that: > >>>>>> > >>>>>> g.V().has("name", "marko"). > >>>>>> repeat(__.outE().order().by("weight", decr).inV()). > >>>>>> emit(). > >>>>>> order(SearchAlgo.DFS) > >>>>>> > >>>>>> Still thinking about what else we might do there, but perhaps that > >>>>>> can wait > >>>>>> a bit as I still need to study your changes in detail. Regarding: > >>>>>> > >>>>>>> The barrier step that Daniel described doesn’t currently work > >> since > >>>>>> there’s basically booleans in the RepeatStep on whether or not to > >>>>>> stash the > >>>>>> starts to make the RepeatStep depth first. > >>>>>> > >>>>>> I assume you are referring to these booleans: > >>>>>> > >>>>>> https://github.com/apache/tinkerpop/blob/ > >>> fe8ee98f6967c7b0f0ee7f2cf2d105 > >>>>>> 31b68fab8b/gremlin-core/src/main/java/org/apache/ > >>>>>> tinkerpop/gremlin/process/traversal/step/branch/ > >>> RepeatStep.java#L46-L47 > >>>>>> ? > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Tue, Apr 17, 2018 at 7:37 AM, Keith Lohnes <lohn...@gmail.com> > >>> wrote: > >>>>>>> Stephen, That’s a fair summary. I had an immediate need for it, so > I > >>>>>>> developed something based on Michel Pollmeier’s work and a > >>> modification > >>>>>> to > >>>>>>> the syntax Pieter Martin suggested in the Jira > >>>>>>> <https://issues.apache.org/jira/browse/TINKERPOP-1822? > >>>>>>> focusedCommentId=16233681&page=com.atlassian.jira. > >>>>>>> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16233681> > >>>>>>> with DFS being explicit. > >>>>>>> > >>>>>>> The semantics I used were g.V().repeat(out()).order(DFS), putting > >> the > >>>>>>> order > >>>>>>> clause outside of the repeat. It made it simpler for me to develop > >> and > >>>>>>> seemed nice to make that DFS vs BFS explicit. The barrier step that > >>>>>> Daniel > >>>>>>> described doesn’t currently work since there’s basically booleans > in > >>>>>>> the > >>>>>>> RepeatStep on whether or not to stash the starts to make the > >>> RepeatStep > >>>>>>> depth first. > >>>>>>> > >>>>>>> I added a test for the DFS, though I’ve had some issues getting > >>>>>>> things to > >>>>>>> run re: .net tests and some other tests timing out. I have some > more > >>>>>> tests > >>>>>>> I'd like to write based off issues that I ran into testing this in > >> the > >>>>>>> console (k-ary trees, until/emit before the repeat vs after), but I > >>>>>> really > >>>>>>> wanted to get this out there for people to take a look at and see > if > >>> it > >>>>>>> could work out. > >>>>>>> > >>>>>>> > >>>>>>> On Mon, Apr 16, 2018 at 7:29 PM Stephen Mallette < > >>> spmalle...@gmail.com> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Keith, I have to admit that I didn't really follow this general > >>> body > >>>>>> of > >>>>>>>> work closely, which include this pull request, the earlier one > from > >>>>>>> Michael > >>>>>>>> Pollmeier, the JIRA and I think some discussion somewhere on one > of > >>>>>>>> the > >>>>>>>> mailing lists. As best I have gathered from what I did follow, I > >> feel > >>>>>>> like > >>>>>>>> the focus of this work so far was mostly related to "can someone > >>>>>>>> get it > >>>>>>> to > >>>>>>>> actually work", but not a lot about other aspects like testing, > >> api, > >>>>>>>> release branch to apply it to, etc. Is that a fair depiction of > how > >>>>>> this > >>>>>>>> work has developed so far? if so, let's use this thread to make > >> sure > >>>>>>> we're > >>>>>>>> all on the same page as to how this change will get in on all > those > >>>>>> sorts > >>>>>>>> of issues. > >>>>>>>> > >>>>>>>> btw, thanks to you and michael for sticking at this to come to > >>>>>> something > >>>>>>>> that seems to work. looking forward to the continued discussion on > >>>>>>>> this > >>>>>>>> thread. > >>>>>>>> > >>>>>>>> > >>>>>>>> On Mon, Apr 16, 2018 at 6:54 PM, Michael Pollmeier < > >>>>>>>> mich...@michaelpollmeier.com> wrote: > >>>>>>>> > >>>>>>>>> Unsurprisingly I'm also +1 for defaulting to DFS in OLTP. My > >> feeling > >>>>>> is > >>>>>>>>> that most users currently expect it to be DFS since that's what > >> the > >>>>>>> docs > >>>>>>>>> say. > >>>>>>>>> > >>>>>>>>> And yes, it's easy to verify the default in the test suite, once > >> we > >>>>>>>>> agreed on what the default should be. > >>>>>>>>> > >>>>>>>>> Cheers > >>>>>>>>> Michael > >>>>>>>>> > >>>>>>>>> On 17/04/18 04:40, pieter gmail wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> I have not properly followed the previous thread. However I > >> thought > >>>>>>> is > >>>>>>>>>> going to be a way to set the default behavior as apposed to > >> needing > >>>>>>> to > >>>>>>>>>> use barrier() > >>>>>>>>>> Is this the case or not? > >>>>>>>>>> > >>>>>>>>>> For Sqlg at least it is possible to optimize BFS much more > >>>>>>> effectively > >>>>>>>>>> than DFS so it will be nice to have a way to set the strategy > >>>>>> rather > >>>>>>>>>> than having to manually inject barriers. > >>>>>>>>>> > >>>>>>>>>> Is the test suite going to enforce the BFS vs DFS? > >>>>>>>>>> > >>>>>>>>>> Thanks > >>>>>>>>>> Pieter > >>>>>>>>>> > >>>>>>>>>> On 16/04/2018 16:56, Daniel Kuppitz wrote: > >>>>>>>>>>> +1 for DFS. If the query relies on BFS, you can always do > >>>>>>>>>>> .repeat(....barrier())... > >>>>>>>>>>> > >>>>>>>>>>> ^ This holds true as long as there's no significant difference > >> in > >>>>>>> the > >>>>>>>>>>> cpu+memory consumption and overall performance of the two > >>>>>>> approaches. > >>>>>>>>> BFS > >>>>>>>>>>> has its advantages when it comes to bulking; an arbitrary > number > >>>>>> of > >>>>>>>>>>> traversers on the same element is processed at the same pace as > >> a > >>>>>>>> single > >>>>>>>>>>> traverser. I don't think we can benefit from bulking in DFS. > >>>>>>>>>>> > >>>>>>>>>>> Cheers, > >>>>>>>>>>> Daniel > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Apr 16, 2018 at 5:44 AM, Keith Lohnes < > >> lohn...@gmail.com> > >>>>>>>>> wrote: > >>>>>>>>>>>> As part of #838 <https://github.com/apache/tinkerpop/pull/838 > > > >>>>>>>> there’s > >>>>>>>>>>>> some > >>>>>>>>>>>> discussion around whether or not to make DFS the default for > >> the > >>>>>>>> repeat > >>>>>>>>>>>> step. On the one hand, everything else in OLTP is depth first. > >> On > >>>>>>> the > >>>>>>>>>>>> other > >>>>>>>>>>>> hand, there’s likely existing traversals that depend on the > >>>>>> breadth > >>>>>>>>>>>> first > >>>>>>>>>>>> nature of repeat. My general preference is to make DFS > optional > >>>>>> at > >>>>>>>>>>>> first, > >>>>>>>>>>>> and at some later date, change the default and have that be a > >>>>>>>> separate > >>>>>>>>>>>> change from implementing DFS for repeat > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>> > >>> > >