It looks like it, `g.V().has("foo", "bar").repeat(out()).emit().explain()`
yields `[JanusGraphStep([],[foo.eq(bar)]), RepeatStep([JanusGraphVertexStep(OUT,vertex), RepeatEndStep],until(false),emit(true))]` On Tue, Apr 24, 2018 at 12:12 PM pieter gmail <pieter.mar...@gmail.com> wrote: > Hi, > > Sqlg completely replaces TinkerPop's RepeatStep. The idea being that > with g.V().repeat(out()).times(x) only x round trips to the db is needed > regardless of the size of the graph. Each time it will go to the db with > the full set of the previous step's incoming starts. > > But yeah TinkerPop's implementation is always the starting point so I'll > definitely have a look at how you have implemented DFS. > > BTW, does Janus graph use TinkerPop's default RepeatStep as is with no > optimization strategies? > > Cheers > Pieter > > On 24/04/2018 16:33, Keith Lohnes wrote: > > Pieter, > > > > If you take a look at https://github.com/apache/tinkerpop/pull/838 DFS > is > > implemented as a modification to BFS. It's taking the starts that come in > > from a BFS and stashing them to be processed later. I haven't seen a big > > performance difference on JanusGraph; At least for the queries that I've > > been running with it. I'm not terribly familiar with Sqlg, but I wonder > if > > in the context of how DFS is implemented there, it may be less of a > > concern. > > > > Cheers, > > Keith > > > > On Thu, Apr 19, 2018 at 12:46 PM pieter gmail <pieter.mar...@gmail.com> > > wrote: > > > >> Hi, > >> > >> Not really sure either what 'global' means technically with respect to > >> TinkerPop's current configuration support. > >> Perhaps it can be the start of a global configuration registry that can > >> be overridden per traversal. > >> > >> I get that DFS is the preferred default but for Sqlg the performance > >> impact is so great that I'd rather, if possible have BFS as its default. > >> > >> I am not sure about this but I reckon that any graph where the TinkerPop > >> vm is not running in the same process space as the actual graph/data > >> that latency is a big issue. BFS alleviates the latency issue > >> significantly. > >> > >> Cheers > >> Pieter > >> > >> > >> > >> On 19/04/2018 14:49, Keith Lohnes wrote: > >>>> whether this will affect more than just > >>> repeat()? > >>> > >>> For the PR that I start and the intent of this thread was to only > affect > >>> repeat. > >>> > >>>> I prefer that the semantics of the traversal be specified in the > >> traversal > >>> as a first class citizen. > >>> > >>> +1 > >>> > >>>> I am fine with any default but am wondering whether it would be > >>> worthwhile for the default to be overridden at a global level at not > just > >>> per traversal > >>> > >>> It might be nice, but I guess I'm not sure what `global` means in this > >>> context. Configured on the graph object? > >>> > >>> On Tue, Apr 17, 2018 at 9:28 PM Daniel Kuppitz <m...@gremlin.guru> > wrote: > >>> > >>>> TinkerPop makes no guarantees about the order of elements unless you > >>>> specify an explicit order. This also goes back to the fact that > certain > >>>> strategies (LazyBarrier-, RepeatUnroll- and PathRetractionStrategy) > add > >>>> NoOpBarrierSteps to your traversal, which ultimately turns it into a > >>>> DFS/BFS mix. Check the .explain() output of your traversal to see > which > >>>> strategy adds which steps. > >>>> > >>>> Cheers, > >>>> Daniel > >>>> > >>>> > >>>> On Tue, Apr 17, 2018 at 4:45 PM, Michael Pollmeier < > >>>> mich...@michaelpollmeier.com> wrote: > >>>> > >>>>>> Also it seems to me that DFS only really applies to repeat() with an > >>>>>> emit(). > >>>>>> g.V().hasLabel("A").repeat().times(2) gets rewritten as > >>>>>> g.V().hasLabel("A").out().out(). Are their subtleties that I am not > >>>>>> aware of or does DFV vs BFS not matter in this case? > >>>>> When I read this I thought: clearly `.out().out()` is DFS for OLTP, > >>>>> that's also what the documentation says, e.g. in this nice > infographic > >>>>> http://tinkerpop.apache.org/docs/current/images/oltp-vs-olap.png > >>>>> > >>>>> However, looks like that's not the case. Has my life been a lie? > >> Setting > >>>>> up a simple flat graph to make things more obvious: > >>>>> v3 <- v1 <- v0 -> v2 -> v4 > >>>>> > >>>>> ``` > >>>>> graph = TinkerGraph.open() > >>>>> v0 = graph.addVertex("l0") > >>>>> v1 = graph.addVertex("l1") > >>>>> v2 = graph.addVertex("l1") > >>>>> v3 = graph.addVertex("l2") > >>>>> v4 = graph.addVertex("l2") > >>>>> v0.addEdge("e", v2) > >>>>> v2.addEdge("e", v4) > >>>>> v0.addEdge("e", v1) > >>>>> v1.addEdge("e", v3) > >>>>> g = graph.traversal() > >>>>> g.V(v0).out().sideEffect{println(it)}.out().sideEffect{println(it)} > >>>>> ``` > >>>>> > >>>>> Prints: > >>>>> v[2] > >>>>> v[1] > >>>>> v[4] > >>>>> ==>v[4] > >>>>> v[3] > >>>>> ==>v[3] > >>>>> > >>>>> If this was OLTP the output would be: > >>>>> v[2] > >>>>> v[4] > >>>>> ==>v[4] > >>>>> v[1] > >>>>> v[3] > >>>>> ==>v[3] > >>>>> > >>>>> Cheers > >>>>> Michael > >>>>> > >>>>> On 18/04/18 02:58, pieter gmail wrote: > >>>>>> Hi, > >>>>>> > >>>>>> I agree with the question about whether this will affect more than > >> just > >>>>>> repeat()? > >>>>>> > >>>>>> I prefer that the semantics of the traversal be specified in the > >>>>>> traversal as a first class citizen. i.e. with order(SearchAlgo). > >>>>>> Strategies are to my mind internal to an implementation. In Robert's > >>>>>> example LazyBarrierStrategy may be replaced/removed by an > >>>> implementation > >>>>>> for whatever internal reason they have. > >>>>>> > >>>>>> Regarding the default, I am fine with any default but am wondering > >>>>>> whether it would be worthwhile for the default to be overridden at a > >>>>>> global level at not just per traversal? That way the impact can also > >> be > >>>>>> alleviated when folks upgrade. > >>>>>> > >>>>>> Also it seems to me that DFS only really applies to repeat() with an > >>>>>> emit(). > >>>>>> g.V().hasLabel("A").repeat().times(2) gets rewritten as > >>>>>> g.V().hasLabel("A").out().out(). Are their subtleties that I am not > >>>>>> aware of or does DFV vs BFS not matter in this case? > >>>>>> > >>>>>> Cheers > >>>>>> Pieter > >>>>>> > >>>>>> On 17/04/2018 14:58, Robert Dale wrote: > >>>>>>> +1 on DFS > >>>>>>> -1 on order(SearchAlgo) > >>>>>>> > >>>>>>> It seems like a strategy may be more appropriate. It could affect > >>>> more > >>>>>>> than just repeat(). And how would this interact with > >>>>>>> LazyBarrierStrategy? > >>>>>>> > >>>>>>> Maybe the default should be DFS with LazyBarrierStrategy. Then > >>>>>>> LazyBarrierStrategy > >>>>>>> can be removed with 'withoutStrategies()' and then it works just > like > >>>>>>> everything else. I'd prefer consistency with the way everything > else > >>>>>>> works. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Robert Dale > >>>>>>> > >>>>>>> On Tue, Apr 17, 2018 at 8:11 AM, Stephen Mallette < > >>>> spmalle...@gmail.com > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Thanks for the summary. Just a quick note - I'd not worry about > the > >>>> GLV > >>>>>>>> tests for now. That part should be easy to sort out. Let's first > >> make > >>>>>>>> sure > >>>>>>>> that we get clear on the other items first before digging too > deeply > >>>>>>>> there. > >>>>>>>> > >>>>>>>> On an administrative front, I think that this change should just > go > >>>> to > >>>>>>>> 3.4.0/master (so it's currently targeted to the right branch > >>>>>>>> actually) as > >>>>>>>> it sounds like we want DFS to be the default and that could be a > >>>>>>>> breaking > >>>>>>>> change as the semantics of the traversal shift a bit. So that's > one > >>>>>>>> issue > >>>>>>>> to make sure we're all happy with. > >>>>>>>> > >>>>>>>> A next issue would be the API from the user perspective - thanks > for > >>>>> the > >>>>>>>> link to that JIRA btw - I see I was involved in that conversation > a > >>>>>>>> little > >>>>>>>> bit but then dropped off. You'd commented that you preferred a per > >>>> loop > >>>>>>>> configuration for search approach which is what you stuck with for > >>>> the > >>>>>>>> implementation. I guess I can see that there could be some value > in > >>>>>>>> having > >>>>>>>> that ability. That said, I'm not sure that I like the overload of > >>>>>>>> order() > >>>>>>>> as the method for doing that: > >>>>>>>> > >>>>>>>> g.V().has("name", "marko"). > >>>>>>>> repeat(__.outE().order().by("weight", decr).inV()). > >>>>>>>> emit(). > >>>>>>>> order(SearchAlgo.DFS) > >>>>>>>> > >>>>>>>> Still thinking about what else we might do there, but perhaps that > >>>>>>>> can wait > >>>>>>>> a bit as I still need to study your changes in detail. Regarding: > >>>>>>>> > >>>>>>>>> The barrier step that Daniel described doesn’t currently work > >>>> since > >>>>>>>> there’s basically booleans in the RepeatStep on whether or not to > >>>>>>>> stash the > >>>>>>>> starts to make the RepeatStep depth first. > >>>>>>>> > >>>>>>>> I assume you are referring to these booleans: > >>>>>>>> > >>>>>>>> https://github.com/apache/tinkerpop/blob/ > >>>>> fe8ee98f6967c7b0f0ee7f2cf2d105 > >>>>>>>> 31b68fab8b/gremlin-core/src/main/java/org/apache/ > >>>>>>>> tinkerpop/gremlin/process/traversal/step/branch/ > >>>>> RepeatStep.java#L46-L47 > >>>>>>>> ? > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Tue, Apr 17, 2018 at 7:37 AM, Keith Lohnes <lohn...@gmail.com> > >>>>> wrote: > >>>>>>>>> Stephen, That’s a fair summary. I had an immediate need for it, > so > >> I > >>>>>>>>> developed something based on Michel Pollmeier’s work and a > >>>>> modification > >>>>>>>> to > >>>>>>>>> the syntax Pieter Martin suggested in the Jira > >>>>>>>>> <https://issues.apache.org/jira/browse/TINKERPOP-1822? > >>>>>>>>> focusedCommentId=16233681&page=com.atlassian.jira. > >>>>>>>>> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16233681> > >>>>>>>>> with DFS being explicit. > >>>>>>>>> > >>>>>>>>> The semantics I used were g.V().repeat(out()).order(DFS), putting > >>>> the > >>>>>>>>> order > >>>>>>>>> clause outside of the repeat. It made it simpler for me to > develop > >>>> and > >>>>>>>>> seemed nice to make that DFS vs BFS explicit. The barrier step > that > >>>>>>>> Daniel > >>>>>>>>> described doesn’t currently work since there’s basically booleans > >> in > >>>>>>>>> the > >>>>>>>>> RepeatStep on whether or not to stash the starts to make the > >>>>> RepeatStep > >>>>>>>>> depth first. > >>>>>>>>> > >>>>>>>>> I added a test for the DFS, though I’ve had some issues getting > >>>>>>>>> things to > >>>>>>>>> run re: .net tests and some other tests timing out. I have some > >> more > >>>>>>>> tests > >>>>>>>>> I'd like to write based off issues that I ran into testing this > in > >>>> the > >>>>>>>>> console (k-ary trees, until/emit before the repeat vs after), > but I > >>>>>>>> really > >>>>>>>>> wanted to get this out there for people to take a look at and see > >> if > >>>>> it > >>>>>>>>> could work out. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Mon, Apr 16, 2018 at 7:29 PM Stephen Mallette < > >>>>> spmalle...@gmail.com> > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Keith, I have to admit that I didn't really follow this > general > >>>>> body > >>>>>>>> of > >>>>>>>>>> work closely, which include this pull request, the earlier one > >> from > >>>>>>>>> Michael > >>>>>>>>>> Pollmeier, the JIRA and I think some discussion somewhere on one > >> of > >>>>>>>>>> the > >>>>>>>>>> mailing lists. As best I have gathered from what I did follow, I > >>>> feel > >>>>>>>>> like > >>>>>>>>>> the focus of this work so far was mostly related to "can someone > >>>>>>>>>> get it > >>>>>>>>> to > >>>>>>>>>> actually work", but not a lot about other aspects like testing, > >>>> api, > >>>>>>>>>> release branch to apply it to, etc. Is that a fair depiction of > >> how > >>>>>>>> this > >>>>>>>>>> work has developed so far? if so, let's use this thread to make > >>>> sure > >>>>>>>>> we're > >>>>>>>>>> all on the same page as to how this change will get in on all > >> those > >>>>>>>> sorts > >>>>>>>>>> of issues. > >>>>>>>>>> > >>>>>>>>>> btw, thanks to you and michael for sticking at this to come to > >>>>>>>> something > >>>>>>>>>> that seems to work. looking forward to the continued discussion > on > >>>>>>>>>> this > >>>>>>>>>> thread. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Mon, Apr 16, 2018 at 6:54 PM, Michael Pollmeier < > >>>>>>>>>> mich...@michaelpollmeier.com> wrote: > >>>>>>>>>> > >>>>>>>>>>> Unsurprisingly I'm also +1 for defaulting to DFS in OLTP. My > >>>> feeling > >>>>>>>> is > >>>>>>>>>>> that most users currently expect it to be DFS since that's what > >>>> the > >>>>>>>>> docs > >>>>>>>>>>> say. > >>>>>>>>>>> > >>>>>>>>>>> And yes, it's easy to verify the default in the test suite, > once > >>>> we > >>>>>>>>>>> agreed on what the default should be. > >>>>>>>>>>> > >>>>>>>>>>> Cheers > >>>>>>>>>>> Michael > >>>>>>>>>>> > >>>>>>>>>>> On 17/04/18 04:40, pieter gmail wrote: > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> I have not properly followed the previous thread. However I > >>>> thought > >>>>>>>>> is > >>>>>>>>>>>> going to be a way to set the default behavior as apposed to > >>>> needing > >>>>>>>>> to > >>>>>>>>>>>> use barrier() > >>>>>>>>>>>> Is this the case or not? > >>>>>>>>>>>> > >>>>>>>>>>>> For Sqlg at least it is possible to optimize BFS much more > >>>>>>>>> effectively > >>>>>>>>>>>> than DFS so it will be nice to have a way to set the strategy > >>>>>>>> rather > >>>>>>>>>>>> than having to manually inject barriers. > >>>>>>>>>>>> > >>>>>>>>>>>> Is the test suite going to enforce the BFS vs DFS? > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks > >>>>>>>>>>>> Pieter > >>>>>>>>>>>> > >>>>>>>>>>>> On 16/04/2018 16:56, Daniel Kuppitz wrote: > >>>>>>>>>>>>> +1 for DFS. If the query relies on BFS, you can always do > >>>>>>>>>>>>> .repeat(....barrier())... > >>>>>>>>>>>>> > >>>>>>>>>>>>> ^ This holds true as long as there's no significant > difference > >>>> in > >>>>>>>>> the > >>>>>>>>>>>>> cpu+memory consumption and overall performance of the two > >>>>>>>>> approaches. > >>>>>>>>>>> BFS > >>>>>>>>>>>>> has its advantages when it comes to bulking; an arbitrary > >> number > >>>>>>>> of > >>>>>>>>>>>>> traversers on the same element is processed at the same pace > as > >>>> a > >>>>>>>>>> single > >>>>>>>>>>>>> traverser. I don't think we can benefit from bulking in DFS. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>> Daniel > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Mon, Apr 16, 2018 at 5:44 AM, Keith Lohnes < > >>>> lohn...@gmail.com> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>>> As part of #838 < > https://github.com/apache/tinkerpop/pull/838 > >>>>>>>>>> there’s > >>>>>>>>>>>>>> some > >>>>>>>>>>>>>> discussion around whether or not to make DFS the default for > >>>> the > >>>>>>>>>> repeat > >>>>>>>>>>>>>> step. On the one hand, everything else in OLTP is depth > first. > >>>> On > >>>>>>>>> the > >>>>>>>>>>>>>> other > >>>>>>>>>>>>>> hand, there’s likely existing traversals that depend on the > >>>>>>>> breadth > >>>>>>>>>>>>>> first > >>>>>>>>>>>>>> nature of repeat. My general preference is to make DFS > >> optional > >>>>>>>> at > >>>>>>>>>>>>>> first, > >>>>>>>>>>>>>> and at some later date, change the default and have that be > a > >>>>>>>>>> separate > >>>>>>>>>>>>>> change from implementing DFS for repeat > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >> > >