Re: [DISCUSS] Depth First Repeat step

Michael Pollmeier Tue, 17 Apr 2018 16:46:44 -0700

> Also it seems to me that DFS only really applies to repeat() with an
> emit().
> g.V().hasLabel("A").repeat().times(2) gets rewritten as
> g.V().hasLabel("A").out().out(). Are their subtleties that I am not
> aware of or does DFV vs BFS not matter in this case?


When I read this I thought: clearly `.out().out()` is DFS for OLTP,
that's also what the documentation says, e.g. in this nice infographic
http://tinkerpop.apache.org/docs/current/images/oltp-vs-olap.png

However, looks like that's not the case. Has my life been a lie? Setting
up a simple flat graph to make things more obvious:
v3 <- v1 <- v0 -> v2 -> v4

```
graph = TinkerGraph.open()
v0 = graph.addVertex("l0")
v1 = graph.addVertex("l1")
v2 = graph.addVertex("l1")
v3 = graph.addVertex("l2")
v4 = graph.addVertex("l2")
v0.addEdge("e", v2)
v2.addEdge("e", v4)
v0.addEdge("e", v1)
v1.addEdge("e", v3)
g = graph.traversal()
g.V(v0).out().sideEffect{println(it)}.out().sideEffect{println(it)}
```

Prints:
v[2]
v[1]
v[4]
==>v[4]
v[3]
==>v[3]

If this was OLTP the output would be:
v[2]
v[4]
==>v[4]
v[1]
v[3]
==>v[3]

Cheers
Michael

On 18/04/18 02:58, pieter gmail wrote:
> Hi,
> 
> I agree with the question about whether this will affect more than just
> repeat()?
> 
> I prefer that the semantics of the traversal be specified in the
> traversal as a first class citizen. i.e. with order(SearchAlgo).
> Strategies are to my mind internal to an implementation. In Robert's
> example LazyBarrierStrategy may be replaced/removed by an implementation
> for whatever internal reason they have.
> 
> Regarding the default, I am fine with any default but am wondering
> whether it would be worthwhile for the default to be overridden at a
> global level at not just per traversal? That way the impact can also be
> alleviated when folks upgrade.
> 
> Also it seems to me that DFS only really applies to repeat() with an
> emit().
> g.V().hasLabel("A").repeat().times(2) gets rewritten as
> g.V().hasLabel("A").out().out(). Are their subtleties that I am not
> aware of or does DFV vs BFS not matter in this case?
> 
> Cheers
> Pieter
> 
> On 17/04/2018 14:58, Robert Dale wrote:
>> +1 on DFS
>> -1 on order(SearchAlgo)
>>
>> It seems like a strategy may be more appropriate.  It could affect more
>> than just repeat().  And how would this interact with
>> LazyBarrierStrategy?
>>
>> Maybe the default should be DFS with LazyBarrierStrategy. Then
>> LazyBarrierStrategy
>> can be removed with 'withoutStrategies()' and then it works just like
>> everything else.  I'd prefer consistency with the way everything else
>> works.
>>
>>
>>
>> Robert Dale
>>
>> On Tue, Apr 17, 2018 at 8:11 AM, Stephen Mallette <[email protected]>
>> wrote:
>>
>>> Thanks for the summary. Just a quick note - I'd not worry about the GLV
>>> tests for now. That part should be easy to sort out. Let's first make
>>> sure
>>> that we get clear on the other items first before digging too deeply
>>> there.
>>>
>>> On an administrative front, I think that this change should just go to
>>> 3.4.0/master (so it's currently targeted to the right branch
>>> actually) as
>>> it sounds like we want DFS to be the default and that could be a
>>> breaking
>>> change as the semantics of the traversal shift a bit. So that's one
>>> issue
>>> to make sure we're all happy with.
>>>
>>> A next issue would be the API from the user perspective - thanks for the
>>> link to that JIRA btw - I see I was involved in that conversation a
>>> little
>>> bit but then dropped off. You'd commented that you preferred a per loop
>>> configuration for search approach which is what you stuck with for the
>>> implementation. I guess I can see that there could be some value in
>>> having
>>> that ability. That said, I'm not sure that I like the overload of
>>> order()
>>> as the method for doing that:
>>>
>>> g.V().has("name", "marko").
>>>     repeat(__.outE().order().by("weight", decr).inV()).
>>>       emit().
>>>       order(SearchAlgo.DFS)
>>>
>>> Still thinking about what else we might do there, but perhaps that
>>> can wait
>>> a bit as I still need to study your changes in detail. Regarding:
>>>
>>>>   The barrier step that Daniel described doesn’t currently work since
>>> there’s basically booleans in the RepeatStep on whether or not to
>>> stash the
>>> starts to make the RepeatStep depth first.
>>>
>>> I assume you are referring to these booleans:
>>>
>>> https://github.com/apache/tinkerpop/blob/fe8ee98f6967c7b0f0ee7f2cf2d105
>>> 31b68fab8b/gremlin-core/src/main/java/org/apache/
>>> tinkerpop/gremlin/process/traversal/step/branch/RepeatStep.java#L46-L47
>>>
>>> ?
>>>
>>>
>>>
>>> On Tue, Apr 17, 2018 at 7:37 AM, Keith Lohnes <[email protected]> wrote:
>>>
>>>> Stephen, That’s a fair summary. I had an immediate need for it, so I
>>>> developed something based on Michel Pollmeier’s work and a modification
>>> to
>>>> the syntax Pieter Martin suggested in the Jira
>>>> <https://issues.apache.org/jira/browse/TINKERPOP-1822?
>>>> focusedCommentId=16233681&page=com.atlassian.jira.
>>>> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16233681>
>>>> with DFS being explicit.
>>>>
>>>> The semantics I used were g.V().repeat(out()).order(DFS), putting the
>>>> order
>>>> clause outside of the repeat. It made it simpler for me to develop and
>>>> seemed nice to make that DFS vs BFS explicit. The barrier step that
>>> Daniel
>>>> described doesn’t currently work since there’s basically booleans in
>>>> the
>>>> RepeatStep on whether or not to stash the starts to make the RepeatStep
>>>> depth first.
>>>>
>>>> I added a test for the DFS, though I’ve had some issues getting
>>>> things to
>>>> run re: .net tests and some other tests timing out. I have some more
>>> tests
>>>> I'd like to write based off issues that I ran into testing this in the
>>>> console (k-ary trees, until/emit before the repeat vs after), but I
>>> really
>>>> wanted to get this out there for people to take a look at and see if it
>>>> could work out.
>>>> 
>>>>
>>>> On Mon, Apr 16, 2018 at 7:29 PM Stephen Mallette <[email protected]>
>>>> wrote:
>>>>
>>>>>   Keith, I have to admit that I didn't really follow this general body
>>> of
>>>>> work closely, which include this pull request, the earlier one from
>>>> Michael
>>>>> Pollmeier, the JIRA and I think some discussion somewhere on one of
>>>>> the
>>>>> mailing lists. As best I have gathered from what I did follow, I feel
>>>> like
>>>>> the focus of this work so far was mostly related to "can someone
>>>>> get it
>>>> to
>>>>> actually work", but not a lot about other aspects like testing, api,
>>>>> release branch to apply it to, etc. Is that a fair depiction of how
>>> this
>>>>> work has developed so far? if so, let's use this thread to make sure
>>>> we're
>>>>> all on the same page as to how this change will get in on all those
>>> sorts
>>>>> of issues.
>>>>>
>>>>> btw, thanks to you and michael for sticking at this to come to
>>> something
>>>>> that seems to work. looking forward to the continued discussion on
>>>>> this
>>>>> thread.
>>>>>
>>>>>
>>>>> On Mon, Apr 16, 2018 at 6:54 PM, Michael Pollmeier <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Unsurprisingly I'm also +1 for defaulting to DFS in OLTP. My feeling
>>> is
>>>>>> that most users currently expect it to be DFS since that's what the
>>>> docs
>>>>>> say.
>>>>>>
>>>>>> And yes, it's easy to verify the default in the test suite, once we
>>>>>> agreed on what the default should be.
>>>>>>
>>>>>> Cheers
>>>>>> Michael
>>>>>>
>>>>>> On 17/04/18 04:40, pieter gmail wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have not properly followed the previous thread. However I thought
>>>> is
>>>>>>> going to be a way to set the default behavior as apposed to needing
>>>> to
>>>>>>> use barrier()
>>>>>>> Is this the case or not?
>>>>>>>
>>>>>>> For Sqlg at least it is possible to optimize BFS much more
>>>> effectively
>>>>>>> than DFS so it will be nice to have a way to set the strategy
>>> rather
>>>>>>> than having to manually inject barriers.
>>>>>>>
>>>>>>> Is the test suite going to enforce the BFS vs DFS?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Pieter
>>>>>>>
>>>>>>> On 16/04/2018 16:56, Daniel Kuppitz wrote:
>>>>>>>> +1 for DFS. If the query relies on BFS, you can always do
>>>>>>>> .repeat(....barrier())...
>>>>>>>>
>>>>>>>> ^ This holds true as long as there's no significant difference in
>>>> the
>>>>>>>> cpu+memory consumption and overall performance of the two
>>>> approaches.
>>>>>> BFS
>>>>>>>> has its advantages when it comes to bulking; an arbitrary number
>>> of
>>>>>>>> traversers on the same element is processed at the same pace as a
>>>>> single
>>>>>>>> traverser. I don't think we can benefit from bulking in DFS.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Apr 16, 2018 at 5:44 AM, Keith Lohnes <[email protected]>
>>>>>> wrote:
>>>>>>>>> As part of #838 <https://github.com/apache/tinkerpop/pull/838>
>>>>> there’s
>>>>>>>>> some
>>>>>>>>> discussion around whether or not to make DFS the default for the
>>>>> repeat
>>>>>>>>> step. On the one hand, everything else in OLTP is depth first. On
>>>> the
>>>>>>>>> other
>>>>>>>>> hand, there’s likely existing traversals that depend on the
>>> breadth
>>>>>>>>> first
>>>>>>>>> nature of repeat. My general preference is to make DFS optional
>>> at
>>>>>>>>> first,
>>>>>>>>> and at some later date, change the default and have that be a
>>>>> separate
>>>>>>>>> change from implementing DFS for repeat
>>>>>>>>> 
>>>>>>>>>
>>>>>>>
>>>>>>
> 
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] Depth First Repeat step

Reply via email to