Re: [DISCUSS] Depth First Repeat step

pieter gmail Thu, 19 Apr 2018 09:46:44 -0700

Hi,

Not really sure either what 'global' means technically with respect toTinkerPop's current configuration support.Perhaps it can be the start of a global configuration registry that canbe overridden per traversal.

I get that DFS is the preferred default but for Sqlg the performanceimpact is so great that I'd rather, if possible have BFS as its default.

I am not sure about this but I reckon that any graph where the TinkerPopvm is not running in the same process space as the actual graph/datathat latency is a big issue. BFS alleviates the latency issue significantly.


Cheers
Pieter



On 19/04/2018 14:49, Keith Lohnes wrote:

whether this will affect more than just

repeat()?

For the PR that I start and the intent of this thread was to only affect
repeat.

I prefer that the semantics of the traversal be specified in the traversal

as a first class citizen.

+1

  I am fine with any default but am wondering whether it would be

worthwhile for the default to be overridden at a global level at not just
per traversal

It might be nice, but I guess I'm not sure what `global` means in this
context. Configured on the graph object?

On Tue, Apr 17, 2018 at 9:28 PM Daniel Kuppitz <[email protected]> wrote:

TinkerPop makes no guarantees about the order of elements unless you
specify an explicit order. This also goes back to the fact that certain
strategies (LazyBarrier-, RepeatUnroll- and PathRetractionStrategy) add
NoOpBarrierSteps to your traversal, which ultimately turns it into a
DFS/BFS mix. Check the .explain() output of your traversal to see which
strategy adds which steps.

Cheers,
Daniel


On Tue, Apr 17, 2018 at 4:45 PM, Michael Pollmeier <
[email protected]> wrote:

Also it seems to me that DFS only really applies to repeat() with an
emit().
g.V().hasLabel("A").repeat().times(2) gets rewritten as
g.V().hasLabel("A").out().out(). Are their subtleties that I am not
aware of or does DFV vs BFS not matter in this case?

When I read this I thought: clearly `.out().out()` is DFS for OLTP,
that's also what the documentation says, e.g. in this nice infographic
http://tinkerpop.apache.org/docs/current/images/oltp-vs-olap.png

However, looks like that's not the case. Has my life been a lie? Setting
up a simple flat graph to make things more obvious:
v3 <- v1 <- v0 -> v2 -> v4

```
graph = TinkerGraph.open()
v0 = graph.addVertex("l0")
v1 = graph.addVertex("l1")
v2 = graph.addVertex("l1")
v3 = graph.addVertex("l2")
v4 = graph.addVertex("l2")
v0.addEdge("e", v2)
v2.addEdge("e", v4)
v0.addEdge("e", v1)
v1.addEdge("e", v3)
g = graph.traversal()
g.V(v0).out().sideEffect{println(it)}.out().sideEffect{println(it)}
```

Prints:
v[2]
v[1]
v[4]
==>v[4]
v[3]
==>v[3]

If this was OLTP the output would be:
v[2]
v[4]
==>v[4]
v[1]
v[3]
==>v[3]

Cheers
Michael

On 18/04/18 02:58, pieter gmail wrote:

Hi,

I agree with the question about whether this will affect more than just
repeat()?

I prefer that the semantics of the traversal be specified in the
traversal as a first class citizen. i.e. with order(SearchAlgo).
Strategies are to my mind internal to an implementation. In Robert's
example LazyBarrierStrategy may be replaced/removed by an

implementation

for whatever internal reason they have.

Regarding the default, I am fine with any default but am wondering
whether it would be worthwhile for the default to be overridden at a
global level at not just per traversal? That way the impact can also be
alleviated when folks upgrade.

Also it seems to me that DFS only really applies to repeat() with an
emit().
g.V().hasLabel("A").repeat().times(2) gets rewritten as
g.V().hasLabel("A").out().out(). Are their subtleties that I am not
aware of or does DFV vs BFS not matter in this case?

Cheers
Pieter

On 17/04/2018 14:58, Robert Dale wrote:

+1 on DFS
-1 on order(SearchAlgo)

It seems like a strategy may be more appropriate.  It could affect

more

than just repeat().  And how would this interact with
LazyBarrierStrategy?

Maybe the default should be DFS with LazyBarrierStrategy. Then
LazyBarrierStrategy
can be removed with 'withoutStrategies()' and then it works just like
everything else.  I'd prefer consistency with the way everything else
works.



Robert Dale

On Tue, Apr 17, 2018 at 8:11 AM, Stephen Mallette <

[email protected]

wrote:

Thanks for the summary. Just a quick note - I'd not worry about the

GLV

tests for now. That part should be easy to sort out. Let's first make
sure
that we get clear on the other items first before digging too deeply
there.

On an administrative front, I think that this change should just go

to

3.4.0/master (so it's currently targeted to the right branch
actually) as
it sounds like we want DFS to be the default and that could be a
breaking
change as the semantics of the traversal shift a bit. So that's one
issue
to make sure we're all happy with.

A next issue would be the API from the user perspective - thanks for

the

link to that JIRA btw - I see I was involved in that conversation a
little
bit but then dropped off. You'd commented that you preferred a per

loop

configuration for search approach which is what you stuck with for

the

implementation. I guess I can see that there could be some value in
having
that ability. That said, I'm not sure that I like the overload of
order()
as the method for doing that:

g.V().has("name", "marko").
     repeat(__.outE().order().by("weight", decr).inV()).
       emit().
       order(SearchAlgo.DFS)

Still thinking about what else we might do there, but perhaps that
can wait
a bit as I still need to study your changes in detail. Regarding:

   The barrier step that Daniel described doesn’t currently work

since

there’s basically booleans in the RepeatStep on whether or not to
stash the
starts to make the RepeatStep depth first.

I assume you are referring to these booleans:

https://github.com/apache/tinkerpop/blob/

fe8ee98f6967c7b0f0ee7f2cf2d105

31b68fab8b/gremlin-core/src/main/java/org/apache/
tinkerpop/gremlin/process/traversal/step/branch/

RepeatStep.java#L46-L47

?



On Tue, Apr 17, 2018 at 7:37 AM, Keith Lohnes <[email protected]>

wrote:

Stephen, That’s a fair summary. I had an immediate need for it, so I
developed something based on Michel Pollmeier’s work and a

modification

to

the syntax Pieter Martin suggested in the Jira
<https://issues.apache.org/jira/browse/TINKERPOP-1822?
focusedCommentId=16233681&page=com.atlassian.jira.
plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16233681>
with DFS being explicit.

The semantics I used were g.V().repeat(out()).order(DFS), putting

the

order
clause outside of the repeat. It made it simpler for me to develop

and

seemed nice to make that DFS vs BFS explicit. The barrier step that

Daniel

described doesn’t currently work since there’s basically booleans in
the
RepeatStep on whether or not to stash the starts to make the

RepeatStep

depth first.

I added a test for the DFS, though I’ve had some issues getting
things to
run re: .net tests and some other tests timing out. I have some more

tests

I'd like to write based off issues that I ran into testing this in

the

console (k-ary trees, until/emit before the repeat vs after), but I

really

wanted to get this out there for people to take a look at and see if

it

could work out.


On Mon, Apr 16, 2018 at 7:29 PM Stephen Mallette <

[email protected]>

wrote:

   Keith, I have to admit that I didn't really follow this general

body

of

work closely, which include this pull request, the earlier one from

Michael

Pollmeier, the JIRA and I think some discussion somewhere on one of
the
mailing lists. As best I have gathered from what I did follow, I

feel

like

the focus of this work so far was mostly related to "can someone
get it

to

actually work", but not a lot about other aspects like testing,

api,

release branch to apply it to, etc. Is that a fair depiction of how

this

work has developed so far? if so, let's use this thread to make

sure

we're

all on the same page as to how this change will get in on all those

sorts

of issues.

btw, thanks to you and michael for sticking at this to come to

something

that seems to work. looking forward to the continued discussion on
this
thread.


On Mon, Apr 16, 2018 at 6:54 PM, Michael Pollmeier <
[email protected]> wrote:

Unsurprisingly I'm also +1 for defaulting to DFS in OLTP. My

feeling

is

that most users currently expect it to be DFS since that's what

the

docs

say.

And yes, it's easy to verify the default in the test suite, once

we

agreed on what the default should be.

Cheers
Michael

On 17/04/18 04:40, pieter gmail wrote:

Hi,

I have not properly followed the previous thread. However I

thought

is

going to be a way to set the default behavior as apposed to

needing

to

use barrier()
Is this the case or not?

For Sqlg at least it is possible to optimize BFS much more

effectively

than DFS so it will be nice to have a way to set the strategy

rather

than having to manually inject barriers.

Is the test suite going to enforce the BFS vs DFS?

Thanks
Pieter

On 16/04/2018 16:56, Daniel Kuppitz wrote:

+1 for DFS. If the query relies on BFS, you can always do
.repeat(....barrier())...

^ This holds true as long as there's no significant difference

in

the

cpu+memory consumption and overall performance of the two

approaches.

BFS

has its advantages when it comes to bulking; an arbitrary number

of

traversers on the same element is processed at the same pace as

single

traverser. I don't think we can benefit from bulking in DFS.

Cheers,
Daniel


On Mon, Apr 16, 2018 at 5:44 AM, Keith Lohnes <

[email protected]>

wrote:

As part of #838 <https://github.com/apache/tinkerpop/pull/838>

there’s

some
discussion around whether or not to make DFS the default for

the

repeat

step. On the one hand, everything else in OLTP is depth first.

On

the

other
hand, there’s likely existing traversals that depend on the

breadth

first
nature of repeat. My general preference is to make DFS optional

at

first,
and at some later date, change the default and have that be a

separate

change from implementing DFS for repeat

Re: [DISCUSS] Depth First Repeat step

Reply via email to