Re: [DISCUSS] Returning Side Effects

Dylan Millikin Fri, 22 Jul 2016 12:18:06 -0700

Yeah, sorry, I left out an important part. This is especially an issue when
you're dealing with an ORM layer that expects results of a specific
type (for example, vertices).
Take the case of a group count as a really simple example. Your
result set could be:


[{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1, vertex:v[3]}]
This is easy enough to do with Gremlin. But unless this is built into
the ORM itself, chances are you'll need to implement the object mapping
yourself.
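
To make that concrete, here is a minimal Python sketch of the hand-written
mapping. Everything in it is an assumption for illustration: the Vertex and
CountedVertex classes and the map_group_counts helper are hypothetical, and
plain vertex ids stand in for v[1], v[2], v[3].

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    """Hypothetical ORM entity; a real ORM would supply its own class."""
    id: int


@dataclass
class CountedVertex:
    """Hand-written wrapper pairing a vertex with its group count."""
    vertex: Vertex
    count: int


def map_group_counts(rows):
    """Map rows shaped like {"count": n, "vertex": id} to CountedVertex objects."""
    return [CountedVertex(Vertex(row["vertex"]), row["count"]) for row in rows]


# Rows mirroring the result set above, with ids standing in for the vertices.
rows = [
    {"count": 5, "vertex": 1},
    {"count": 3, "vertex": 2},
    {"count": 1, "vertex": 3},
]
mapped = map_group_counts(rows)
```

The wrapper is a new type the ORM knows nothing about, so any filtering or
ordering on "count" would have to be written by hand as well.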

The alternative is to add "count" as a property of the vertex, and then you
can leverage all available features of your ORM, such as filtering and
ordering. Actually, with the workaround from my earlier message (quoted
below) we can also do those directly in Gremlin.
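
As a rough Python sketch of why this is attractive: once "count" is just
another property, generic property-based helpers work unchanged. The Vertex
class and the filter_by / order_by helpers below are hypothetical stand-ins
for whatever an ORM provides, not any real library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical ORM entity carrying arbitrary properties."""
    id: int
    properties: dict = field(default_factory=dict)


def filter_by(vertices, key, predicate):
    """Keep vertices that have the property and whose value satisfies predicate."""
    return [
        v for v in vertices
        if key in v.properties and predicate(v.properties[key])
    ]


def order_by(vertices, key, descending=False):
    """Sort vertices by a property value."""
    return sorted(vertices, key=lambda v: v.properties[key], reverse=descending)


# "count" arrives as an ordinary property, so no extra mapping layer is needed.
vertices = [
    Vertex(3, {"count": 1}),
    Vertex(1, {"count": 5}),
    Vertex(2, {"count": 3}),
]
popular = order_by(
    filter_by(vertices, "count", lambda c: c >= 3), "count", descending=True
)
```

Here popular contains plain Vertex objects, which is the result type the ORM
already expects.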

This is a simple case, but once it gets more complicated with hierarchical
data, implementing the object mapping yourself is just a headache and
often less efficient than just rolling back a transaction.

Dunno if that was clear enough this time around.

I'll add that I'm looking at this from a non-GLV perspective, so I'm
disregarding object mapping done through GraphSONv2.0 typing in favor of a
format-guaranteed result set (say, one that contains only vertices, only
edges, or a combination of both). The reason for this is that GLV is too
inefficient for larger projects, so a more traditional script->result
approach is required.

On Fri, Jul 22, 2016 at 2:09 PM, Stephen Mallette <[email protected]>
wrote:

> hi dylan, could you please provide a more concrete example of the problem
> you're facing?
>
> On Fri, Jul 22, 2016 at 1:24 PM, Dylan Millikin <[email protected]>
> wrote:
>
> > I'm going to confirm that this is actually a common issue.
> > One thing to keep in mind is that often the sideEffects are directly
> > linked to returned elements on a 1 --> n basis, which neither of the above
> > really helps with. That is to say that if you're streaming your results
> > you'll need the sideEffects that relate to the streamed element.
> >
> > There is no easy way of handling this currently. Especially if you order
> > your results and get unordered sideEffect results.
> > One way we've found to work around this is very hacky, not efficient and
> > only works for non mutating queries:
> >
> > - we start a transaction
> > - we append the sideEffect data to the elements we're emitting (say as
> > properties of a vertex)
> > - get the full result set with sideEffects as properties of the result
> > elements.
> > - rollback transaction so properties are not persisted to the graph.
> >
> > A truly wicked succession of events born from absolute desperation.
> > I enquired a while back about the ability to treat elements as detached
> > from the graph in order to do the above without the transaction handling.
> > But I never followed up.
> >
> > I figured I would put this out there as another case where non-Java
> > languages struggle.
> >
> > On Thu, Jul 21, 2016 at 1:19 PM, Stephen Mallette <[email protected]>
> > wrote:
> >
> > > Your way made me think that if you wrote your traversal like that, you
> > > would return the side-effects twice - once in your traversal as part of
> > > the standard result and then again as a side-effect.  Not sure what that
> > > means - just a thought.
> > >
> > > While I'm thinking thoughts that may or may not be obvious, it also
> > > occurs to me that the downside for a GLV retrieving data that way is that
> > > the result of the traversal won't be streamed back. It will aggregate the
> > > result (and the side-effects naturally) in memory and then return that
> > > all as a whole.
> > >
> > > On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz <[email protected]>
> > > wrote:
> > >
> > > > If you really want to have your result and your side-effects returned
> > > > by a single request, you could do something like this:
> > > >
> > > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
> > > > ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
> > > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
> > > > ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
> > > > gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
> > > > ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
> > > >
> > > > I'm not saying it would be bad to have Gremlin Server handle that for
> > > > you, just wanted to show that it's actually pretty easy to get the data
> > > > and the side-effects without using the traversal admin methods (hence it
> > > > should work for all GLVs).
> > > >
> > > > Cheers,
> > > > Daniel
> > > >
> > > >
> > > > On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette <[email protected]>
> > > > wrote:
> > > >
> > > > > As we look to build out GLVs and expand Gremlin into other programming
> > > > > languages, one of the important aspects of doing this should be to
> > > > > consider consistency across GLVs. We should try to prevent capabilities
> > > > > of Java from being lost in Python, JS, etc.
> > > > >
> > > > > As we look at both RemoteGraph in Java and gremlin-python we find that
> > > > > there is no way to get traversal side-effects. If you write a Traversal
> > > > > and want side-effects from it, you have to write your traversal to
> > > > > return them so that it comes back as part of the result set. Since
> > > > > RemoteGraph and gremlin-python don't really allow you to directly
> > > > > "submit a script" it's not as though you can execute a traversal once
> > > > > for both the result and the side-effect and package them together in a
> > > > > single request as you might do with a simple script request:
> > > > >
> > > > > $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
> > > > > {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
> > > > >
> > > > > I'm thinking that we could alter things in a non-breaking way to allow
> > > > > optional return of side-effect data so that there is a way to have this
> > > > > all streamed back without the need for the little workaround I just
> > > > > demonstrated. For REST I think we could just include a sideEffect
> > > > > request parameter that allowed for a list of side-effect keys to
> > > > > return. Perhaps a "*" could indicate that all should be returned. The
> > > > > side-effects could be serialized into a key sibling to "data" called
> > > > > "sideEffect".
> > > > >
> > > > > I think a similar approach could be used for websockets and NIO where
> > > > > we could amend the protocol to accept that sideEffect parameter. We
> > > > > would first stream results (marked with meta data to specify a
> > > > > "result") and then stream side effects (again marked with meta data as
> > > > > such).
> > > > >
> > > > > I considered caching the Traversal instances so that a future request
> > > > > could get the side effects, but for a variety of reasons I abandoned
> > > > > that (the cache meant more heap and trying to get the right balance,
> > > > > new transactions would have to be opened if the side-effect contained
> > > > > graph elements, etc.)
> > > > >
> > > > > I like the approach of just maintaining our single request-response
> > > > > model with the changes I proposed above. It seems to provide the least
> > > > > impact with no new dependencies, is backward compatible and could be
> > > > > completely optional to RemoteConnections.
> > > > >
> > > >
> > >
> >
>
