Re: [DISCUSS] Returning Side Effects

Stephen Mallette Fri, 22 Jul 2016 15:25:13 -0700

Yes, I expected to return results first and then stream the side-effects.

On Fri, Jul 22, 2016 at 5:05 PM, Dylan Millikin wrote:


> > Perhaps nicer than doing all that trickery with transactions would be to
> > self-detach the vertex ahead of time
>
> This was the original idea. I never dove too deep into it because the
> sideEffects were applied mid-traversal and extra filtering/SEs still had to
> occur. I wasn't sure it was actually possible, and the transaction hack
> allowed me to move on.
>
> As for the GLV limitations, it's mostly going to be network overhead.
> Unfortunately, each round trip to the server is costly, and I know that
> we've ended up having to be creative to limit round trips by
> concatenating scripts for each query. A GLV approach would need some
> careful planning and probably a multi-line bytecode feature. But I digress;
> that's not what this thread is about.
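The script-concatenation workaround mentioned above can be sketched as follows. This is a minimal sketch, assuming a Gremlin Server that evaluates Groovy scripts and returns the value of the script's last expression; the `batch_script` helper and the `q0`, `q1` keys are hypothetical.

```python
def batch_script(queries):
    """Join several Gremlin-Groovy queries into one script (hypothetical helper).

    Each query is bound to a local variable, and the final expression is a
    Groovy map literal, so a single request returns every result keyed by name.
    """
    lines = []
    keys = []
    for i, query in enumerate(queries):
        name = f"q{i}"
        lines.append(f"{name} = {query}")
        keys.append(f"{name}: {name}")
    lines.append("[" + ", ".join(keys) + "]")
    return ";".join(lines)


script = batch_script([
    "g.V().count().next()",
    "g.V(1).values('name').toList()",
])
print(script)
# q0 = g.V().count().next();q1 = g.V(1).values('name').toList();[q0: q0, q1: q1]
```

One request then replaces two round trips, at the cost of building scripts as strings rather than composing traversals in the host language.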
>
> In the spirit of GLVs returning side effects, how would your original
> proposition stream over the network? Would you get all the data first and
> then the SEs? I'm guessing you would want to stream the SEs as well.
>
> On Fri, Jul 22, 2016 at 4:42 PM, Stephen Mallette wrote:
>
> > > You can take the case of a group count as a really simple example.
> >
> > So you want the side-effect in the Vertex itself so you can use it with
> > the ORM. Interesting. Perhaps nicer than doing all that trickery with
> > transactions would be to self-detach the vertex ahead of time (i.e.
> > create a DetachedVertex) and add the property you want. As indirect as
> > that sounds, it seems more direct to me than the "fake" transaction. I'm
> > not sure that what I'm doing here will help you with that problem.
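The self-detaching idea can be sketched in Python without any driver: the vertex is copied into a detached value object, and the extra property is added to the copy only, so nothing is ever written to the graph and no rollback is needed. The `DetachedVertex` dataclass and the `detach_with` helper are hypothetical stand-ins for TinkerPop's Java `DetachedVertex`, not its API.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class DetachedVertex:
    # Hypothetical client-side stand-in for a detached vertex:
    # an id, a label and a plain dict of properties, with no link to the graph.
    id: int
    label: str
    properties: dict = field(default_factory=dict)


def detach_with(vertex, **extra):
    """Return a copy of `vertex` carrying extra properties; the original is untouched."""
    merged = {**vertex.properties, **extra}
    return replace(vertex, properties=merged)


stored = DetachedVertex(1, "person", {"name": "marko"})
annotated = detach_with(stored, count=5)
print(annotated.properties)  # {'name': 'marko', 'count': 5}
print(stored.properties)     # {'name': 'marko'}
```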
> >
> > > I'll add that I'm looking at this from a non-GLV perspective, so I'm
> > > disregarding object mapping done through GraphSON 2.0 typing in favor
> > > of a format-guaranteed result set (say one that only contains vertices,
> > > edges, or a combination of both).
> >
> > Also interesting. Not sure that kind of serialization has a place in
> > TinkerPop, where we encourage folks to return everything under the sun by
> > using Gremlin to return data in a form that suits their required end
> > result. If this is the outcome you want, I think that my suggestion of
> > self-detaching is probably on the right track. Maybe consider a custom
> > serializer that coerces all results to graph elements. That would take
> > care of all the embedded objects and the whole lot.
> >
> > > The reason for this is that GLV is too inefficient for larger projects,
> > > so a more traditional script->result approach is required.
> >
> > I'm hijacking my own thread by going too deep down this path, but I think
> > we should strive toward a solution for GLVs to be robust enough for
> > developers to be successful with TinkerPop in the language of their
> > choice. Just like we'll never get rid of all lambdas in Gremlin, we will
> > probably never quite get rid of script->result for all use cases (but,
> > again, like lambdas, the goal will be to get quite close). I find it quite
> > interesting that we might be able to figure out how a Python dev could
> > write Gremlin in Python that would remotely execute on the server
> > seamlessly; however, it's also interesting that the same GLV code could
> > be treated as server-side code to be accessed from a Python client. In
> > that way, heavy complex logic (the type you are talking about) could be
> > written in Python and then accessed from Python on the client. In short, I
> > think it would be better to think of the work around GLVs as "how to make
> > Gremlin good in other languages" rather than the narrower view of just
> > "remoting traversals". If we go wider, we might come up with some good
> > ideas to really broaden access to TinkerPop and graphs in a very big way.
> >
> > We already have a really big improvement with "remoting" as compared to
> > good ol' RexsterGraph, so that's something, haha ;)
> >
> > On Fri, Jul 22, 2016 at 3:17 PM, Dylan Millikin wrote:
> >
> > > Yeah, sorry, I left out an important part. This is especially an issue
> > > when you're dealing with an ORM layer that's expecting results of a
> > > specific type (for example, vertices).
> > > You can take the case of a group count as a really simple example. Your
> > > result set could be:
> > >
> > > [{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1, vertex:v[3]}]
> > > and this is easy enough to do with Gremlin. But unless this is built into
> > > the ORM itself, chances are you'll need to implement the object mapping
> > > yourself.
> > >
> > > The alternative is to add "count" as a property of the vertex, and then
> > > you can leverage all available features from your ORM such as filtering,
> > > ordering, etc. Actually, with the approach above we can also do those
> > > directly in Gremlin.
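The manual object mapping described above can be sketched client-side: each `{count, vertex}` row from the group count is folded into its vertex as a `count` property, after which ORM-style filtering and ordering work on plain vertices. A minimal sketch, assuming the rows have already been deserialized into Python dicts; the `Vertex` class and the `rows_to_vertices` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    # Hypothetical ORM-side vertex: an id plus a property dict.
    id: int
    properties: dict = field(default_factory=dict)


def rows_to_vertices(rows):
    """Fold each row's 'count' into its vertex (in place) as an ordinary property."""
    vertices = []
    for row in rows:
        v = row["vertex"]
        v.properties["count"] = row["count"]
        vertices.append(v)
    return vertices


rows = [
    {"count": 5, "vertex": Vertex(1)},
    {"count": 3, "vertex": Vertex(2)},
    {"count": 1, "vertex": Vertex(3)},
]
vertices = rows_to_vertices(rows)

# ORM-style filtering and ordering now operate on vertices alone.
popular = sorted((v for v in vertices if v.properties["count"] > 1),
                 key=lambda v: v.properties["count"])
print([v.id for v in popular])  # [2, 1]
```

For flat rows like these the mapping is short; the headache grows with hierarchical results, where every nesting level needs its own mapping rule.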
> > >
> > > This is a simple case, but once it gets more complicated with
> > > hierarchical data, implementing the object mapping yourself is just a
> > > headache and oftentimes less efficient than just rolling back a
> > > transaction.
> > >
> > > Dunno if that was clear enough this time around.
> > >
> > > I'll add that I'm looking at this from a non-GLV perspective, so I'm
> > > disregarding object mapping done through GraphSON 2.0 typing in favor of
> > > a format-guaranteed result set (say one that only contains vertices,
> > > edges, or a combination of both). The reason for this is that GLV is too
> > > inefficient for larger projects, so a more traditional script->result
> > > approach is required.
> > >
> > > On Fri, Jul 22, 2016 at 2:09 PM, Stephen Mallette wrote:
> > >
> > > > Hi Dylan, could you please provide a more concrete example of the
> > > > problem you're facing?
> > > >
> > > > On Fri, Jul 22, 2016 at 1:24 PM, Dylan Millikin wrote:
> > > >
> > > > > I'm going to confirm that this is actually a common issue.
> > > > > One thing to keep in mind is that oftentimes the sideEffects are
> > > > > directly linked to returned elements on a 1-to-n basis, which neither
> > > > > of the above really helps with. That is to say, if you're streaming
> > > > > your results, you'll need the sideEffects that relate to the streamed
> > > > > element.
> > > > >
> > > > > There is no easy way of handling this currently, especially if you
> > > > > order your results and get unordered sideEffect results.
> > > > > One way we've found to work around this is very hacky, not efficient,
> > > > > and only works for non-mutating queries:
> > > > >
> > > > > - we start a transaction
> > > > > - we append the sideEffect data to the elements we're emitting (say,
> > > > >   as properties of a vertex)
> > > > > - we get the full result set with sideEffects as properties of the
> > > > >   result elements
> > > > > - we roll back the transaction so the properties are not persisted to
> > > > >   the graph.
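The transaction workaround above can be modeled with a toy in-memory graph to show why it only suits non-mutating queries: everything written inside the transaction, including any real mutation, is discarded by the rollback. The `ToyGraph` class and its `begin`/`rollback` methods are hypothetical; a real graph would do this through its own transaction API.

```python
import copy


class ToyGraph:
    """Hypothetical in-memory graph with a single snapshot-based transaction."""

    def __init__(self, vertices):
        self.vertices = vertices      # vertex id -> property dict
        self._snapshot = None

    def begin(self):
        # Remember the committed state so it can be restored on rollback.
        self._snapshot = copy.deepcopy(self.vertices)

    def rollback(self):
        # Discard every change made since begin(), wanted or not.
        self.vertices = self._snapshot
        self._snapshot = None


graph = ToyGraph({1: {"name": "marko"}, 2: {"name": "vadas"}})
side_effect = {1: 5, 2: 3}                 # e.g. a groupCount side-effect

graph.begin()
for vid, count in side_effect.items():     # append side-effect data as properties
    graph.vertices[vid]["count"] = count
result = copy.deepcopy(graph.vertices)     # read the enriched result set
graph.rollback()                           # the temporary properties disappear

print(result[1])          # {'name': 'marko', 'count': 5}
print(graph.vertices[1])  # {'name': 'marko'}
```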
> > > > >
> > > > > A truly wicked succession of events born from absolute desperation.
> > > > > I enquired a while back about the ability to treat elements as
> > > > > detached from the graph in order to do the above without the
> > > > > transaction handling, but I never followed up.
> > > > >
> > > > > I figured I would put this out there as another case where non-Java
> > > > > languages struggle.
> > > > >
> > > > > On Thu, Jul 21, 2016 at 1:19 PM, Stephen Mallette wrote:
> > > > >
> > > > > > Your way made me think that if you wrote your traversal like that,
> > > > > > you would return the side-effects twice: once in your traversal as
> > > > > > part of the standard result and then again as a side-effect. Not
> > > > > > sure what that means, just a thought.
> > > > > >
> > > > > > While I'm thinking thoughts that may or may not be obvious, it also
> > > > > > occurs to me that the downside for a GLV retrieving data that way is
> > > > > > that the result of the traversal won't be streamed back. It will
> > > > > > aggregate the result (and, naturally, the side-effects) in memory and
> > > > > > then return it all as a whole.
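The streaming cost described above can be illustrated with plain Python generators: without `fold()` a batch can be sent as soon as it is full, while with `fold()` nothing is sent until the whole traversal has been collected into one item. A sketch only; `traverse` is a hypothetical stand-in for a traversal, and the batch size of 64 is arbitrary.

```python
from itertools import islice


def traverse(n, log):
    """Hypothetical traversal: yields n results lazily, logging each one produced."""
    for i in range(n):
        log.append(i)
        yield i


def streamed(results, batch_size):
    """No fold(): send each batch as soon as it is full."""
    it = iter(results)
    while batch := list(islice(it, batch_size)):
        yield batch


def folded(results):
    """fold(): collect everything in memory, then send it as a single item."""
    yield [list(results)]


log_a, log_b = [], []
first_streamed = next(streamed(traverse(1000, log_a), batch_size=64))
first_folded = next(folded(traverse(1000, log_b)))
print(len(log_a), len(log_b))  # 64 1000
```

The first streamed message is ready after 64 results; the folded one only after all 1000, and all of them sit in memory at once.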
> > > > > >
> > > > > > On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz wrote:
> > > > > >
> > > > > > > If you really want to have your result and your side-effects
> > > > > > > returned by a single request, you could do something like this:
> > > > > > >
> > > > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
> > > > > > > ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
> > > > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
> > > > > > > ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
> > > > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
> > > > > > > ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
> > > > > > >
> > > > > > > I'm not saying it would be bad to have Gremlin Server handle that
> > > > > > > for you; I just wanted to show that it's actually pretty easy to get
> > > > > > > the data and the side-effects without using the traversal admin
> > > > > > > methods (hence it should work for all GLVs).
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Daniel
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette wrote:
> > > > > > >
> > > > > > > > As we look to build out GLVs and expand Gremlin into other
> > > > > > > > programming languages, one of the important aspects of doing this
> > > > > > > > should be to consider consistency across GLVs. We should try to
> > > > > > > > prevent capabilities of Java from being lost in Python, JS, etc.
> > > > > > > >
> > > > > > > > As we look at both RemoteGraph in Java and gremlin-python, we find
> > > > > > > > that there is no way to get traversal side-effects. If you write a
> > > > > > > > Traversal and want side-effects from it, you have to write your
> > > > > > > > traversal to return them so that they come back as part of the
> > > > > > > > result set. Since RemoteGraph and gremlin-python don't really allow
> > > > > > > > you to directly "submit a script", it's not as though you can
> > > > > > > > execute a traversal once for both the result and the side-effect
> > > > > > > > and package them together in a single request as you might do with
> > > > > > > > a simple script request:
> > > > > > > >
> > > > > > > > $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
> > > > > > > > {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
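The same script request can be built from Python with only the standard library; serializing the body with `json.dumps` avoids the shell escaping that the curl example needs. A minimal sketch that only constructs the request without sending it; the URL is the one from the example above, and the `Content-Type` header is an assumption (curl's `-d` would send a form content type by default).

```python
import json
import urllib.request

# The same Groovy script as in the curl example.
script = ("t=g.V(1).values('name').aggregate('x');"
          "[v:t.toList(),se:t.getSideEffects().get('x')]")
body = json.dumps({"gremlin": script}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:8182",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# To send it against a running Gremlin Server with its HTTP endpoint enabled:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["result"]["data"])
print(request.get_method(), request.full_url)  # POST http://localhost:8182
```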
> > > > > > > >
> > > > > > > > I'm thinking that we could alter things in a non-breaking way to
> > > > > > > > allow optional return of side-effect data, so that there is a way
> > > > > > > > to have this all streamed back without the need for the little
> > > > > > > > workaround I just demonstrated. For REST, I think we could just
> > > > > > > > include a sideEffect request parameter that allows for a list of
> > > > > > > > side-effect keys to return. Perhaps a "*" could indicate that all
> > > > > > > > should be returned. The side-effects could be serialized into a
> > > > > > > > key sibling to "data" called "sideEffect".
> > > > > > > >
> > > > > > > > I think a similar approach could be used for websockets and NIO,
> > > > > > > > where we could amend the protocol to accept that sideEffect
> > > > > > > > parameter. We would first stream results (marked with meta data to
> > > > > > > > specify a "result") and then stream side effects (again marked
> > > > > > > > with meta data as such).
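The streaming order proposed for websockets and NIO (results first, then side-effects, each marked in the meta data) can be sketched with generators. The `"type"` and `"key"` meta fields, their `"result"`/`"sideEffect"` values, the batch size and both functions are hypothetical; the actual protocol change was still under discussion.

```python
def stream_response(results, side_effects, batch_size=2):
    """Yield response messages: result batches first, then one per side-effect key."""
    batch = []
    for item in results:
        batch.append(item)
        if len(batch) == batch_size:
            yield {"data": batch, "meta": {"type": "result"}}
            batch = []
    if batch:
        yield {"data": batch, "meta": {"type": "result"}}
    # Side-effects go last: they are only complete once the traversal is exhausted.
    for key, value in side_effects.items():
        yield {"data": value, "meta": {"type": "sideEffect", "key": key}}


def collect(messages):
    """Client side: demultiplex the stream using the meta markers."""
    results, side_effects = [], {}
    for message in messages:
        if message["meta"]["type"] == "result":
            results.extend(message["data"])
        else:
            side_effects[message["meta"]["key"]] = message["data"]
    return results, side_effects


names = ["marko", "vadas", "josh"]
messages = list(stream_response(names, {"x": names}))
results, side_effects = collect(messages)
print(len(messages), results, side_effects)
```

Because results are still sent batch by batch, the client can start consuming them before the side-effects arrive, unlike the `fold()`-based workaround.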
> > > > > > > >
> > > > > > > > I considered caching the Traversal instances so that a future
> > > > > > > > request could get the side effects, but for a variety of reasons I
> > > > > > > > abandoned that (the cache meant more heap and trying to get the
> > > > > > > > right balance, new transactions would have to be opened if the
> > > > > > > > side-effect contained graph elements, etc.).
> > > > > > > >
> > > > > > > > I like the approach of just maintaining our single
> > > > > > > > request-response model with the changes I proposed above. It seems
> > > > > > > > to have the least impact, adds no new dependencies, is backward
> > > > > > > > compatible, and could be completely optional to RemoteConnections.