Re: [DISCUSS] Returning Side Effects

Stephen Mallette Fri, 22 Jul 2016 13:43:42 -0700

> You can take the case of a group count as a really simple example.

So you want the side-effect in the Vertex itself so you can use it with the
ORM. Interesting. Perhaps nicer than doing all that trickery with
transactions would be to self-detach the vertex ahead of time (i.e. create
a DetachedVertex) and add the property you want. As indirect as that
sounds, that seems more direct to me than the "fake" transaction. Not sure
that what I'm doing here will help you with that problem.
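
To make the self-detach idea a bit more concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: DetachedVertex is a hypothetical stand-in class (not TinkerPop's Java DetachedVertex), attach_side_effect is a made-up helper, and the "person" label is only sample data. It takes group-count rows shaped like your example and copies the count onto a detached copy of each vertex, so nothing is ever written to the graph:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DetachedVertex:
    """Hypothetical stand-in for a vertex copied out of the graph."""
    id: Any
    label: str = "vertex"
    properties: Dict[str, Any] = field(default_factory=dict)


def attach_side_effect(rows: List[Dict[str, Any]], key: str) -> List[DetachedVertex]:
    """Copy each row's vertex and store row[key] on the copy as a property."""
    detached = []
    for row in rows:
        v = row["vertex"]
        # Copy the property map so the source vertex is never mutated.
        props = dict(v.properties)
        props[key] = row[key]
        detached.append(DetachedVertex(v.id, v.label, props))
    return detached


# Rows shaped like [{count:5, vertex:v[1]}, {count:3, vertex:v[2]}].
rows = [
    {"count": 5, "vertex": DetachedVertex(1, "person", {"name": "marko"})},
    {"count": 3, "vertex": DetachedVertex(2, "person", {"name": "vadas"})},
]

for vertex in attach_side_effect(rows, "count"):
    print(vertex.id, vertex.properties)
```

Since only copies are touched, there is no transaction to open or roll back, which is the part that seemed more direct to me.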


> I'll add that I'm looking at this from a non-GLV perspective so I'm
> disregarding object mapping done through GraphSONv2.0 typing in favor of a
> format-guaranteed result set (say that either only contains vertices,
> edges, or a combination of both).

Also interesting. Not sure that kind of serialization has a place in
TinkerPop where we encourage folks to return everything under the sun by
using Gremlin to return data in a form that suits their required end
result. If this is the outcome you want, I think that my suggestion with
self-detaching is probably on the right track. Maybe consider a custom
serializer that coerces all results to graph elements. That would take
care of all the embedded objects and the whole lot.
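
As a loose sketch of what such a coercion could look like (plain Python, not a real Gremlin Server serializer): the "type" tag used to recognize elements, the dict layout, and the coerce/is_element names are all assumptions made up for this example. It walks an arbitrarily nested result, yields only elements, and folds scalar siblings of a lone element into a copy of that element as properties:

```python
from typing import Any, Dict, Iterator


def is_element(obj: Any) -> bool:
    # Assumption: elements arrive as dicts tagged "type": "vertex" or "edge".
    return isinstance(obj, dict) and obj.get("type") in ("vertex", "edge")


def coerce(result: Any) -> Iterator[Dict[str, Any]]:
    """Yield only graph elements from an arbitrarily nested result."""
    if is_element(result):
        yield result
    elif isinstance(result, dict):
        elements = [v for v in result.values() if is_element(v)]
        scalars = {k: v for k, v in result.items()
                   if not isinstance(v, (dict, list))}
        if len(elements) == 1:
            # One element plus scalar siblings: fold the scalars into a copy.
            merged = dict(elements[0])
            merged["properties"] = {**merged.get("properties", {}), **scalars}
            yield merged
        else:
            for value in result.values():
                yield from coerce(value)
    elif isinstance(result, list):
        for item in result:
            yield from coerce(item)
    # Bare scalars with no element to carry them are dropped.


v1 = {"type": "vertex", "id": 1, "properties": {"name": "marko"}}
v2 = {"type": "vertex", "id": 2, "properties": {"name": "vadas"}}
mixed = [{"count": 5, "vertex": v1}, {"data": [v2], "names": ["vadas"]}]

for element in coerce(mixed):
    print(element["id"], element["properties"])
```

The obvious trade-off is that anything that cannot be hung on an element is lost, so this only suits clients that truly want nothing but vertices and edges back.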

> The reason for this is that GLV is too
> inefficient for larger projects so a more traditional script->result
> approach is required.

I'm hijacking my own thread by going too deep down this path, but I think
we should strive toward a solution for GLVs to be robust enough for
developers to be successful with TinkerPop in the language of their choice.
Just like we'll never get rid of all lambdas in Gremlin, we will probably
never quite get rid of script->result for all use cases (but, again, like
lambdas the goal will be to get quite close). I find it quite interesting
that we might be able to figure out how a Python dev could write Gremlin in
Python that would remotely execute on the server seamlessly. However, it's
also interesting that the same GLV code could be treated as server-side
code to be accessed from a Python client. In that way, heavy complex logic
(the type you are talking about) could be written in Python and then
accessed from Python on the client. In short, I think that it would be
better to think of the work around GLVs as "how to make Gremlin good in
other languages" rather than the more narrow view of just "remoting
traversals". If we go wider, we might come up with some good ideas to
really broaden access to TinkerPop and graphs in a very big way.

We already have a really big improvement with "remoting" as compared to
good ol' RexsterGraph - so that's something - haha ;)

On Fri, Jul 22, 2016 at 3:17 PM, Dylan Millikin <[email protected]>
wrote:

> Yeah sorry I left out an important part. This is especially an issue when
> you're dealing with an ORM layer that's expecting results of a specific
> type (for example vertices).
> You can take the case of a group count as a really simple example. Your
> result set could be :
>
> [{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1, vertex:v[3]}]
> and this is easy enough to do with gremlin. But unless this is built into
> the ORM itself chances are you'll need to implement the object mapping
> yourself.
>
> The alternative is to add "count" as a property of vertex and then you can
> leverage all available features from your ORM such as filtering, ordering,
> etc... Actually, the way we did it above we can also do those directly in
> gremlin as well.
>
> This is a simple case, but once it gets more complicated with hierarchical
> data, the option of implementing the object mapping yourself is just a
> headache and often times less efficient than just rolling back a
> transaction.
>
> Dunno if that was clear enough this time around.
>
> I'll add that I'm looking at this from a non-GLV perspective so I'm
> disregarding object mapping done through GraphSONv2.0 typing in favor of a
> format-guaranteed result set (say that either only contains vertices,
> edges, or a combination of both). The reason for this is that GLV is too
> inefficient for larger projects so a more traditional script->result
> approach is required.
>
> On Fri, Jul 22, 2016 at 2:09 PM, Stephen Mallette <[email protected]>
> wrote:
>
> > hi dylan, could you please provide a more concrete example of the problem
> > you're facing?
> >
> > On Fri, Jul 22, 2016 at 1:24 PM, Dylan Millikin <[email protected]>
> > wrote:
> >
> > > I'm going to confirm that this is actually a common issue.
> > > One thing to keep in mind is that often times the sideEffects are directly
> > > linked to returned elements on a 1 --> n basis which neither of the above
> > > really help with. That is to say that if you're streaming your results
> > > you'll need the sideEffects that relate to the streamed element.
> > >
> > > There is no easy way of handling this currently. Especially if you order
> > > your results and get unordered sideEffect results.
> > > One way we've found to work around this is very hacky, not efficient and
> > > only works for non mutating queries:
> > >
> > > - we start a transaction
> > > - we append the sideEffect data to the elements we're emitting (say as
> > > properties of a vertex)
> > > - get the full result set with sideEffects as properties of the result
> > > elements.
> > > - rollback transaction so properties are not persisted to the graph.
> > >
> > > A truly wicked succession of events born from absolute desperation.
> > > I enquired a while back about the ability to treat elements as detached
> > > from the graph in order to do the above without the transaction handling.
> > > But I never followed up.
> > >
> > > I figured I would put this out there as another case where non-Java
> > > languages struggle.
> > >
> > > On Thu, Jul 21, 2016 at 1:19 PM, Stephen Mallette <[email protected]>
> > > wrote:
> > >
> > > > Your way made me think that if you wrote your traversal like that, you
> > > > would return the side-effects twice - once in your traversal as part of the
> > > > standard result and then again as a side-effect.  Not sure what that means
> > > > - just a thought.
> > > >
> > > > While I'm thinking thoughts that may or may not be obvious, it also occurs
> > > > to me that the downside for a GLV retrieving data that way is that the
> > > > result of the traversal won't be streamed back. It will aggregate the
> > > > result (and the side-effects naturally) in memory and then return that all
> > > > as a whole.
> > > >
> > > > On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz <[email protected]>
> > > > wrote:
> > > >
> > > > > If you really want to have your result and your side-effects returned by a
> > > > > single request, you could do something like this:
> > > > >
> > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
> > > > > ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
> > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
> > > > > ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
> > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
> > > > > ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
> > > > >
> > > > > I'm not saying it would be bad to have Gremlin Server handle that for you,
> > > > > just wanted to show that it's actually pretty easy to get the data and the
> > > > > side-effects without using the traversal admin methods (hence it should
> > > > > work for all GLVs).
> > > > >
> > > > > Cheers,
> > > > > Daniel
> > > > >
> > > > >
> > > > > On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > As we look to build out GLVs and expand Gremlin into other programming
> > > > > > languages, one of the important aspects of doing this should be to consider
> > > > > > consistency across GLVs. We should try to prevent capabilities of Java from
> > > > > > being lost in Python, JS, etc.
> > > > > >
> > > > > > As we look at both RemoteGraph in Java and gremlin-python we find that
> > > > > > there is no way to get traversal side-effects. If you write a Traversal and
> > > > > > want side-effects from it, you have to write your traversal to return them
> > > > > > so that it comes back as part of the result set. Since RemoteGraph and
> > > > > > gremlin-python don't really allow you to directly "submit a script" it's
> > > > > > not as though you can execute a traversal once for both the result and the
> > > > > > side-effect and package them together in a single request as you might do
> > > > > > with a simple script request:
> > > > > >
> > > > > > $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
> > > > > >
> > > > > > {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
> > > > > >
> > > > > > I'm thinking that we could alter things in a non-breaking way to allow
> > > > > > optional return of side-effect data so that there is a way to have this all
> > > > > > streamed back without the need for the little workaround I just
> > > > > > demonstrated. For REST I think we could just include a sideEffect request
> > > > > > parameter that allowed for a list of side-effect keys to return. Perhaps
> > > > > > a "*" could indicate that all should be returned. The side-effects
> > > > > > could be serialized into a key sibling to "data" called "sideEffect".
> > > > > >
> > > > > > I think a similar approach could be used for websockets and NIO where we
> > > > > > could amend the protocol to accept that sideEffect parameter. We would
> > > > > > first stream results (marked with meta data to specify a "result") and then
> > > > > > stream side effects (again marked with meta data as such).
> > > > > >
> > > > > > I considered caching the Traversal instances so that a future request could
> > > > > > get the side effects, but for a variety of reasons I abandoned that (the
> > > > > > cache meant more heap and trying to get the right balance, new transactions
> > > > > > would have to be opened if the side-effect contained graph elements, etc.)
> > > > > >
> > > > > > I like the approach of just maintaining our single request-response model
> > > > > > with the changes I proposed above. It seems to provide the least impact with
> > > > > > no new dependencies, is backward compatible and could be completely
> > > > > > optional to RemoteConnections.
> > > > > >
