Hi,

Just to check: will it be configurable whether a ReferenceVertex or the Vertex together with its properties is returned?

Thanks
Pieter

On 19/05/2016 16:39, Stephen Mallette wrote:
> Ok - that's what I thought - just wanted to be certain I didn't
> misunderstand you. I'm fine with that. Thanks.
>
> On Thu, May 19, 2016 at 10:18 AM, Dylan Millikin <dylan.milli...@gmail.com>
> wrote:
>
>> When to switch default behavior. At this point there's no question about
>> adding this feature. It's definitely a requirement. Once the feature is in
>> place we can perhaps discuss it more or put it to a vote (which behavior to
>> default on what version)?
>>
>> On Thu, May 19, 2016 at 10:15 AM, Stephen Mallette <spmalle...@gmail.com>
>> wrote:
>>
>>> Sorry - what aspect of this do you want to have "sit" for later review?
>>>
>>> On Thu, May 19, 2016 at 10:10 AM, Dylan Millikin <dylan.milli...@gmail.com>
>>> wrote:
>>>
>>>> Ah yeah, the lack of consistency between OLTP and OLAP -is- an issue.
>>>> You're right, let's let this sit and review it at a later time.
>>>>
>>>> On Thu, May 19, 2016 at 9:58 AM, Stephen Mallette <spmalle...@gmail.com>
>>>> wrote:
>>>>
>>>>> So, first and foremost, yes we would still have both, so nothing goes
>>>>> away - so that's good.
>>>>>
>>>>> As for the rest, I understand. When you query a document in Mongo, you
>>>>> get the whole document, for example. Most folks would expect that,
>>>>> coming from other systems I guess. The problem is I don't think we can
>>>>> go the other direction to make it work consistently between OLAP and
>>>>> OLTP, which is its own worry. We would get the same confusion when
>>>>> someone issues their query against Spark and gets a ReferenceVertex
>>>>> rather than a DetachedVertex. I'm not tied to the idea of making
>>>>> ReferenceVertex the default in Gremlin Server for 3.3.x - I guess we
>>>>> can put that up for debate when the time comes.
>>>>>
>>>>> On Thu, May 19, 2016 at 9:20 AM, Dylan Millikin <dylan.milli...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> I'm a little torn here. On one side it's good to have the option of
>>>>>> returning a ReferenceVertex, which is currently really complicated to
>>>>>> do. On the other hand this new behavior is far from intuitive and has
>>>>>> some issues that are hard to overcome.
>>>>>>
>>>>>> If I'm understanding you correctly, both behaviors would still live
>>>>>> on and we would just switch the default mode, right? I would like to
>>>>>> debate whether or not this new behavior should be the default (I don't
>>>>>> really know where I stand, but just for the sake of being thorough).
>>>>>>
>>>>>> Setting aside the actual issues this introduces (I'm pretty sure they
>>>>>> will only concern very few people, who can use whatever conf they
>>>>>> need), people coming from the SQL world who already have trouble
>>>>>> adjusting to Gremlin will find this counter-intuitive. After all,
>>>>>> these people couldn't care less about ReferenceVertex; on the other
>>>>>> hand, it's very natural to query a vertex and get its info. Not to
>>>>>> mention that the ways of getting the properties differ between
>>>>>> handling a vertex directly and using a traversal, and are not very
>>>>>> consistent.
>>>>>>
>>>>>> Again, I don't really know where I stand on this, I just wanted to be
>>>>>> thorough.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> On Wed, May 18, 2016 at 4:04 PM, Stephen Mallette <spmalle...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I'll try to keep this simple, as serialization tends to be anything
>>>>>>> but simple....
>>>>>>>
>>>>>>> Forgetting GraphML, which has its own rules, GraphSON and Gryo are
>>>>>>> the two key serialization modules that we have in IO.
>>>>>>> We use these for both serialization to disk as well as serialization
>>>>>>> over the network in Gremlin Server. If you issue a request like:
>>>>>>>
>>>>>>> g.V()
>>>>>>>
>>>>>>> it returns vertices obviously. For both Gryo and GraphSON, those
>>>>>>> vertices are converted to DetachedVertex which includes the
>>>>>>> properties of the Vertex. This can be tremendously expensive,
>>>>>>> especially if the graph supports multi-properties.
>>>>>>>
>>>>>>> I think that Gremlin Server should take a hint from OLAP in relation
>>>>>>> to this issue. With OLAP, a Vertex is converted to a ReferenceVertex
>>>>>>> where we only get the element identifier passed around.
>>>>>>>
>>>>>>> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
>>>>>>> ==>hadoopgraph[gryoinputformat->gryooutputformat]
>>>>>>> gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
>>>>>>> ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
>>>>>>> gremlin> l = g.V().toList();[]
>>>>>>> gremlin> l[0].class
>>>>>>> ==>class org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertex
>>>>>>>
>>>>>>> If you want more information, it is up to you to issue your query to
>>>>>>> request that information - for example:
>>>>>>>
>>>>>>> g.V().valueMap(true)
>>>>>>>
>>>>>>> I think Gremlin Server should work in the same fashion (i.e. return a
>>>>>>> ReferenceVertex when a Vertex is serialized over the network). It
>>>>>>> would ease up on serialization overhead and force users to be more
>>>>>>> explicit about the data that they want which would prevent
>>>>>>> unnecessary performance surprises. This change might also be nice for
>>>>>>> the efficiency of RemoteGraph/Connection implementations.
>>>>>>>
>>>>>>> This has bothered me for a while, but we carried over the pattern
>>>>>>> from TinkerPop 2.x of sending back properties and I've been concerned
>>>>>>> about introducing a break in trying to improve that. I dug into it
>>>>>>> more today and my analysis seems to indicate that this change can
>>>>>>> occur without breaking all the code that's currently out there. I
>>>>>>> think that we could keep the existing serialization model and simply
>>>>>>> add in the ReferenceVertex approach as a configuration option for
>>>>>>> 3.2.1 and then make it the default for 3.3.x.
>>>>>>>
>>>>>>> If there are no objections in the next 72 hours (Saturday, May 21,
>>>>>>> 2016, 4pm EST) I'll assume lazy consensus and move forward.
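
To make the trade-off in the original proposal concrete, here is a minimal sketch in plain Python. It is only a toy model and not the TinkerPop API: `ToyVertex`, `to_detached`, `to_reference`, and `payload_size` are hypothetical names, and JSON stands in for GraphSON/Gryo. It shows why a "detached" form (identifier plus all properties) costs more on the wire than a "reference" form (identifier only), particularly with multi-properties.

```python
import json


class ToyVertex:
    """Toy stand-in for a graph vertex (hypothetical, not a TinkerPop class)."""

    def __init__(self, vid, label, properties):
        self.id = vid
        self.label = label
        # Property name -> list of values, to mimic multi-properties.
        self.properties = properties


def to_detached(v):
    """Full form: identifier, label, and every property value."""
    return {"id": v.id, "label": v.label, "properties": v.properties}


def to_reference(v):
    """Reference form: only the element identifier is passed around."""
    return {"id": v.id}


def payload_size(obj):
    """Size in bytes of the JSON encoding (a stand-in for the real wire format)."""
    return len(json.dumps(obj).encode("utf-8"))


# Sample data invented for illustration only.
marko = ToyVertex(
    1,
    "person",
    {"name": ["marko"], "location": ["san diego", "santa cruz", "brussels", "santa fe"]},
)

detached_bytes = payload_size(to_detached(marko))
reference_bytes = payload_size(to_reference(marko))
print("detached:", detached_bytes, "bytes; reference:", reference_bytes, "bytes")
```

In this model a client that receives only the reference has to ask for properties explicitly, which is the role `g.V().valueMap(true)` plays in the quoted proposal.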