Re: [DISCUSS] Remaining Items for 3.3.0

Marko Rodriguez Tue, 27 Jun 2017 08:18:06 -0700

Hi,

In this email I will argue why TINKERPOP-1592 is a bad idea.

GremlinServer does “too much.” In the future, the concept for GremlinServer for 
TinkerPop4 should be “network I/O to the Gremlin virtual machine.” That is it. 
So what is GremlinServer doing now that is bad? Currently GremlinServer is 
“detaching” elements from the graph and populating those elements with more 
data than the user requested. What do I mean? Take:

g.V(1)

Currently that returns a DetachedVertex which is vertex 1’s id, label, and all 
of its properties. Now here is the crazy part. What if there are 1M properties 
(e.g. timestamped multi-properties on sensor network vertices). Doh! However, 
what did the user ask for? They asked for v[1] and that is all they should get. 
This is known as ReferenceVertex. If they wanted some subset of the timestamped 
sensor network data, they should have done:

g.V(1).properties('sensor').has('timestamp', gt(2017))

Thus, we should only return the data people explicitly ask for in the 
traversal. The TINKERPOP-1592 ticket is a hack for DetachedVertex by allowing 
users to specify withDetachment(properties:["not_sensor"]), but then it is not
expressive enough. Ultimately, for generality, users will want to specify full 
traversals in their withDetachment() specification. Now you are talking 
SubgraphStrategy. Dar! — and guess what, GremlinServer doesn’t respect 
SubgraphStrategy. This is the problem with everything NOT being traversal — 
once you start using the “Blueprints API” you start getting inconsistent 
behavior/functionality. Thus, GremlinServer does too much — just execute the 
traversal and return the result.
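
To make the size difference concrete, here is a rough sketch in plain Python. It is a toy model only: the dict layout, the helper names (detached, reference, sensor_after, property_count) and the "sensor_node" label are assumptions made for illustration, not TinkerPop classes or the Gremlin Server wire format.

```python
# Toy model only: plain Python data, not TinkerPop classes or any wire format.

# Hypothetical vertex 1 with 1,000 timestamped 'sensor' multi-property values.
vertex = {
    "id": 1,
    "label": "sensor_node",
    "properties": {
        "sensor": [{"timestamp": 2000 + i, "value": i} for i in range(1000)],
        "name": [{"value": "node-1"}],
    },
}


def detached(v):
    """Mimics a DetachedVertex-style result: id, label, and every property."""
    return {"id": v["id"], "label": v["label"], "properties": v["properties"]}


def reference(v):
    """Mimics a ReferenceVertex-style result: only id and label."""
    return {"id": v["id"], "label": v["label"]}


def sensor_after(v, year):
    """Mimics g.V(1).properties('sensor').has('timestamp', gt(year))."""
    return [p for p in v["properties"].get("sensor", []) if p["timestamp"] > year]


def property_count(result):
    """Number of property values that would travel with this result."""
    return sum(len(vals) for vals in result.get("properties", {}).values())


print(property_count(detached(vertex)))   # 1001
print(property_count(reference(vertex)))  # 0
print(len(sensor_after(vertex, 2017)))    # 982
```

The detached form ships all 1,001 property values whether or not anyone wanted them; the reference form ships none, and the explicit query ships only the 982 values that match.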

Next, DetachedXXX starts to get I-N-S-A-N-E when you start talking GLVs. Now we 
have the Blueprints API implemented in C#, Python, etc. Noooooooo! GLVs should 
only implement ReferenceXXX, which is the bare minimum specification of a graph 
object such that it can be re-attached (referenced back) to the source graph. 
That’s it. Don’t start populating it with properties. “What about edges?” 
“Can it get the neighboring vertices’ properties too?” “What about ...?” If 
you want that data, you traverse to it yourself!

So, what is the solution to the problem at hand? ReferenceXXX. These element 
classes carry the minimal amount of data required to re-attach to the source 
graph. Thus, if you do g.V(1), you get back id/label. However, if you want to 
then get the sensor data, you do g.V(v1).properties(…). Moreover, there is a 
little hidden gem called HaltedTraverserStrategy that allows the user to 
specify their desired element class:
<https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/decoration/HaltedTraverserStrategy.java>.

If GremlinServer is yielding too much data, simply do this: 

g = g.withStrategy(HaltedTraverserStrategy.reference())
g.V(1) // ahh… fresh and clean.
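
The idea behind that switch can be sketched in a few lines of plain Python. This is a toy model, not how HaltedTraverserStrategy is implemented: Ref, ToyGraph, as_reference, as_detached and run_v are hypothetical names invented for this example. The point is only that the conversion applied to results is a pluggable policy, and that a reference is enough to come back for more.

```python
# Toy model only: hypothetical names, not TinkerPop classes or its wire format.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Ref:
    """Minimal handle: enough to find the element again, nothing more."""
    id: Any
    label: str


class ToyGraph:
    def __init__(self, vertices: Dict[Any, dict]):
        self.vertices = vertices

    def attach(self, ref: Ref) -> dict:
        """Re-attach a reference to the live vertex in the source graph."""
        return self.vertices[ref.id]


def as_reference(v: dict) -> Ref:
    """Policy 1: return only id and label."""
    return Ref(v["id"], v["label"])


def as_detached(v: dict) -> dict:
    """Policy 2: return id, label, and a copy of every property."""
    return {"id": v["id"], "label": v["label"], "properties": dict(v["properties"])}


def run_v(graph: ToyGraph, ids: List[Any], halt: Callable[[dict], Any]) -> list:
    """Look up vertices by id, then convert each result with the chosen policy."""
    return [halt(graph.vertices[i]) for i in ids]


graph = ToyGraph({
    1: {"id": 1, "label": "sensor_node",
        "properties": {"sensor": [2016, 2017, 2018]}},
})

refs = run_v(graph, [1], as_reference)
print(refs)  # [Ref(id=1, label='sensor_node')]
# The client asks for properties only when it actually wants them.
print(graph.attach(refs[0])["properties"]["sensor"])  # [2016, 2017, 2018]
```

Swapping as_reference for as_detached changes what comes back without touching the lookup itself, which is the same separation the strategy gives you on the traversal source.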

The trick to software is not to write it. If you are a software developer, you 
are not as good as the guy who runs the deli down the street cause guess what, 
he lives just fine and doesn’t write a lick of code. 

Marko.

http://markorodriguez.com

> On Jun 26, 2017, at 2:21 PM, Stephen Mallette wrote:
> 
> Looking back at this thread, since there were no objections to doing
> "pre-releases" of GLVs, I think we can postpone the test suite
> changes. So, I'm fine with that being off the table.
> 
> Looking at my list, I'm also surprised that I didn't include:
> 
> https://issues.apache.org/jira/browse/TINKERPOP-1592
> 
> I think that will be important for providing more flexibility to users to
> shape results returned from traversals. That, of course, is important for
> GLV remoting so that users can return only data that matters to them.
> TINKERPOP-1592 funnels into the GraphSON/Gryo 3.0 stuff mentioned
> previously as we seek to make improvements there in terms of
> efficiency/performance/usability. Marko will be taking a look at the 1592
> ticket.
> 
> I think there is a good body of nice-to-have tickets (after going through
> them all in the last couple of weeks to do some housekeeping) so we'll see
> what we can get in there and what we can't after those more crucial bits
> are done. I believe that we could start thinking about release of 3.3.0 in
> the next 4 weeks or so.
> 
> If there are any other thoughts for what's going on with 3.3.0 please let
> them be known.
> 
> 
> On Thu, Jun 1, 2017 at 2:08 PM, Stephen Mallette wrote:
> 
>> I was just thinking about what needs to be done for 3.3.0 to get it ready
>> for release. I thought I'd send this email to collect any ideas:
>> 
>> + Dynamically load the MetricManager in Gremlin Server (TINKERPOP-1550)
>> + Clean up IO - both GraphSON 3.0 and Gryo 3.0
>> + Remove more deprecated code
>> + Test framework improvements (GLVs and in the structure/process suites)
>> 
>> I suppose these could shift and change between now and when we think it's
>> ready to release. I have no idea how much time it will take to get this all
>> done, but let's see if anyone else has any other important things for 3.3.0.
>> 
