Hello,

Throughout our documentation we show uses of the “Blueprints API” (i.e. 
Graph/Vertex/Edge/etc. classes & methods) as well as the use of the Traversal 
API (i.e. Gremlin).

Enabling users to have two ways of interacting with the graph system has its 
problems:

        1. The DetachedXXX problem: how much data should a returned 
vertex/edge/etc. have associated with it?
        2. graph.addVertex() vs. g.addV(): which should I use? The first is 
faster but is not recommended.
        3. SubgraphStrategy leaking: I get subgraphs with Gremlin, but can 
then directly interact with the vertex objects to see more than I should.
        4. VertexProgram model: I write traversals with the Traversal API, but 
then develop VertexPrograms with the Blueprints API. That’s weird.
        5. GremlinServer returning fat objects: serializers create 
property-rich vertices and edges, with HaltedTraversalStrategy as the awkward 
workaround.
        6. … various permutations of these source problems.

I propose that we solve this problem once and for all in TinkerPop4 as follows:

There should be two “Graph APIs.”

        1. Provider Graph API: This is the current Blueprints API with 
Graph.addVertex(), Vertex.edges(), Edge.inVertex(), etc.
        2. User Graph API: This is a ReferenceXXX API.

Let's talk about the second, as it is more novel and distinct from current 
practice.

We should have ReferenceGraph which is simply a reference/dummy/proxy to the 
provider Graph API. ReferenceGraph has the following API:

ReferenceGraph.open()
ReferenceGraph.close()
ReferenceGraph.tx() // assuming we like the current transaction model (??)
ReferenceGraph.traversal()

That is it. What does this entail? Assume the following traversal:

g = ReferenceGraph.open(config).traversal()
g.V(1).out('knows')

ReferenceGraph is almost like a “RemoteGraph” (RemoteStrategy) in that it makes 
a connection (remote or inter-JVM) to the provider Graph API. When 
g.V(1).out('knows') executes, it is really sending the bytecode to the provider 
Graph for execution (as specified by the config of ReferenceGraph.open()). 
Thus, once it hits the provider's graph, ProviderVertex, ProviderEdge, etc. are 
the objects being processed. However, what the traversal’s Iterator<Vertex> 
returns is ReferenceVertex! That is, it never returns ProviderVertex. In this 
way, regardless of whether the user is going “over the wire” or within the same JVM or 
against a different provider’s graph database or from Gremlin-Python/C#/etc., 
all the vertices are simply ‘reference vertices’ (id + label). This makes it so 
that users never interact with the graph element objects themselves directly. 
They can ONLY interact with the graph via traversals! At most they can call 
ReferenceVertex.id() and ReferenceVertex.label(). That's it: no mutations, no 
walking edges, nada! And moreover, since ReferenceXXX has enough information to 
re-attach to the source graph, they can always do the following to get more 
information:

v = g.V(1).out('knows').next()
g.V(v).values('name')
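
To make the boundary concrete, here is a minimal, self-contained Java sketch of the idea. It is not TinkerPop code and not a proposed implementation: ProviderGraph, ProviderVertex, out(), value(), and sampleGraph() are hypothetical stand-ins invented for illustration, and the sample data is just a small made-up graph. The only point it tries to show is that execution happens on provider objects while the caller only ever receives id + label references, which it can hand back to the graph to ask for more.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReferenceSketch {

    /** User-facing element: id and label only. No properties, no edge walking. */
    static final class ReferenceVertex {
        private final Object id;
        private final String label;
        ReferenceVertex(Object id, String label) { this.id = id; this.label = label; }
        Object id() { return id; }
        String label() { return label; }
    }

    /** Provider-side element (hypothetical): holds the full data and adjacency. */
    static final class ProviderVertex {
        final Object id;
        final String label;
        final Map<String, Object> properties = new HashMap<>();
        final Map<String, List<ProviderVertex>> out = new HashMap<>();
        ProviderVertex(Object id, String label) { this.id = id; this.label = label; }
    }

    /** Hypothetical stand-in for whatever executes traversals on the provider side. */
    static final class ProviderGraph {
        private final Map<Object, ProviderVertex> vertices = new HashMap<>();

        ProviderVertex addVertex(Object id, String label, String name) {
            ProviderVertex v = new ProviderVertex(id, label);
            v.properties.put("name", name);
            vertices.put(id, v);
            return v;
        }

        void addEdge(Object outId, String edgeLabel, Object inId) {
            vertices.get(outId).out
                    .computeIfAbsent(edgeLabel, k -> new ArrayList<>())
                    .add(vertices.get(inId));
        }

        /** Rough analogue of g.V(id).out(edgeLabel): results are detached to references. */
        List<ReferenceVertex> out(Object id, String edgeLabel) {
            List<ReferenceVertex> result = new ArrayList<>();
            for (ProviderVertex v : vertices.get(id).out.getOrDefault(edgeLabel, new ArrayList<>())) {
                result.add(new ReferenceVertex(v.id, v.label)); // a ProviderVertex never escapes
            }
            return result;
        }

        /** Rough analogue of g.V(ref).values(key): the reference re-attaches by id. */
        Object value(ReferenceVertex ref, String key) {
            return vertices.get(ref.id()).properties.get(key);
        }
    }

    /** Small made-up sample graph: vertex 1 'knows' vertices 2 and 4. */
    static ProviderGraph sampleGraph() {
        ProviderGraph graph = new ProviderGraph();
        graph.addVertex(1, "person", "marko");
        graph.addVertex(2, "person", "vadas");
        graph.addVertex(4, "person", "josh");
        graph.addEdge(1, "knows", 2);
        graph.addEdge(1, "knows", 4);
        return graph;
    }

    public static void main(String[] args) {
        ProviderGraph graph = sampleGraph();
        for (ReferenceVertex ref : graph.out(1, "knows")) {
            // The caller sees only id() and label(); anything else is another query.
            System.out.println(ref.id() + " " + ref.label() + " " + graph.value(ref, "name"));
        }
    }
}
```

Note that ReferenceVertex has no method that reaches properties or edges, so a SubgraphStrategy-style restriction applied inside out() or value() could not be bypassed through the returned object.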

This split into two Graph APIs will enable us to make a hard boundary between 
what the provider (vendor) needs to implement and what the user (developer) 
gets to access. This distinction should solve the problems articulated at the 
start of this email.

Thoughts?
Marko.

http://markorodriguez.com


