Hello,

Throughout our documentation we show uses of the “Blueprints API” (i.e. Graph/Vertex/Edge/etc. classes & methods) as well as the use of the Traversal API (i.e. Gremlin).
Enabling users to have two ways of interacting with the graph system has its problems:

  1. The DetachedXXX problem: how much data should a returned vertex/edge/etc. have associated with it?
  2. graph.addVertex() and g.addV(): which should I use? The first is faster but is not recommended.
  3. SubgraphStrategy leaking: I get subgraphs with Gremlin, but can then directly interact with the vertex objects to see more than I should.
  4. VertexProgram model: I write traversals with the Traversal API, but then develop VertexPrograms with the Blueprints API. That’s weird.
  5. GremlinServer returning fat objects: serializers are creating property-rich vertices and edges. The awkward HaltedTraversalStrategy solution.
  6. … various permutations of these source problems.

I propose that we solve this problem once and for all in TinkerPop4 as follows. There should be two “Graph APIs”:

  1. Provider Graph API: This is the current Blueprints API with Graph.addVertex(), Vertex.edges(), Edge.inVertex(), etc.
  2. User Graph API: This is a ReferenceXXX API.

Let’s talk about the second as it’s more novel and distinct from current practices. We should have ReferenceGraph, which is simply a reference/dummy/proxy to the provider Graph API. ReferenceGraph has the following API:

  ReferenceGraph.open()
  ReferenceGraph.close()
  ReferenceGraph.tx() // assuming we like the current transaction model (??)
  ReferenceGraph.traversal()

That is it. What does this entail? Assume the following traversal:

  g = ReferenceGraph.open(config).traversal()
  g.V(1).out('knows')

ReferenceGraph is almost like a “RemoteGraph” (RemoteStrategy) in that it makes a connection (remote or inter-JVM) to the provider Graph API. When g.V(1).out('knows') executes, it is really sending the bytecode to the provider Graph for execution (as specified by the config of ReferenceGraph.open()). Thus, once it hits the provider's graph, ProviderVertex, ProviderEdge, etc. are the objects being processed.
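To make the four-method surface above concrete, here is a rough Java sketch of what a ReferenceGraph might look like. Everything in it is an assumption for illustration only: the Provider, Transaction and TraversalSource types are hypothetical, a plain string stands in for Gremlin bytecode, and open() takes a Provider directly instead of the config the proposal describes. None of this is existing TinkerPop code.

```java
import java.util.List;

// Hypothetical sketch: none of these types exist in TinkerPop today.
final class ReferenceGraphSketch {

    /** Stand-in for the provider Graph API, reached remotely or in the same JVM. */
    interface Provider {
        List<Object> execute(String bytecode);
    }

    /** Placeholder for the transaction handle returned by tx(). */
    interface Transaction {
        void commit();
        void rollback();
    }

    /** The only object that submits traversals; it forwards them to the provider. */
    static final class TraversalSource {
        private final Provider provider;

        TraversalSource(Provider provider) { this.provider = provider; }

        // A plain string stands in for Gremlin bytecode in this sketch.
        List<Object> submit(String bytecode) { return provider.execute(bytecode); }
    }

    /** User-facing graph: open, close, tx, traversal, and nothing else. */
    static final class ReferenceGraph implements AutoCloseable {
        private final Provider provider;
        private boolean open = true;

        private ReferenceGraph(Provider provider) { this.provider = provider; }

        // The proposal passes a config here; a Provider is used directly for brevity.
        static ReferenceGraph open(Provider provider) { return new ReferenceGraph(provider); }

        @Override
        public void close() { open = false; }

        Transaction tx() {
            requireOpen();
            return new Transaction() {
                @Override public void commit() { /* no-op in this sketch */ }
                @Override public void rollback() { /* no-op in this sketch */ }
            };
        }

        TraversalSource traversal() {
            requireOpen();
            return new TraversalSource(provider);
        }

        private void requireOpen() {
            if (!open) throw new IllegalStateException("graph is closed");
        }
    }

    public static void main(String[] args) {
        // A fake provider that echoes the bytecode it was asked to run.
        try (ReferenceGraph graph = ReferenceGraph.open(bytecode -> List.of("executed: " + bytecode))) {
            System.out.println(graph.traversal().submit("V(1).out('knows')"));
        }
    }
}
```

The point of the sketch is only that the user-side object holds no graph data itself; every call is handed to whatever sits behind the connection.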
However, what the traversal’s Iterator<Vertex> returns is ReferenceVertex! That is, it never returns ProviderVertex. In this way, regardless of whether the user is going “over the wire” or within the same JVM or against a different provider’s graph database or from Gremlin-Python/C#/etc., all the vertices are simply ‘reference vertices’ (id + label). This makes it so that users never interact with the graph element objects themselves directly. They can ONLY interact with the graph via traversals! At most they can call ReferenceVertex.id() and ReferenceVertex.label(). That’s it: no mutations, no walking edges, nada! And moreover, since ReferenceXXX has enough information to re-attach to the source graph, they can always do the following to get more information:

  v = g.V(1).out('knows').next()
  g.V(v).values('name')

This split into two Graph APIs will enable us to make a hard boundary between what the provider (vendor) needs to implement and what the user (developer) gets to access. This distinction should solve the problems articulated at the start of this email.

Thoughts?

Marko.

http://markorodriguez.com