Hi,

I have done some work on defining a meta model for Gremlin's property graph. I am following the approach used in the modelling world, in particular the one the OMG (Object Management Group) uses when defining its various meta models and specifications.
However, where the OMG uses a subset of UML to define its meta models, I suggest we use Gremlin. After all, Gremlin is the language we use to describe the world, and the property graph meta model can also be described in Gremlin.

I propose that we have 3 levels of modelling, each of which can itself be specified in Gremlin.

1: The property graph meta model.
2: The model.
3: The graph representing the actual data.

1) The property graph meta model describes the nature of the property graph itself, i.e. that property graphs have vertices, edges and properties.
2) The model is an instance of the meta model. It describes the schema of a particular graph, e.g. for TinkerPop's modern graph this would be the labels 'person', 'software', 'created' and 'knows' and the properties 'weight', 'age', 'name' and 'lang'.
3) The final level is an instance of the model. It is the actual graph itself, e.g. for TinkerPop's modern graph it is 'Marko', 'Josh', 'java' ...

1: Property Graph Meta Model

public static Graph gremlinMetaModel() {
    enum GremlinDataType {
        STRING,
        INTEGER,
        DOUBLE,
        DATE,
        TIME
        //...
    }
    TinkerGraph propertyGraphMetaModel = TinkerGraph.open();
    Vertex graph = propertyGraphMetaModel.addVertex(T.label, "Graph", "name", "GremlinDataType::STRING");
    Vertex vertex = propertyGraphMetaModel.addVertex(T.label, "VertexLabel", "label", "GremlinDataType::STRING");
    Vertex edge = propertyGraphMetaModel.addVertex(T.label, "EdgeLabel", "label", "GremlinDataType::STRING");
    Vertex vertexProperty = propertyGraphMetaModel.addVertex(T.label, "VertexProperty", "name", "GremlinDataType::STRING", "type", "GremlinDataType");
    Vertex edgeProperty = propertyGraphMetaModel.addVertex(T.label, "EdgeProperty", "name", "GremlinDataType::STRING", "type", "GremlinDataType");
    graph.addEdge("vertices", vertex);
    graph.addEdge("edges", edge);
    vertex.addEdge("properties", vertexProperty);
    // edge properties belong to the 'EdgeLabel', not the 'VertexLabel'
    edge.addEdge("properties", edgeProperty);
    // same edge labels as used by the constraints and the model
    vertex.addEdge("outEdge", edge);
    vertex.addEdge("inEdge", edge);
    return propertyGraphMetaModel;
}

This can be visualized as: [diagram not included]

Notes:
1) GremlinDataType is an enumeration of the named data types that Gremlin supports. All Gremlin data types are assumed to be atomic, with their life cycle fully owned by the containing parent. How a value is persisted on disc or transported over the wire is not a concern for the meta model.
2) Gremlin's semantics are too weak to fully specify a valid meta model. Accompanying the meta model we need a list of constraints, specified as Gremlin queries, to augment the semantics of the meta model. These constraints/queries will be able to validate any Gremlin-specified model for correctness.
3) It is trivial to extend the meta model, e.g. to specify something like index support just add an 'Index' vertex and an edge from 'VertexLabel' to it.

Property graph meta model constraints:

Each query returns the ids of the vertices that violate the constraint, so an empty result means the constraint holds.

1) Every 'VertexLabel' must have a 'label'.
g.V().hasLabel("VertexLabel").or(__.hasNot("label"), __.has("label", P.eq(""))).id()

2) Every 'EdgeLabel' must have a 'label'.
g.V().hasLabel("EdgeLabel").or(__.hasNot("label"), __.has("label", P.eq(""))).id()

3) Every 'EdgeLabel' must have at least one 'outEdge' 'VertexLabel'.
g.V().hasLabel("EdgeLabel").where(__.not(__.in("outEdge"))).id()

4) Every 'EdgeLabel' must have at least one 'inEdge' 'VertexLabel'.
g.V().hasLabel("EdgeLabel").where(__.not(__.in("inEdge"))).id()

5) Every 'VertexProperty' must have a 'name'.
g.V().hasLabel("VertexProperty").or(__.hasNot("name"), __.has("name", P.eq(""))).id()

6) Every 'VertexProperty' must have a 'type'.
g.V().hasLabel("VertexProperty").or(__.hasNot("type"), __.has("type", P.eq(""))).id()

7) Every 'EdgeProperty' must have a 'name'.
g.V().hasLabel("EdgeProperty").or(__.hasNot("name"), __.has("name", P.eq(""))).id()

8) Every 'EdgeProperty' must have a 'type'.
g.V().hasLabel("EdgeProperty").or(__.hasNot("type"), __.has("type", P.eq(""))).id()

9) Every 'VertexProperty' must have an incoming 'properties' edge.
g.V().hasLabel("VertexProperty").where(__.not(__.in("properties"))).id()

10) Every 'EdgeProperty' must have an incoming 'properties' edge.
g.V().hasLabel("EdgeProperty").where(__.not(__.in("properties"))).id()

...

This can be visualized as: [diagram not included]

2: The model

What follows is an example of TinkerPop's 'modern' graph specified as an instance of the above property graph meta model.

public static Graph modernModel() {
    // import this from a base package
    enum GremlinDataType {
        STRING,
        INTEGER,
        DOUBLE,
        DATE,
        TIME
        //...
    }
    TinkerGraph modernModelGraph = TinkerGraph.open();

    // 'person' vertex label and its properties
    Vertex person = modernModelGraph.addVertex(T.label, "VertexLabel", "label", "person");
    Vertex personNameVertexProperty = modernModelGraph.addVertex(T.label, "VertexProperty", "name", "name", "type", GremlinDataType.STRING.name());
    Vertex personAgeVertexProperty = modernModelGraph.addVertex(T.label, "VertexProperty", "name", "age", "type", GremlinDataType.INTEGER.name());
    person.addEdge("properties", personNameVertexProperty);
    person.addEdge("properties", personAgeVertexProperty);

    // 'software' vertex label and its properties
    Vertex software = modernModelGraph.addVertex(T.label, "VertexLabel", "label", "software");
    Vertex softwareNameVertexProperty = modernModelGraph.addVertex(T.label, "VertexProperty", "name", "name", "type", GremlinDataType.STRING.name());
    Vertex softwareLangVertexProperty = modernModelGraph.addVertex(T.label, "VertexProperty", "name", "lang", "type", GremlinDataType.STRING.name());
    software.addEdge("properties", softwareNameVertexProperty);
    software.addEdge("properties", softwareLangVertexProperty);

    // 'knows' edge label; 'weight' values in the modern graph are doubles (e.g. 0.5)
    Vertex knows = modernModelGraph.addVertex(T.label, "EdgeLabel", "label", "knows");
    Vertex knowsWeightEdgeProperty = modernModelGraph.addVertex(T.label, "EdgeProperty", "name", "weight", "type", GremlinDataType.DOUBLE.name());
    knows.addEdge("properties", knowsWeightEdgeProperty);

    // 'created' edge label
    Vertex created = modernModelGraph.addVertex(T.label, "EdgeLabel", "label", "created");
    Vertex createdWeightEdgeProperty = modernModelGraph.addVertex(T.label, "EdgeProperty", "name", "weight", "type", GremlinDataType.DOUBLE.name());
    created.addEdge("properties", createdWeightEdgeProperty);

    // 'knows' goes from person to person, 'created' from person to software
    person.addEdge("outEdge", knows);
    person.addEdge("outEdge", created);
    person.addEdge("inEdge", knows);
    software.addEdge("inEdge", created);
    return modernModelGraph;
}

The above Gremlin constraints need to be executed against 'modernModelGraph' to check the correctness of the model.
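As a rough illustration of that validation step, here is a minimal sketch in plain Java, with no TinkerPop dependency so that it runs on its own. The `Element` and `Constraint` classes, the `validate` method and the trimmed-down `sampleModel` are hypothetical stand-ins and not part of the proposal. Each predicate mirrors one of the ten constraints listed above and is true when an element violates it; in practice the checks would simply be the Gremlin queries themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class ModelConstraintCheck {

    // Hypothetical stand-in for one vertex of the model graph.
    static final class Element {
        final String id;
        final String label;
        final Map<String, String> properties;
        final Set<String> inEdgeLabels; // labels of the incoming edges

        Element(String id, String label, Map<String, String> properties, Set<String> inEdgeLabels) {
            this.id = id;
            this.label = label;
            this.properties = properties;
            this.inEdgeLabels = inEdgeLabels;
        }
    }

    // A constraint targets one element label; the predicate is true on a violation.
    static final class Constraint {
        final String description;
        final String appliesTo;
        final Predicate<Element> violated;

        Constraint(String description, String appliesTo, Predicate<Element> violated) {
            this.description = description;
            this.appliesTo = appliesTo;
            this.violated = violated;
        }
    }

    // Mirrors or(hasNot(key), has(key, "")): the property is absent or empty.
    static Predicate<Element> missing(String key) {
        return e -> e.properties.getOrDefault(key, "").isEmpty();
    }

    // Mirrors where(not(in(edgeLabel))): no incoming edge with that label.
    static Predicate<Element> noIncoming(String edgeLabel) {
        return e -> !e.inEdgeLabels.contains(edgeLabel);
    }

    static List<Constraint> constraints() {
        return List.of(
            new Constraint("VertexLabel must have a label", "VertexLabel", missing("label")),
            new Constraint("EdgeLabel must have a label", "EdgeLabel", missing("label")),
            new Constraint("EdgeLabel must have an outEdge VertexLabel", "EdgeLabel", noIncoming("outEdge")),
            new Constraint("EdgeLabel must have an inEdge VertexLabel", "EdgeLabel", noIncoming("inEdge")),
            new Constraint("VertexProperty must have a name", "VertexProperty", missing("name")),
            new Constraint("VertexProperty must have a type", "VertexProperty", missing("type")),
            new Constraint("EdgeProperty must have a name", "EdgeProperty", missing("name")),
            new Constraint("EdgeProperty must have a type", "EdgeProperty", missing("type")),
            new Constraint("VertexProperty must have an incoming properties edge", "VertexProperty", noIncoming("properties")),
            new Constraint("EdgeProperty must have an incoming properties edge", "EdgeProperty", noIncoming("properties")));
    }

    // Returns one "id: description" entry per violation; empty means the model is valid.
    static List<String> validate(List<Element> model) {
        List<String> violations = new ArrayList<>();
        for (Constraint c : constraints()) {
            for (Element e : model) {
                if (e.label.equals(c.appliesTo) && c.violated.test(e)) {
                    violations.add(e.id + ": " + c.description);
                }
            }
        }
        return violations;
    }

    // A trimmed-down slice of the 'modern' model: one label of each kind.
    static List<Element> sampleModel() {
        return List.of(
            new Element("person", "VertexLabel", Map.of("label", "person"), Set.of()),
            new Element("knows", "EdgeLabel", Map.of("label", "knows"), Set.of("outEdge", "inEdge")),
            new Element("person.name", "VertexProperty", Map.of("name", "name", "type", "STRING"), Set.of("properties")),
            new Element("knows.weight", "EdgeProperty", Map.of("name", "weight", "type", "DOUBLE"), Set.of("properties")));
    }

    public static void main(String[] args) {
        System.out.println("violations: " + validate(sampleModel()));
    }
}
```

Keeping every constraint as a "find the offenders" check means the validator reports which element broke which rule, not just pass or fail.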
If the constraints pass, we know that the model is indeed a valid Gremlin property graph model. We can certainly specify more constraints, but the ones defined above pass on 'modernModelGraph'.

3: The actual graph

TinkerGraph modernGraph = TinkerFactory.createModern();

And that is it.

There are lots of details to complete, but first we need to see if there is any appetite for a modelling approach, as I realize there is some academic abstract algebra work happening elsewhere. The modelling approach seems to me to have a lower barrier to entry for the community to partake in the discussion of what constitutes a property graph model.

Let me know if there are questions or criticisms.

Thanks
Pieter