Hi,

I have done some work on defining a meta model for Gremlin's property
graph. I am using the approach common in the modelling world, in
particular the one the OMG (Object Management Group) follows when
defining its various meta models and specifications.

However, where the OMG uses a subset of UML to define its meta models,
I suggest we use Gremlin. After all, Gremlin is the language we use to
describe the world, and the property graph meta model can also be
described in Gremlin.

I propose that we have three levels of modelling, each of which can
itself be specified in Gremlin:

1: The property graph meta model.
2: The model.
3: The graph representing the actual data.

1) The property graph meta model describes the nature of the property
graph itself. i.e. that property graphs have vertices, edges and
properties.

2) The model is an instance of the meta model. It describes the schema
of a particular graph, e.g. for TinkerPop's modern graph this would be
the vertex labels 'person' and 'software', the edge labels 'created' and
'knows', and the properties 'weight', 'age', 'name' and 'lang'.

3) The final level is an instance of the model. It is the actual graph
itself, e.g. for TinkerPop's modern graph it is 'Marko', 'Josh', 'java'
...


1: Property Graph Meta Model

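    import org.apache.tinkerpop.gremlin.structure.Graph;
    import org.apache.tinkerpop.gremlin.structure.T;
    import org.apache.tinkerpop.gremlin.structure.Vertex;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

    public static Graph gremlinMetaModel() {
        // Named, atomic data types that Gremlin supports (local enums need Java 16+).
        enum GremlinDataType {
            STRING,
            INTEGER,
            DOUBLE,
            DATE,
            TIME
            //...
        }
        TinkerGraph propertyGraphMetaModel = TinkerGraph.open();
        // One vertex per meta model element; property values name the type of each attribute.
        Vertex graph = propertyGraphMetaModel.addVertex(T.label, "Graph",
                "name", "GremlinDataType::STRING");
        Vertex vertex = propertyGraphMetaModel.addVertex(T.label, "VertexLabel",
                "label", "GremlinDataType::STRING");
        Vertex edge = propertyGraphMetaModel.addVertex(T.label, "EdgeLabel",
                "label", "GremlinDataType::STRING");
        Vertex vertexProperty = propertyGraphMetaModel.addVertex(T.label, "VertexProperty",
                "name", "GremlinDataType::STRING", "type", "GremlinDataType");
        Vertex edgeProperty = propertyGraphMetaModel.addVertex(T.label, "EdgeProperty",
                "name", "GremlinDataType::STRING", "type", "GremlinDataType");

        // A graph contains vertex labels and edge labels.
        graph.addEdge("vertices", vertex);
        graph.addEdge("edges", edge);
        // Vertex labels own vertex properties; edge labels own edge properties.
        vertex.addEdge("properties", vertexProperty);
        edge.addEdge("properties", edgeProperty);
        // An edge label connects an out vertex label to an in vertex label.
        vertex.addEdge("outEdge", edge);
        vertex.addEdge("inEdge", edge);

        return propertyGraphMetaModel;
    }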

This can be visualized as a diagram (not included here).

Notes:
1) GremlinDataType is an enumeration of the named data types that
Gremlin supports. All Gremlin data types are assumed to be atomic, with
their life cycle fully owned by their containing parent. How they are
persisted on disc or transported over the wire is not a concern for the
meta model.
2) Gremlin's semantics are too weak to fully specify a valid meta model.
Accompanying the meta model we need a list of constraints, specified as
Gremlin queries, to augment its semantics. These constraints/queries
will be able to validate any Gremlin-specified model for correctness.
3) It is trivial to extend the meta model, e.g. to specify index
support, just add an 'Index' vertex and an edge from 'VertexLabel' to it.
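To illustrate notes 2 and 3 outside of TinkerPop, here is a minimal plain-JDK sketch (Java 16+ for records). It assumes the meta model can be reduced to the set of (out type, edge label, in type) triples it declares, using the 'outEdge'/'inEdge' labels the constraints rely on. The class name MetaModelExtension and the 'indexes' edge label are hypothetical; the note does not name the edge. A model edge is only accepted if the meta model declares its triple, and adding index support is a single extra triple.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: the meta model reduced to (outType, edgeLabel, inType) triples.
public class MetaModelExtension {

    // One allowed edge between two meta model element types.
    public record Triple(String outType, String edgeLabel, String inType) {}

    // The edges of gremlinMetaModel(), with 'outEdge'/'inEdge' as used by the constraints.
    public static Set<Triple> baseMetaModel() {
        return Set.of(
                new Triple("Graph", "vertices", "VertexLabel"),
                new Triple("Graph", "edges", "EdgeLabel"),
                new Triple("VertexLabel", "properties", "VertexProperty"),
                new Triple("EdgeLabel", "properties", "EdgeProperty"),
                new Triple("VertexLabel", "outEdge", "EdgeLabel"),
                new Triple("VertexLabel", "inEdge", "EdgeLabel"));
    }

    // Note 3: extending the meta model only adds to it. The "indexes" label is hypothetical.
    public static Set<Triple> withIndexSupport(Set<Triple> meta) {
        Set<Triple> extended = new HashSet<>(meta);
        extended.add(new Triple("VertexLabel", "indexes", "Index"));
        return extended;
    }

    // Note 2: a generic constraint; returns every model edge the meta model does not declare.
    public static List<Triple> undeclared(Set<Triple> meta, List<Triple> modelEdges) {
        List<Triple> violations = new ArrayList<>();
        for (Triple t : modelEdges) {
            if (!meta.contains(t)) {
                violations.add(t);
            }
        }
        return violations;
    }
}
```

A model that uses an 'indexes' edge is rejected by baseMetaModel() and accepted by withIndexSupport(baseMetaModel()), without touching any other part of the meta model.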

Property graph meta model constraints. Each query returns the ids of the
offending vertices, so an empty result means the constraint holds:

1) Every 'VertexLabel' must have a 'label'.
    g.V().hasLabel("VertexLabel").or(__.hasNot("label"), __.has("label", P.eq(""))).id()
2) Every 'EdgeLabel' must have a 'label'.
    g.V().hasLabel("EdgeLabel").or(__.hasNot("label"), __.has("label", P.eq(""))).id()
3) Every 'EdgeLabel' must have at least one 'outEdge' 'VertexLabel'.
    g.V().hasLabel("EdgeLabel").where(__.not(__.in("outEdge"))).id()
4) Every 'EdgeLabel' must have at least one 'inEdge' 'VertexLabel'.
    g.V().hasLabel("EdgeLabel").where(__.not(__.in("inEdge"))).id()
5) Every 'VertexProperty' must have a 'name'.
    g.V().hasLabel("VertexProperty").or(__.hasNot("name"), __.has("name", P.eq(""))).id()
6) Every 'VertexProperty' must have a 'type'.
    g.V().hasLabel("VertexProperty").or(__.hasNot("type"), __.has("type", P.eq(""))).id()
7) Every 'EdgeProperty' must have a 'name'.
    g.V().hasLabel("EdgeProperty").or(__.hasNot("name"), __.has("name", P.eq(""))).id()
8) Every 'EdgeProperty' must have a 'type'.
    g.V().hasLabel("EdgeProperty").or(__.hasNot("type"), __.has("type", P.eq(""))).id()
9) Every 'VertexProperty' must have an incoming 'properties' edge.
    g.V().hasLabel("VertexProperty").where(__.not(__.in("properties"))).id()
10) Every 'EdgeProperty' must have an incoming 'properties' edge.
    g.V().hasLabel("EdgeProperty").where(__.not(__.in("properties"))).id()
...
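As a cross-check of what these queries mean, here is a plain-JDK sketch (Java 16+, no TinkerPop) that evaluates constraints 1 to 10 over a hypothetical in-memory MiniGraph. Like the Gremlin versions, each check returns the ids of the offending vertices, and an empty result means the constraint holds. MiniGraph, V and E are illustrative stand-ins, not TinkerPop types; this shows the semantics only and is not a replacement for running the Gremlin queries against a TinkerGraph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Plain-JDK sketch of meta model constraints 1-10; not TinkerPop.
public class MetaModelConstraints {

    // Hypothetical stand-ins for a vertex and an edge of the model graph.
    public record V(int id, String label, Map<String, String> props) {}
    public record E(String label, int outId, int inId) {}

    // Hypothetical minimal graph holding just what the constraints need.
    public static final class MiniGraph {
        final Map<Integer, V> vertices = new LinkedHashMap<>();
        final List<E> edges = new ArrayList<>();

        // Properties are passed as alternating key/value pairs, like addVertex in TinkerPop.
        public V addVertex(String label, String... keyValues) {
            Map<String, String> props = new HashMap<>();
            for (int i = 0; i + 1 < keyValues.length; i += 2) {
                props.put(keyValues[i], keyValues[i + 1]);
            }
            V v = new V(vertices.size(), label, props);
            vertices.put(v.id(), v);
            return v;
        }

        public void addEdge(V out, String label, V in) {
            edges.add(new E(label, out.id(), in.id()));
        }

        // Counterpart of __.in(label): does v have at least one incoming edge with this label?
        boolean hasIncoming(V v, String label) {
            return edges.stream().anyMatch(e -> e.inId() == v.id() && e.label().equals(label));
        }
    }

    // g.V().hasLabel(label).or(__.hasNot(key), __.has(key, P.eq(""))).id()
    static List<Integer> missingOrEmpty(MiniGraph g, String label, String key) {
        return offending(g, label, v -> v.props().getOrDefault(key, "").isEmpty());
    }

    // g.V().hasLabel(label).where(__.not(__.in(edgeLabel))).id()
    static List<Integer> noIncoming(MiniGraph g, String label, String edgeLabel) {
        return offending(g, label, v -> !g.hasIncoming(v, edgeLabel));
    }

    // Ids of vertices with the given label that fail the check.
    static List<Integer> offending(MiniGraph g, String label, Predicate<V> fails) {
        List<Integer> ids = new ArrayList<>();
        for (V v : g.vertices.values()) {
            if (v.label().equals(label) && fails.test(v)) {
                ids.add(v.id());
            }
        }
        return ids;
    }

    // Runs constraints 1-10; only violated constraints appear in the result.
    public static Map<Integer, List<Integer>> validate(MiniGraph g) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        result.put(1, missingOrEmpty(g, "VertexLabel", "label"));
        result.put(2, missingOrEmpty(g, "EdgeLabel", "label"));
        result.put(3, noIncoming(g, "EdgeLabel", "outEdge"));
        result.put(4, noIncoming(g, "EdgeLabel", "inEdge"));
        result.put(5, missingOrEmpty(g, "VertexProperty", "name"));
        result.put(6, missingOrEmpty(g, "VertexProperty", "type"));
        result.put(7, missingOrEmpty(g, "EdgeProperty", "name"));
        result.put(8, missingOrEmpty(g, "EdgeProperty", "type"));
        result.put(9, noIncoming(g, "VertexProperty", "properties"));
        result.put(10, noIncoming(g, "EdgeProperty", "properties"));
        result.values().removeIf(List::isEmpty);
        return result;
    }
}
```

For example, an 'EdgeLabel' vertex that no 'VertexLabel' points to via 'outEdge' or 'inEdge' shows up under both constraint 3 and constraint 4.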

This can be visualized as a diagram (not included here).


2: The model

What follows is an example of TinkerPop's 'modern' graph specified as
an instance of the above property graph meta model.

    import org.apache.tinkerpop.gremlin.structure.Graph;
    import org.apache.tinkerpop.gremlin.structure.T;
    import org.apache.tinkerpop.gremlin.structure.Vertex;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

    public static Graph modernModel() {
        //import this from a base package
        enum GremlinDataType {
            STRING,
            INTEGER,
            DOUBLE,
            DATE,
            TIME
            //...
        }

        TinkerGraph modernModelGraph = TinkerGraph.open();

        // Vertex label 'person' with properties 'name' and 'age'.
        Vertex person = modernModelGraph.addVertex(T.label, "VertexLabel", "label", "person");
        Vertex personNameVertexProperty = modernModelGraph.addVertex(T.label, "VertexProperty",
                "name", "name", "type", GremlinDataType.STRING.name());
        Vertex personAgeVertexProperty = modernModelGraph.addVertex(T.label, "VertexProperty",
                "name", "age", "type", GremlinDataType.INTEGER.name());
        person.addEdge("properties", personNameVertexProperty);
        person.addEdge("properties", personAgeVertexProperty);

        // Vertex label 'software' with properties 'name' and 'lang'.
        Vertex software = modernModelGraph.addVertex(T.label, "VertexLabel", "label", "software");
        Vertex softwareNameVertexProperty = modernModelGraph.addVertex(T.label, "VertexProperty",
                "name", "name", "type", GremlinDataType.STRING.name());
        Vertex softwareLangVertexProperty = modernModelGraph.addVertex(T.label, "VertexProperty",
                "name", "lang", "type", GremlinDataType.STRING.name());
        software.addEdge("properties", softwareNameVertexProperty);
        software.addEdge("properties", softwareLangVertexProperty);

        // Edge label 'knows'; its 'weight' values in the modern graph are doubles (e.g. 0.5).
        Vertex knows = modernModelGraph.addVertex(T.label, "EdgeLabel", "label", "knows");
        Vertex knowsWeightEdgeProperty = modernModelGraph.addVertex(T.label, "EdgeProperty",
                "name", "weight", "type", GremlinDataType.DOUBLE.name());
        knows.addEdge("properties", knowsWeightEdgeProperty);

        // Edge label 'created', also with a double 'weight'.
        Vertex created = modernModelGraph.addVertex(T.label, "EdgeLabel", "label", "created");
        Vertex createdWeightEdgeProperty = modernModelGraph.addVertex(T.label, "EdgeProperty",
                "name", "weight", "type", GremlinDataType.DOUBLE.name());
        created.addEdge("properties", createdWeightEdgeProperty);

        // 'knows' runs person -> person; 'created' runs person -> software.
        person.addEdge("outEdge", knows);
        person.addEdge("inEdge", knows);
        person.addEdge("outEdge", created);
        software.addEdge("inEdge", created);
        return modernModelGraph;
    }

The above Gremlin constraints need to be executed against
'modernModelGraph' (the graph returned by modernModel()) to check the
model for correctness. If the constraints pass, the model is a valid
Gremlin property graph model, at least with respect to the constraints
defined so far.

We can certainly specify more constraints, but the ones defined above
all pass on 'modernModelGraph'.
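One example of a further constraint: 6) and 8) only check that 'type' is present and non-empty, not that it actually names a GremlinDataType. In Gremlin this could perhaps be written as

    g.V().hasLabel("VertexProperty", "EdgeProperty").not(__.has("type", P.within("STRING", "INTEGER", "DOUBLE", "DATE", "TIME"))).id()

The sketch below shows the same check in plain Java over a hypothetical PropertyDecl record (a flattened view of a property vertex). It only knows the five enum members listed above; the elided '//...' members are not guessed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-JDK sketch of an extra constraint: 'type' must name a GremlinDataType.
public class TypeConstraint {

    // The members listed in the post; the elided "//..." members are deliberately not guessed.
    public enum GremlinDataType { STRING, INTEGER, DOUBLE, DATE, TIME }

    // Hypothetical flattened view of a 'VertexProperty' or 'EdgeProperty' vertex.
    public record PropertyDecl(int id, String label, String name, String type) {}

    // Returns the ids of property declarations whose 'type' is not a known GremlinDataType.
    public static List<Integer> unknownTypes(List<PropertyDecl> decls) {
        Set<String> known = new HashSet<>();
        for (GremlinDataType t : GremlinDataType.values()) {
            known.add(t.name());
        }
        List<Integer> offending = new ArrayList<>();
        for (PropertyDecl d : decls) {
            if (!known.contains(d.type())) {
                offending.add(d.id());
            }
        }
        return offending;
    }
}
```

A missing 'type' (null) is also reported, so this check overlaps with 6) and 8) rather than replacing them.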

3: The actual graph

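    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

    TinkerGraph modernGraph = TinkerFactory.createModern();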
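How level 3 is checked against level 2 is not spelled out yet. One possible approach, sketched below in plain Java (16+), is to reduce the model to a map from vertex label to declared property types and check each data vertex against it. The ConformanceCheck class, the map shape and the Java-type-to-GremlinDataType mapping are all assumptions for illustration; edges and the DATE/TIME types are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: checking level-3 data vertices against a level-2 model.
public class ConformanceCheck {

    public enum GremlinDataType { STRING, INTEGER, DOUBLE, DATE, TIME }

    // Level 2 reduced to: vertex label -> (property name -> declared type).
    // Assumption: this map would be extracted from modernModelGraph; not specified in the post.
    public static Map<String, Map<String, GremlinDataType>> modernVertexModel() {
        return Map.of(
                "person", Map.of("name", GremlinDataType.STRING, "age", GremlinDataType.INTEGER),
                "software", Map.of("name", GremlinDataType.STRING, "lang", GremlinDataType.STRING));
    }

    // Assumed mapping from a Java value to the GremlinDataType it represents.
    static Optional<GremlinDataType> typeOf(Object value) {
        if (value instanceof String) return Optional.of(GremlinDataType.STRING);
        if (value instanceof Integer) return Optional.of(GremlinDataType.INTEGER);
        if (value instanceof Double) return Optional.of(GremlinDataType.DOUBLE);
        return Optional.empty();
    }

    // Returns readable violations for one data vertex; an empty list means it conforms.
    public static List<String> check(Map<String, Map<String, GremlinDataType>> model,
                                     String label, Map<String, Object> properties) {
        List<String> errors = new ArrayList<>();
        Map<String, GremlinDataType> declared = model.get(label);
        if (declared == null) {
            errors.add("undeclared vertex label '" + label + "'");
            return errors;
        }
        for (Map.Entry<String, Object> p : properties.entrySet()) {
            GremlinDataType expected = declared.get(p.getKey());
            if (expected == null) {
                errors.add("undeclared property '" + p.getKey() + "' on '" + label + "'");
            } else if (typeOf(p.getValue()).orElse(null) != expected) {
                errors.add("property '" + p.getKey() + "' on '" + label + "' is not " + expected);
            }
        }
        return errors;
    }
}
```

Running check for every vertex of modernGraph (label plus property map) would then confirm that the data is an instance of the model, in the same way the constraints confirm the model is an instance of the meta model.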

And that is it.

There are lots of details to complete, but first we need to see if
there is any appetite for a modelling approach, as I realize there is
some academic abstract algebra work happening elsewhere. To me, a
modelling approach seems to have a lower barrier to entry for the
community to take part in the discussion of what constitutes a property
graph model.

Let me know if there are questions or criticisms.

Thanks
Pieter
