Re: A meta model for gremlin's property graph

Joshua Shinavier Tue, 11 Jan 2022 10:48:03 -0800

Hey Pieter,

Good to see some more motion on this front. Responses inline.

On Sun, Jan 9, 2022 at 4:28 AM pieter gmail <pieter.mar...@gmail.com> wrote:

> Hi,
>
> I have done some work on defining a meta model for Gremlin's property
> graph. I am using the approach used in the modelling world, in particular
> as done by the OMG <https://www.omg.org/> group when defining their
> various meta models and specifications.
>

+1 to using or drawing upon standards where we can. For those of us
(including me) who have not worked with OMG standards other than
occasionally bumping into UML, which parts of the approach you describe
below were influenced by OMG?

> However where OMG uses a subset of the UML to define their meta models I
> suggest we use Gremlin. After all Gremlin is the language we use to
> describe the world and the property graph meta model can also be described
> in Gremlin.
>

I agree, as long as these descriptions do not admit "arbitrary Gremlin".
The problem right now is that Gremlin's declarative semantics aren't very
clear, and it is a relatively complex language. I totally agree that you
could define a DSL for defining models which could be embedded in Gremlin;
you could even define the DSL in terms of itself.

> I propose that we have 3 levels of modelling. Each of which can itself be
> specified in gremlin.
>
> 1: The property graph meta model.
>

+1

> 2: The model.
>

I like the term "schema".

> 3: The graph representing the actual data.
>

+1. Not only is the graph a "model", but depending on how you define the
modeling DSL, you can also see the other two models as "graphs", with types
as elements.

> 1) The property graph meta model describes the nature of the property
> graph itself. i.e. that property graphs have vertices, edges and properties.
>

I agree, and I think there is value in going one step further to create a
general purpose data model for defining data models, with property graphs
as a special case.

> 2) The model is an instance of the meta model. It describes the schema of
> a particular graph. i.e. for TinkerPop's modern graph this would be
> 'person', 'software', 'created' and 'knows' and the various properties
> 'weight', 'age', 'name' and 'lang' properties.
>

+1

> 3) The final level is an instance of the model. It is the actual graph
> itself. i.e. for TinkerPop's modern graph it is 'Marko', 'Josh', 'java' ...
>

Yes. So to elaborate on what I said above about models and graphs, let's
say we add a schema to the TinkerPop classic graph. The classic graph is an
instance of the schema, and the schema is an instance of a property graph
schema. Your three models are three graphs:
1) the classic graph ("data graph") has elements "Marko", "Josh", "ripple"
etc. each of which is a value together with a type and a name (id). The
type of Marko is "Person" (a named type) and the type of ripple is
"Project" etc. The value of Marko is the record {"name": "marko", "age":
29} while the value of ripple is {"name": "ripple", "lang": "java"}.
2) the schema of the classic graph ("schema graph") has elements "Person",
"Project", "knows", and "created". These again are values together with
types and ids. E.g. the type of "Person" is something like {"name": string,
"age": int32}, i.e. a record type.
3) the schema of the schema of the classic graph -- i.e. the core model or
what you called the meta model -- is again a graph with elements like
"Type", "Element", etc. Type expressions in the schema of the classic graph
are values in the core model. The core model is its own schema.

Decide for yourself if the above makes sense to you, but this is how I
think of the TinkerPop modeling layer cake these days -- as chained models
in which the schema of one graph is the data of the next, usually arriving
at a fixpoint -- the core -- within two steps.
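
To make the chaining concrete, here is a minimal sketch in plain Java. It is
only an illustration under stated assumptions: the ModelGraph and LayerCake
classes are hypothetical (they are not part of TinkerPop), and type
expressions such as record types are collapsed to a single "Type" name for
brevity. Each graph points at the graph that serves as its schema, and the
core graph points at itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a graph is a set of typed elements plus a pointer
// to the graph whose elements serve as its types (its schema).
final class ModelGraph {
    final String name;
    // element id -> id of its type, which must be an element of the schema graph
    final Map<String, String> elementTypes = new LinkedHashMap<>();
    ModelGraph schema;

    ModelGraph(String name) { this.name = name; }

    ModelGraph add(String id, String type) {
        elementTypes.put(id, type);
        return this;
    }

    // True if every element's type is itself an element of the schema graph.
    boolean conformsToSchema() {
        return elementTypes.values().stream()
                .allMatch(schema.elementTypes::containsKey);
    }

    // Number of schema hops until we reach a graph that is its own schema.
    int stepsToCore() {
        int steps = 0;
        for (ModelGraph g = this; g.schema != g; g = g.schema) {
            steps++;
        }
        return steps;
    }
}

final class LayerCake {
    static ModelGraph classicData() {
        // Core model: its own schema (the fixpoint).
        ModelGraph core = new ModelGraph("core");
        core.add("Type", "Type").add("Element", "Type");
        core.schema = core;

        // Schema graph: its elements are typed by the core model.
        ModelGraph schema = new ModelGraph("classic-schema");
        schema.add("Person", "Type").add("Project", "Type")
              .add("knows", "Type").add("created", "Type");
        schema.schema = core;

        // Data graph: its elements are typed by the schema graph.
        ModelGraph data = new ModelGraph("classic-data");
        data.add("Marko", "Person").add("Josh", "Person").add("ripple", "Project");
        data.schema = schema;
        return data;
    }

    public static void main(String[] args) {
        ModelGraph data = classicData();
        System.out.println(data.conformsToSchema()); // true
        System.out.println(data.stepsToCore());      // 2
    }
}
```

The same conformsToSchema() check applies unchanged at every level,
including the core model checked against itself.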

> 1: Property Graph Meta Model
>
>     public static Graph gremlinMetaModel() {
>
>         enum GremlinDataType {
>
>             STRING,
>
>             INTEGER,
>
>             DOUBLE,
>
>             DATE,
>
>             TIME
>
>             //...
>
>         }
>
>

Cool, except that I would banish types like Date and Time from the core
model. Drawing the line between primitive types and derived types is more
art than science, but there is enough variation in what developers want out
of dates/times that I put them on the other side of the fence. It also
makes implementations easier if you have as few baked-in types as possible.
On the other hand, I suggest adding many more numeric types, e.g. for
integers:

- bigint
- int8
- int16
- int32
- int64
- uint8
- uint16
- uint32
- uint64

and for floating-point numbers:

- bigfloat
- float32
- float64
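
As a rough illustration of what the integer types above would mean in
practice, here is a sketch in plain Java. The IntegerType enum and its
accepts() method are hypothetical names made up for this example, and it
assumes the usual two's-complement ranges for the fixed-width types, with
bigint unbounded.

```java
import java.math.BigInteger;

// Hypothetical sketch of the integer types listed above, each with a
// bit width and signedness. BIGINT is treated as unbounded.
enum IntegerType {
    BIGINT(0, true),
    INT8(8, true), INT16(16, true), INT32(32, true), INT64(64, true),
    UINT8(8, false), UINT16(16, false), UINT32(32, false), UINT64(64, false);

    final int bits;
    final boolean signed;

    IntegerType(int bits, boolean signed) {
        this.bits = bits;
        this.signed = signed;
    }

    // True if the value fits in this type's range.
    boolean accepts(BigInteger value) {
        if (this == BIGINT) {
            return true;
        }
        BigInteger min = signed
                ? BigInteger.ONE.shiftLeft(bits - 1).negate()
                : BigInteger.ZERO;
        BigInteger max = BigInteger.ONE
                .shiftLeft(signed ? bits - 1 : bits)
                .subtract(BigInteger.ONE);
        return value.compareTo(min) >= 0 && value.compareTo(max) <= 0;
    }
}

final class IntegerTypeDemo {
    public static void main(String[] args) {
        System.out.println(IntegerType.INT8.accepts(BigInteger.valueOf(128)));  // false
        System.out.println(IntegerType.UINT8.accepts(BigInteger.valueOf(255))); // true
    }
}
```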

[snip metamodel definition]
>
>
> This can be visualized as,
> ...
>

I'm not sure if I'm reading this correctly, and I can't see the figure yet,
but I understand that you are defining the metamodel as a graph. Cool.

> Notes:
> 1) GremlinDataType is an enumeration of named data types that Gremlin
> supports. All gremlin data types are assumed to be atomic and its life
> cycle fully owned by its containing parent. How it is persisted on disc or
> transported over the wire is not a concern for the meta model.
>

Agreed on most points. Primitive/literal types are atomic, but you should
also be able to define complex data types and bind them to names, and that
is essentially what you are doing in the above.

> 2) Gremlin's semantics is too weak to fully specify a valid meta model.
> Accompanying the meta model we need a list of constraints specified as
> gremlin queries to augment the semantics of the meta model. These
> constraints/queries will be able to validate any gremlin specified model
> for correctness.
>

Or the other way around: we define a core model as its own thing using a
well-defined, controlled vocabulary, then map it into Gremlin.

> 3) It is trivial to extend the meta model. e.g. To specify something like
> index support just add an 'Index' vertex and an edge from 'VertexLabel' to
> it.
>

I would say that you're extending a second-order model in that case. The
core model / metamodel should be constant, but you can define additional
models on top of it.

> Property graph meta model constraints,
>
> [...]
>

Cool (though here, too, I would define constraints in a limited DSL, and
map them into Gremlin).
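
A sketch of what I mean, in plain Java. Everything here is hypothetical:
the Constraint interface and the two constraint classes are invented for
illustration, and the traversals are rendered as text rather than built
with the TinkerPop API. The only Gremlin steps used are hasLabel, hasNot,
or, not, outV and inV. Each constraint maps to a traversal that returns
the violating elements, so an empty result means the constraint holds.

```java
// Hypothetical constraint vocabulary; these names are not part of TinkerPop.
interface Constraint {
    // Renders, as text, a Gremlin traversal returning the violating elements.
    String toGremlin();
}

// Every vertex with the given label must carry the given property key.
final class RequiredProperty implements Constraint {
    private final String vertexLabel;
    private final String key;

    RequiredProperty(String vertexLabel, String key) {
        this.vertexLabel = vertexLabel;
        this.key = key;
    }

    @Override
    public String toGremlin() {
        // Note: labels and keys are not escaped in this sketch.
        return "g.V().hasLabel('" + vertexLabel + "').hasNot('" + key + "')";
    }
}

// Every edge with the given label must connect the given vertex labels.
final class EdgeEndpoints implements Constraint {
    private final String edgeLabel;
    private final String outLabel;
    private final String inLabel;

    EdgeEndpoints(String edgeLabel, String outLabel, String inLabel) {
        this.edgeLabel = edgeLabel;
        this.outLabel = outLabel;
        this.inLabel = inLabel;
    }

    @Override
    public String toGremlin() {
        return "g.E().hasLabel('" + edgeLabel + "').or("
                + "__.not(__.outV().hasLabel('" + outLabel + "')), "
                + "__.not(__.inV().hasLabel('" + inLabel + "')))";
    }
}

final class ConstraintDemo {
    public static void main(String[] args) {
        Constraint[] constraints = {
            new RequiredProperty("person", "name"),
            new EdgeEndpoints("created", "person", "software")
        };
        for (Constraint c : constraints) {
            System.out.println(c.toGremlin());
        }
    }
}
```

The point of the design is that the constraint vocabulary stays small and
declarative, while Gremlin is just one possible target of the mapping.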

> 2: The model
>
> What follows is an example of TinkerPop's 'modern' graph specified as an
> instance of the above property graph meta model.
> [...]
>

Cool.

> There are lots of details to complete, but first we need to see if there
> is any appetite for a modelling approach as I realize there is some
> academic abstract algebra work happening elsewhere.
>

There is, but I don't see your approach of embedding model definitions and
constraints natively in Gremlin as being at odds with having a formal data
model. Have cake, eat it too.

> It seems to me to have a lower barrier to entry for the community to
> partake in the discussion of what constitutes a property graph model.
>

That's important. In my opinion, having the formal model defined up front
gives you more power and flexibility for graph validation, transformations,
and inference, and once you have the abstract model, you can also build
developer-friendly DSLs on top of it.

> Let me know if there are questions or criticisms.
>

One of the nice things about your proposal is that it doesn't increase Java
tech debt; you're suggesting defining models using Gremlin syntax, which is
language-variant-neutral. +1 to that.

 Josh
