Re: [DISCUSS] Graph Schema Interfaces for TP

Cole Greer Tue, 14 Apr 2026 17:53:01 -0700

Hi Josh,

I like the idea of parameterized types. That sounds great if providers are free 
to plug in a rich type system if they choose, or to drop in a simple enum if 
they don't want added complexity.
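
To check that I am reading the idea correctly, here is a minimal sketch of how I picture it. Every name in it (ParameterizedSchemaSketch, SimpleType, RichType, and so on) is hypothetical and made up for illustration; none of it comes from Hydra, TinkerPop, or the PR. The schema classes are generic over T, one provider supplies a flat enum, and another supplies a richer type with records.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are taken from Hydra, TinkerPop, or the PR.
public class ParameterizedSchemaSketch {

    // A property definition whose value type is left open as T.
    static final class PropertyType<T> {
        final String name;
        final T valueType;
        PropertyType(String name, T valueType) {
            this.name = name;
            this.valueType = valueType;
        }
    }

    // A vertex type carries a label and property definitions over the same T.
    static final class VertexType<T> {
        final String label;
        final List<PropertyType<T>> properties;
        VertexType(String label, List<PropertyType<T>> properties) {
            this.label = label;
            this.properties = Collections.unmodifiableList(new ArrayList<>(properties));
        }
    }

    // Option 1: a provider drops in a flat enum.
    enum SimpleType { STRING, INT, LONG }

    // Option 2: a provider plugs in a richer type system (here: literals and records).
    interface RichType { String render(); }

    static final class LiteralType implements RichType {
        final String name;
        LiteralType(String name) { this.name = name; }
        public String render() { return name; }
    }

    static final class RecordType implements RichType {
        final Map<String, RichType> fields;
        RecordType(Map<String, RichType> fields) { this.fields = fields; }
        public String render() {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<String, RichType> e : fields.entrySet()) {
                if (sb.length() > 1) sb.append(", ");
                sb.append(e.getKey()).append(": ").append(e.getValue().render());
            }
            return sb.append("}").toString();
        }
    }

    // Schema-level code stays generic: it never needs to know what T is.
    static <T> List<String> propertyNames(VertexType<T> type) {
        List<String> names = new ArrayList<>();
        for (PropertyType<T> p : type.properties) names.add(p.name);
        return names;
    }

    // The same "person" type expressed with the simple enum.
    static VertexType<SimpleType> simplePerson() {
        List<PropertyType<SimpleType>> props = new ArrayList<>();
        props.add(new PropertyType<SimpleType>("name", SimpleType.STRING));
        props.add(new PropertyType<SimpleType>("age", SimpleType.INT));
        return new VertexType<SimpleType>("person", props);
    }

    // The same "person" type expressed with the richer type system.
    static VertexType<RichType> richPerson() {
        Map<String, RichType> address = new LinkedHashMap<>();
        address.put("street", new LiteralType("STRING"));
        address.put("zip", new LiteralType("STRING"));
        List<PropertyType<RichType>> props = new ArrayList<>();
        props.add(new PropertyType<RichType>("name", new LiteralType("STRING")));
        props.add(new PropertyType<RichType>("address", new RecordType(address)));
        return new VertexType<RichType>("person", props);
    }

    public static void main(String[] args) {
        System.out.println(propertyNames(simplePerson()));
        System.out.println(richPerson().properties.get(1).valueType.render());
    }
}
```

The point for me is that propertyNames() works unchanged for both providers, so the added complexity is opt-in.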


Thanks,
Cole

On 2026/04/11 17:04:07 Joshua Shinavier wrote:
> Hi Cole,
> 
> I agree with keeping things as simple as possible. The schema types I
> shared are parameterized by a *T* type for ids and property values. This
> keeps the model flexible: providers can plug in whatever they need for T --
> whether something as complex as Hydra's Type type (which does include
> complex record and union types, among other things), or as simple as a
> provider specific enum with STRING, INT, LONG, etc. Now, in TinkerPop we
> could keep the types parameterized (which is what I recommend), or we could
> replace T with a TinkerPop-specific type system, like an enum for
> DataTypeFeatures
> <https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/ext/org/apache/tinkerpop/features/DataTypeFeatures.html>
> (which
> is based on Gremlin's Graph.Features
> <https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/structure/Graph.Features.html>
> interface). Does that address the concern?
> 
> Josh
> 
> 
> 
> On Fri, Apr 10, 2026 at 2:52 PM Cole Greer <[email protected]> wrote:
> 
> > Hi Josh,
> >
> > I agree that the property graph model from Hydra should adapt well for our
> > purposes in TinkerPop. My main concern regarding schema types is that
> > providers aren't unnecessarily burdened when trying to map types from an
> > existing proprietary schema into our new interfaces. I like the idea of our
> > schema types supporting complex composite and union types; however, my
> > primary goal is to keep this as simple for providers to adopt as possible.
> >
> > Thanks,
> > Cole
> >
> > On 2026/04/03 03:31:12 Joshua Shinavier wrote:
> > > Hi Cole. This looks good to me. With respect to the schema types, would
> > you
> > > please review the property graph model I showed a couple of meetings ago
> > --
> > > here
> > > <
> > https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/package-summary.html
> > >
> > > --
> > > and let me know if you have any feedback. The types of interest start
> > with
> > > GraphSchema
> > > <
> > https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/GraphSchema.html
> > >.
> > > Ignore Hydra-specific details like PersistentMap/ConsList/Name, etc. If
> > the
> > > structure is agreeable, I will map the types into a format suitable for
> > use
> > > in your PR, in an org.apache.tinkerpop namespace. There is an associated
> > > JSON format for interchange of GraphSchema and any other type we define
> > in
> > > this way.
> > >
> > > Josh
> > >
> > >
> > >
> > > On Thu, Apr 2, 2026 at 7:17 PM Cole Greer via dev <
> > [email protected]>
> > > wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > The topic of Graph Schema has been discussed extensively in recent
> > > > TinkerPop Gatherings, and the following proposal has emerged from these
> > > > gatherings. I believe it is now ready for broad consideration and
> > > > discussions. I’ve done my best to incorporate initial feedback from
> > Josh,
> > > > Pieter, Valentyn, Stephen, Kris and others into this proposal; however,
> > I
> > > > won’t claim that it accurately represents the views of anyone other
> > than
> > > > myself at this time. This is a broad topic and I’m deliberately
> > excluding
> > > > critical topics to focus this thread on standardizing interfaces for
> > > > gremlin users and providers to interact with schema (see assumptions
> > for
> > > > more details).
> > > >
> > > > ## Overview
> > > >
> > > > This proposal introduces graph schema interfaces for TinkerPop: a way
> > to
> > > > define vertex types, edge types, and property types as a meta-graph
> > that is
> > > > itself traversable with Gremlin. The schema describes the structure of
> > a
> > > > data graph; what kinds of vertices and edges exist, what properties
> > they
> > > > carry, and how they connect.
> > > >
> > > > ## Assumptions
> > > >
> > > > - Type keys are element labels: there is a 1-to-1 mapping between a
> > label
> > > > and a type definition. A vertex labeled "person" corresponds to
> > exactly one
> > > > VertexType, and an edge labeled "knows" corresponds to exactly one
> > EdgeType.
> > > > - Java classes are used as a type system: This proposal uses Java
> > classes
> > > > to define property type constraints. This is intended as a placeholder
> > to
> > > > be replaced by a proper type system to be defined via a later
> > discussion.
> > > > - This proposal gives very little consideration to if/when/where/how
> > > > validation and enforcement of schema takes place. I believe it is
> > important
> > > > for us to ship something which is flexible and useful to providers out
> > of
> > > > the box as well as leaving space for providers to plug in existing
> > > > implementations or build their own if they desire. I’ve left this out
> > of
> > > > scope for this proposal to focus first on interfaces which give
> > providers
> > > > the appropriate access to schema.
> > > >
> > > > ## Design Points
> > > >
> > > > ### 1. Schema-as-Graph
> > > >
> > > > `GraphSchema extends Graph`. Providers implement a familiar interface,
> > and
> > > > users traverse the schema with schema.traversal(). This avoids
> > inventing a
> > > > parallel API surface. The schema is just another graph.
> > > >
> > > > A data graph exposes its schema via Graph.schema(), which returns the
> > > > GraphSchema instance. Providers that don't support schema throw
> > > > UnsupportedOperationException by default.
> > > >
> > > > ### 2. All type definitions are vertices
> > > >
> > > > VertexType, EdgeType, and PropertyType are all vertices in the schema
> > > > meta-graph.
> > > >
> > > > - A VertexType vertex represents a vertex label definition (e.g.
> > "person",
> > > > "software").
> > > > - An EdgeType vertex represents an edge label definition (e.g. "knows",
> > > > "created"). Even though it describes edges in the data graph, it is
> > itself
> > > > a vertex in the schema graph, connected to its endpoint VertexType
> > vertices
> > > > via from/to edges.
> > > > - A PropertyType vertex represents a property on a type, connected to
> > its
> > > > parent type vertex via a "hasProperty" edge.
> > > >
> > > > Property definitions are independent per type; there is no sharing
> > > > across types.
> > > >
> > > > Schema graph example for the classic TinkerPop modern graph:
> > > > ```
> > > > (person:vertexType) --hasProperty--> (name:propertyType)
> > > > (person:vertexType) --hasProperty--> (age:propertyType)
> > > > (software:vertexType) --hasProperty--> (name:propertyType)
> > > > (software:vertexType) --hasProperty--> (lang:propertyType)
> > > > (knows:edgeType) --from--> (person:vertexType)
> > > > (knows:edgeType) --to-->   (person:vertexType)
> > > > (knows:edgeType) --hasProperty--> (weight:propertyType)
> > > > (created:edgeType) --from--> (person:vertexType)
> > > > (created:edgeType) --to-->   (software:vertexType)
> > > > (created:edgeType) --hasProperty--> (weight:propertyType)
> > > > ```
> > > >
> > > > ### 3. Constraints are properties on type vertices
> > > >
> > > > Rather than a fixed constraint taxonomy, constraints are regular
> > > > properties on type vertices, keyed by string via constraint(key,
> > value).
> > > > This keeps the model extensible such that providers can define their
> > own
> > > > constraints without changes to the core API.
> > > >
> > > > Constraints can be added to VertexType, EdgeType, and PropertyType
> > > > vertices directly. The most common constraints such as property types
> > and
> > > > required properties would apply to PropertyTypes, while edge
> > multiplicity
> > > > constraints (e.g. one-to-many, one-to-one) are naturally expressed as
> > > > constraints on the EdgeType itself rather than on any property.
> > > >
> > > > While constraint keys are arbitrary strings and providers are free to
> > > > implement any constraints they like, TinkerPop should standardize a
> > set of
> > > > core constraint keys representing the most common constraints. Examples
> > > > include "type", "required", "unique", "minValue", "maxValue", etc.
> > > > Providers that support equivalent constraints are encouraged to follow
> > > > these conventional names for interoperability.
> > > >
> > > > Non-core constraints (custom to a provider) are encouraged to follow a
> > > > namespaced key convention to avoid collisions, e.g.
> > "tinkergraph:notNull".
> > > > Core constraint keys are unnamespaced.
> > > >
> > > > ### 4. Schema traversal steps in core Gremlin
> > > >
> > > > New steps for schema manipulation live directly in
> > > > GraphTraversal/GraphTraversalSource, not in a separate DSL:
> > > >
> > > > - addVType(label) — creates a VertexType vertex
> > > > - addEType(label) — creates an EdgeType vertex
> > > > - propertyType(name) — creates a PropertyType vertex and connects it
> > via
> > > > hasProperty
> > > > - constraint(key, value) — adds a constraint property to the current
> > type
> > > > vertex
> > > >
> > > > Example: defining a vertex type with properties:
> > > > ```
> > > > schema.traversal().addVType("person")
> > > >     .propertyType("name").constraint("type",
> > > > String.class).constraint("required", true).constraint("unique", true)
> > > >     .propertyType("age").constraint("type", Integer.class)
> > > > ```
> > > >
> > > > Example: defining an edge type with endpoint types and a property:
> > > > ```
> > > > schema.traversal().addEType("knows")
> > > >     .from("person").to("person")
> > > >     .propertyType("weight").constraint("type", Double.class)
> > > > ```
> > > >
> > > > This mirrors the addE().from().to() pattern from the data-graph. Here
> > > > from() and to() take vertex type labels (strings) and create from/to
> > edges
> > > > in the schema graph connecting the EdgeType to the referenced
> > VertexType
> > > > vertices.
> > > >
> > > > ### 5. Convenience methods for direct access
> > > >
> > > > The schema-as-graph model is the source of truth, but traversing it for
> > > > simple lookups isn’t always convenient. Direct methods provide compact
> > > > access:
> > > >
> > > > GraphSchema methods:
> > > > - vertexTypes() → Collection<VertexType>
> > > > - vertexType(String label) → Optional<VertexType>
> > > > - edgeTypes() → Collection<EdgeType>
> > > > - edgeType(String label) → Optional<EdgeType>
> > > > - addVertexType(String label) → VertexType
> > > > - addEdgeType(String label) → EdgeType
> > > > - store(OutputStream):  serialize the schema to a compact JSON
> > > > representation
> > > > - load(InputStream): deserialize and merge a schema from JSON into this
> > > > schema graph
> > > >
> > > > EdgeType methods:
> > > > - fromVertexTypes() → Collection<VertexType>
> > > > - toVertexTypes() → Collection<VertexType>
> > > >
> > > > Example:
> > > > ```
> > > > GraphSchema schema = graph.schema();
> > > >
> > > > // Look up a vertex type
> > > > VertexType person = schema.vertexType("person").orElseThrow();
> > > >
> > > > // Inspect its properties
> > > > for (PropertyType pd : person.propertyTypes()) {
> > > >     System.out.println(pd.name() + " : " + pd.constraint("type"));
> > > > }
> > > >
> > > > // Look up an edge type and its connectivity
> > > > EdgeType knows = schema.edgeType("knows").orElseThrow();
> > > > Collection<VertexType> fromTypes = knows.fromVertexTypes();
> > > > Collection<VertexType> toTypes = knows.toVertexTypes();
> > > > ```
> > > >
> > > > ### 6. Cross-graph jumps
> > > >
> > > > Two steps bridge the data graph and schema graph:
> > > >
> > > > - type(): from a data traversal, jump to the element's type definition
> > in
> > > > the schema graph.
> > > > - instances(): from a schema traversal, jump to all matching elements
> > in
> > > > the data graph.
> > > >
> > > > These compose for round-trip traversals:
> > > > ```
> > > > // Get the type definition for "person" vertices
> > > > g.V().hasLabel("person").type()
> > > >
> > > > // Get all instances of a schema type
> > > > schema.traversal().vertexType("person").instances()
> > > >
> > > > // Round-trip: find marko's type, then get all instances of that type
> > > > g.V().has("person", "name", "marko").type().instances()
> > > > ```
> > > >
> > > > ### 7. Schema restriction strategy
> > > >
> > > > There are some steps we will want to restrict in both the data graph
> > and
> > > > the schema-graph. addVType() wouldn’t make sense in the data-graph, nor
> > > > would addV() be sensible in the schema-graph. A TraversalStrategy can
> > > > restrict schema traversals to a safe subset of Gremlin steps
> > > > (allowlist-based). This prevents accidentally running data element
> > > > insertions, OLAP computations, complex control flow, or side-effect
> > steps
> > > > against the schema graph. The strategy should be auto-registered when
> > > > traversing a GraphSchema instance.
> > > >
> > > > The exact allowlist should be a topic for later discussion.
> > > >
> > > > ### 8. Instance counts on type vertices
> > > >
> > > > VertexType.instanceCount() and EdgeType.instanceCount() return the
> > count
> > > > of data graph elements matching each type. This is a method rather
> > than a
> > > > property on the type vertex, keeping the schema graph definitional (not
> > > > statistical) and giving providers full implementation flexibility.
> > > >
> > > > Approximate counts are likely acceptable and preferable for
> > performance in
> > > > most cases. However, TinkerPop should not stand in the way of providers
> > > > that prefer exact counts, and should ensure that appropriate hooks are
> > in
> > > > place in reference implementations so that providers can maintain exact
> > > > counts if they so desire.
> > > >
> > > > Transactional implications need additional consideration. Maintaining
> > > > accurate counts across concurrent writes, rollbacks, and transaction
> > > > isolation levels adds significant complexity. This interacts with the
> > > > broader schema transactions question (see transactions below) and
> > should be
> > > > addressed alongside it.
> > > >
> > > > ### 9. GLV Support
> > > >
> > > > Each GLV (Python, JavaScript, .NET, Go) needs:
> > > >
> > > > - Schema data classes: Parallel classes to the 4 core Java interfaces,
> > > > following the same pattern as existing Vertex and Edge classes. These
> > are
> > > > data containers representing schema objects returned from the server:
> > > >   - GraphSchema: holds collections of VertexTypes and EdgeTypes
> > > >   - VertexType: label, full constraints map, and collection of
> > > > PropertyTypes
> > > >   - EdgeType: label, full constraints map, from/to VertexType
> > references
> > > > (same pattern as Edge.outV/Edge.inV), and collection of PropertyTypes
> > > >   - PropertyType: name and full constraints map (including data type
> > as a
> > > > constraint)
> > > > - All new gremlin steps are supported from each GLV
> > > >
> > > > ## Future Questions
> > > >
> > > > ### Schema validation
> > > >
> > > > Providers will need lots of flexibility regarding validation modes.
> > Some
> > > > providers may choose to have write-time validation for all inserts,
> > others
> > > > may choose to validate an entire graph against a schema as a batch job,
> > while
> > > > others may choose to validate on-commit. For our purposes, we need to
> > > > provide a viable reference implementation, as well as ensuring
> > sufficient
> > > > extension points exist for providers to fulfill their needs.
> > > >
> > > > ### Dynamic schema updates from data writes
> > > >
> > > > It would be useful to auto-update the schema graph when data writes
> > > > introduce new labels or properties (e.g. addV("newLabel") automatically
> > > > creates a VertexType). Keeping the schema exactly in-sync with such
> > > > operations may introduce too much overhead for many purposes. We should
> > > > provide appropriate hooks for providers to implement such behaviour if
> > > > desired, or to help providers aggregate changes and perform incremental
> > > > batch updates to the schema.
> > > >
> > > > ### Transactions
> > > >
> > > > The schema graph will need to be transactional if the data
> > > >
> > > > ### File IO
> > > >
> > > > It is often useful to persist and load schemas to/from files. This
> > > > capability should be built into the GraphSchema class via simple
> > store()
> > > > and load() methods, using a custom compact JSON representation of the
> > > > schema. The specifics of this format are deferred to later discussion.
> > > >
> > > > GraphSchema exposes file IO directly:
> > > > - store(OutputStream): serialize the schema to a compact JSON
> > > > representation
> > > > - load(InputStream): deserialize a schema from JSON and merge it into
> > the
> > > > current schema graph
> > > >
> > > > Schema file IO should be implemented across all GLVs.
> > > >
> > > > ## Reference Implementation
> > > >
> > > > TinkerGraph serves as the reference implementation:
> > > >
> > > > - TinkerGraphSchema extends TinkerGraph implements GraphSchema
> > > > - TinkerVertexType extends TinkerVertex implements VertexType
> > > > - TinkerPropertyType extends TinkerVertex implements PropertyType
> > > > - TinkerEdgeType extends TinkerVertex implements EdgeType
> > > > - Recursion guard prevents schema-of-schema (TinkerGraphSchema
> > overrides
> > > > initSchema())
> > > >
> > > >
> > > > Please let me know any thoughts you may have on the approach. I intend
> > to
> > > > move this into a proposal PR soon, unless there are any major
> > disagreements
> > > > over the design.
> > > >
> > > > Thanks,
> > > > Cole
> > > >
> > >
> >
>
