Re: [DISCUSS] Graph Schema Interfaces for TP

Joshua Shinavier Sat, 11 Apr 2026 10:04:35 -0700

Hi Cole,

I agree with keeping things as simple as possible. The schema types I
shared are parameterized by a *T* type for ids and property values. This
keeps the model flexible: providers can plug in whatever they need for T --
whether something as complex as Hydra's Type type (which does include
complex record and union types, among other things), or as simple as a
provider specific enum with STRING, INT, LONG, etc. Now, in TinkerPop we
could keep the types parameterized (which is what I recommend), or we could
replace T with a TinkerPop-specific type system, such as an enum for
DataTypeFeatures
<https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/ext/org/apache/tinkerpop/features/DataTypeFeatures.html>
(which is based on Gremlin's Graph.Features
<https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/structure/Graph.Features.html>
interface). Does that address the concern?
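
To make the parameterization concrete, here is a minimal Java sketch. The names (SimpleType, PropertyTypeDef, VertexTypeDef) are made up for illustration and are not the actual Hydra or TinkerPop classes; the sketch only shows a provider plugging a simple enum in for T.

```java
import java.util.List;

// Hypothetical provider-specific type system: a plain enum.
enum SimpleType { STRING, INT, LONG }

// Hypothetical schema types, parameterized by T (the type used for ids and property values).
record PropertyTypeDef<T>(String key, T valueType, boolean required) {}

record VertexTypeDef<T>(String label, T idType, List<PropertyTypeDef<T>> properties) {}

class ParameterizedSchemaDemo {
    // Builds a "person" vertex type using the enum as T.
    static VertexTypeDef<SimpleType> personType() {
        List<PropertyTypeDef<SimpleType>> props = List.of(
                new PropertyTypeDef<>("name", SimpleType.STRING, true),
                new PropertyTypeDef<>("age", SimpleType.INT, false));
        return new VertexTypeDef<>("person", SimpleType.LONG, props);
    }

    public static void main(String[] args) {
        VertexTypeDef<SimpleType> person = personType();
        for (PropertyTypeDef<SimpleType> p : person.properties()) {
            System.out.println(person.label() + "." + p.key() + " : " + p.valueType());
        }
    }
}
```

A provider with a richer type system would supply its own type in place of SimpleType, with no change to the schema types themselves.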


Josh



On Fri, Apr 10, 2026 at 2:52 PM Cole Greer <[email protected]> wrote:

> Hi Josh,
>
> I agree that the property graph model from Hydra should adapt well for our
> purposes in TinkerPop. My main concern regarding schema types is that
> providers aren't unnecessarily burdened when trying to map types from an
> existing proprietary schema into our new interfaces. I like the idea of our
> schema types supporting complex composite and union types, however my
> primary goal is to keep this as simple for providers to adopt as possible.
>
> Thanks,
> Cole
>
> On 2026/04/03 03:31:12 Joshua Shinavier wrote:
> > Hi Cole. This looks good to me. With respect to the schema types, would
> you
> > please review the property graph model I showed a couple of meetings ago
> --
> > here
> > <
> https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/package-summary.html
> >
> > --
> > and let me know if you have any feedback. The types of interest start
> with
> > GraphSchema
> > <
> https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/GraphSchema.html
> >.
> > Ignore Hydra-specific details like PersistentMap/ConsList/Name, etc. If
> the
> > structure is agreeable, I will map the types into a format suitable for
> use
> > in your PR, in an org.apache.tinkerpop namespace. There is an associated
> > JSON format for interchange of GraphSchema and any other type we define
> in
> > this way.
> >
> > Josh
> >
> >
> >
> > On Thu, Apr 2, 2026 at 7:17 PM Cole Greer via dev <
> [email protected]>
> > wrote:
> >
> > > Hi Everyone,
> > >
> > > The topic of Graph Schema has been discussed extensively in recent
> > > TinkerPop Gatherings, and the following proposal has emerged from these
> > > gatherings. I believe it is now ready for broad consideration and
> > > discussions. I’ve done my best to incorporate initial feedback from
> Josh,
> > > Pieter, Valentyn, Stephen, Kris and others into this proposal, however
> I
> > > won’t claim that it accurately represents the views of anyone other
> than
> > > myself at this time. This is a broad topic and I’m deliberately
> excluding
> > > critical topics to focus this thread on standardizing interfaces for
> > > gremlin users and providers to interact with schema (see assumptions
> for
> > > more details).
> > >
> > > ## Overview
> > >
> > > This proposal introduces graph schema interfaces for TinkerPop: a way
> to
> > > define vertex types, edge types, and property types as a meta-graph
> that is
> > > itself traversable with Gremlin. The schema describes the structure of
> a
> > > data graph: what kinds of vertices and edges exist, what properties
> they
> > > carry, and how they connect.
> > >
> > > ## Assumptions
> > >
> > > - Type keys are element labels: there is a 1-to-1 mapping between a
> label
> > > and a type definition. A vertex labeled "person" corresponds to
> exactly one
> > > VertexType, and an edge labeled "knows" corresponds to exactly one
> EdgeType.
> > > - Java classes are used as a type system: This proposal uses Java
> classes
> > > to define property type constraints. This is intended as a placeholder
> to
> > > be replaced by a proper type system to be defined via a later
> discussion.
> > > - This proposal gives very little consideration to if/when/where/how
> > > validation and enforcement of schema takes place. I believe it is
> important
> > > for us to ship something which is flexible and useful to providers out
> of
> > > the box as well as leaving space for providers to plug in existing
> > > implementations or build their own if they desire. I’ve left this out
> of
> > > scope for this proposal to focus first on interfaces which give
> providers
> > > the appropriate access to schema.
> > >
> > > ## Design Points
> > >
> > > ### 1. Schema-as-Graph
> > >
> > > `GraphSchema extends Graph`. Providers implement a familiar interface,
> and
> > > users traverse the schema with schema.traversal(). This avoids
> inventing a
> > > parallel API surface. The schema is just another graph.
> > >
> > > A data graph exposes its schema via Graph.schema(), which returns the
> > > GraphSchema instance. Providers that don't support schema throw
> > > UnsupportedOperationException by default.
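
A minimal sketch of that default, using stand-in interfaces. GraphSketch, SchemaSketch, NoSchemaGraph and SchemaAwareGraph are hypothetical names for illustration, not TinkerPop types:

```java
// Stand-in for the proposed GraphSchema interface (hypothetical).
interface SchemaSketch {}

// Stand-in for Graph, showing only the proposed schema() accessor (hypothetical).
interface GraphSketch {
    // Default for providers without schema support.
    default SchemaSketch schema() {
        throw new UnsupportedOperationException("This graph does not support schema");
    }
}

// A provider that does not support schema simply inherits the default.
class NoSchemaGraph implements GraphSketch {}

// A provider that supports schema overrides the accessor.
class SchemaAwareGraph implements GraphSketch {
    private final SchemaSketch schema = new SchemaSketch() {};

    @Override
    public SchemaSketch schema() {
        return schema;
    }
}
```

With a default method, existing providers would compile unchanged and opt in only when they are ready.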
> > >
> > > ### 2. All type definitions are vertices
> > >
> > > VertexType, EdgeType, and PropertyType are all vertices in the schema
> > > meta-graph.
> > >
> > > - A VertexType vertex represents a vertex label definition (e.g.
> "person",
> > > "software").
> > > - An EdgeType vertex represents an edge label definition (e.g. "knows",
> > > "created"). Even though it describes edges in the data graph, it is
> itself
> > > a vertex in the schema graph, connected to its endpoint VertexType
> vertices
> > > via from/to edges.
> > > - A PropertyType vertex represents a property on a type, connected to
> its
> > > parent type vertex via a "hasProperty" edge.
> > >
> > > Property definitions are independent per type; there is no sharing across types.
> > >
> > > Schema graph example for the classic TinkerPop modern graph:
> > > ```
> > > (person:vertexType) --hasProperty--> (name:propertyType)
> > > (person:vertexType) --hasProperty--> (age:propertyType)
> > > (software:vertexType) --hasProperty--> (name:propertyType)
> > > (software:vertexType) --hasProperty--> (lang:propertyType)
> > > (knows:edgeType) --from--> (person:vertexType)
> > > (knows:edgeType) --to-->   (person:vertexType)
> > > (knows:edgeType) --hasProperty--> (weight:propertyType)
> > > (created:edgeType) --from--> (person:vertexType)
> > > (created:edgeType) --to-->   (software:vertexType)
> > > (created:edgeType) --hasProperty--> (weight:propertyType)
> > > ```
> > >
> > > ### 3. Constraints are properties on type vertices
> > >
> > > Rather than a fixed constraint taxonomy, constraints are regular
> > > properties on type vertices, keyed by string via constraint(key,
> value).
> > > This keeps the model extensible such that providers can define their
> own
> > > constraints without changes to the core API.
> > >
> > > Constraints can be added to VertexType, EdgeType, and PropertyType
> > > vertices directly. The most common constraints such as property types
> and
> > > required properties would apply to PropertyTypes, while edge
> multiplicity
> > > constraints (e.g. one-to-many, one-to-one) are naturally expressed as
> > > constraints on the EdgeType itself rather than on any property.
> > >
> > > While constraint keys are arbitrary strings and providers are free to
> > > implement any constraints they like, TinkerPop should standardize a
> set of
> > > core constraint keys representing the most common constraints. Examples
> > > include "type", "required", "unique", "minValue", "maxValue", etc.
> > > Providers that support equivalent constraints are encouraged to follow
> > > these conventional names for interoperability.
> > >
> > > Non-core constraints (custom to a provider) are encouraged to follow a
> > > namespaced key convention to avoid collisions, e.g.
> "tinkergraph:notNull".
> > > Core constraint keys are unnamespaced.
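
The key convention could be checked with something like the sketch below. ConstraintKeys and its methods are hypothetical, and the core set simply repeats the example keys listed above; it is not a settled list.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical helper for the constraint-key naming convention.
final class ConstraintKeys {
    // Example core keys from the proposal text; the real set is still to be decided.
    static final Set<String> CORE_KEYS =
            Set.of("type", "required", "unique", "minValue", "maxValue");

    private ConstraintKeys() {}

    // Core keys are unnamespaced and come from the standardized set.
    static boolean isCore(String key) {
        return CORE_KEYS.contains(key);
    }

    // Returns the provider namespace of a "namespace:name" key, or empty if the key
    // has no ':' separator or either side of the separator is empty.
    static Optional<String> namespaceOf(String key) {
        int sep = key.indexOf(':');
        if (sep <= 0 || sep == key.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(key.substring(0, sep));
    }
}
```

For example, namespaceOf("tinkergraph:notNull") yields "tinkergraph", while namespaceOf("unique") is empty.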
> > >
> > > ### 4. Schema traversal steps in core Gremlin
> > >
> > > New steps for schema manipulation live directly in
> > > GraphTraversal/GraphTraversalSource, not in a separate DSL:
> > >
> > > - addVType(label) — creates a VertexType vertex
> > > - addEType(label) — creates an EdgeType vertex
> > > - propertyType(name) — creates a PropertyType vertex and connects it
> via
> > > hasProperty
> > > - constraint(key, value) — adds a constraint property to the current
> type
> > > vertex
> > >
> > > Example: defining a vertex type with properties:
> > > ```
> > > schema.traversal().addVType("person")
> > >     .propertyType("name").constraint("type",
> > > String.class).constraint("required", true).constraint("unique", true)
> > >     .propertyType("age").constraint("type", Integer.class)
> > > ```
> > >
> > > Example: defining an edge type with endpoint types and a property:
> > > ```
> > > schema.traversal().addEType("knows")
> > >     .from("person").to("person")
> > >     .propertyType("weight").constraint("type", Double.class)
> > > ```
> > >
> > > This mirrors the addE().from().to() pattern from the data-graph. Here
> > > from() and to() take vertex type labels (strings) and create from/to
> edges
> > > in the schema graph connecting the EdgeType to the referenced
> VertexType
> > > vertices.
> > >
> > > ### 5. Convenience methods for direct access
> > >
> > > The schema-as-graph model is the source of truth, but traversing it for
> > > simple lookups isn’t always convenient. Direct methods provide compact
> > > access:
> > >
> > > GraphSchema methods:
> > > - vertexTypes() → Collection<VertexType>
> > > - vertexType(String label) → Optional<VertexType>
> > > - edgeTypes() → Collection<EdgeType>
> > > - edgeType(String label) → Optional<EdgeType>
> > > - addVertexType(String label) → VertexType
> > > - addEdgeType(String label) → EdgeType
> > > - store(OutputStream): serialize the schema to a compact JSON
> > > representation
> > > - load(InputStream): deserialize and merge a schema from JSON into this
> > > schema graph
> > >
> > > EdgeType methods:
> > > - fromVertexTypes() → Collection<VertexType>
> > > - toVertexTypes() → Collection<VertexType>
> > >
> > > Example:
> > > ```
> > > GraphSchema schema = graph.schema();
> > >
> > > // Look up a vertex type
> > > VertexType person = schema.vertexType("person").orElseThrow();
> > >
> > > // Inspect its properties
> > > for (PropertyType pd : person.propertyTypes()) {
> > >     System.out.println(pd.name() + " : " + pd.constraint("type"));
> > > }
> > >
> > > // Look up an edge type and its connectivity
> > > EdgeType knows = schema.edgeType("knows").orElseThrow();
> > > Collection<VertexType> fromTypes = knows.fromVertexTypes();
> > > Collection<VertexType> toTypes = knows.toVertexTypes();
> > > ```
> > >
> > > ### 6. Cross-graph jumps
> > >
> > > Two steps bridge the data graph and schema graph:
> > >
> > > - type(): from a data traversal, jump to the element's type definition
> in
> > > the schema graph.
> > > - instances(): from a schema traversal, jump to all matching elements
> in
> > > the data graph.
> > >
> > > These compose for round-trip traversals:
> > > ```
> > > // Get the type definition for "person" vertices
> > > g.V().hasLabel("person").type()
> > >
> > > // Get all instances of a schema type
> > > schema.traversal().vertexType("person").instances()
> > >
> > > // Round-trip: find marko's type, then get all instances of that type
> > > g.V().has("person", "name", "marko").type().instances()
> > > ```
> > >
> > > ### 7. Schema restriction strategy
> > >
> > > There are some steps we will want to restrict in both the data graph
> and
> > > the schema-graph. addVType() wouldn’t make sense in the data-graph, nor
> > > would addV() be sensible in the schema-graph. A TraversalStrategy can
> > > restrict schema traversals to a safe subset of Gremlin steps
> > > (allowlist-based). This prevents accidentally running data element
> > > insertions, OLAP computations, complex control flow, or side-effect
> steps
> > > against the schema graph. The strategy should be auto-registered when
> > > traversing a GraphSchema instance.
> > >
> > > The exact allowlist should be a topic for later discussion.
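
To illustrate the allowlist idea only: the sketch below checks a list of step names against a set of permitted steps. It is not a real TraversalStrategy; SchemaStepAllowlist is a hypothetical class, steps are represented as plain strings for simplicity, and the permitted set in the usage note is just an example.

```java
import java.util.List;
import java.util.Set;

// Hypothetical allowlist check; a real implementation would inspect the
// traversal's steps inside a TraversalStrategy rather than use strings.
final class SchemaStepAllowlist {
    private final Set<String> allowedSteps;

    SchemaStepAllowlist(Set<String> allowedSteps) {
        this.allowedSteps = Set.copyOf(allowedSteps);
    }

    // Throws on the first step that is not on the allowlist.
    void verify(List<String> stepNames) {
        for (String step : stepNames) {
            if (!allowedSteps.contains(step)) {
                throw new IllegalStateException(
                        "Step not permitted in a schema traversal: " + step);
            }
        }
    }
}
```

With an allowlist of addVType, propertyType and constraint, verifying a traversal that contains addV would fail, while the schema-definition example from section 4 would pass.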
> > >
> > > ### 8. Instance counts on type vertices
> > >
> > > VertexType.instanceCount() and EdgeType.instanceCount() return the
> count
> > > of data graph elements matching each type. This is a method rather
> than a
> > > property on the type vertex, keeping the schema graph definitional (not
> > > statistical) and giving providers full implementation flexibility.
> > >
> > > Approximate counts are likely acceptable and preferable for
> performance in
> > > most cases. However, TinkerPop should not stand in the way of providers
> > > that prefer exact counts, and should ensure that appropriate hooks are
> in
> > > place in reference implementations so that providers can maintain exact
> > > counts if they so desire.
> > >
> > > Transactional implications need additional consideration. Maintaining
> > > accurate counts across concurrent writes, rollbacks, and transaction
> > > isolation levels adds significant complexity. This interacts with the
> > > broader schema transactions question (see transactions below) and
> should be
> > > addressed alongside it.
> > >
> > > ### 9. GLV Support
> > >
> > > Each GLV (Python, JavaScript, .NET, Go) needs:
> > >
> > > - Schema data classes: Parallel classes to the 4 core Java interfaces,
> > > following the same pattern as existing Vertex and Edge classes. These
> are
> > > data containers representing schema objects returned from the server:
> > >   - GraphSchema: holds collections of VertexTypes and EdgeTypes
> > >   - VertexType: label, full constraints map, and collection of
> > > PropertyTypes
> > >   - EdgeType: label, full constraints map, from/to VertexType
> references
> > > (same pattern as Edge.outV/Edge.inV), and collection of PropertyTypes
> > >   - PropertyType: name and full constraints map (including data type
> as a
> > > constraint)
> > > - All new gremlin steps are supported from each GLV
> > >
> > > ## Future Questions
> > >
> > > ### Schema validation
> > >
> > > Providers will need lots of flexibility regarding validation modes.
> Some
> > > providers may choose to have write-time validation for all inserts,
> others
> > > may choose to validate an entire graph against a schema as a batch job,
> while
> > > others may choose to validate on-commit. For our purposes, we need to
> > > provide a viable reference implementation, as well as ensuring
> sufficient
> > > extension points exist for providers to fulfill their needs.
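
As a rough illustration of the write-time mode, the sketch below checks a property map against "type" and "required" constraints, using Java classes as the placeholder type system from the assumptions. PropertyRule and WriteTimeValidator are hypothetical names, not part of the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical flattened view of a PropertyType with its "type" and "required" constraints.
record PropertyRule(String name, Class<?> type, boolean required) {}

final class WriteTimeValidator {
    private WriteTimeValidator() {}

    // Returns a list of violations; an empty list means the properties conform.
    static List<String> validate(List<PropertyRule> rules, Map<String, Object> properties) {
        List<String> violations = new ArrayList<>();
        for (PropertyRule rule : rules) {
            Object value = properties.get(rule.name());
            if (value == null) {
                if (rule.required()) {
                    violations.add("missing required property: " + rule.name());
                }
            } else if (!rule.type().isInstance(value)) {
                violations.add("property " + rule.name() + " expected "
                        + rule.type().getSimpleName() + " but got "
                        + value.getClass().getSimpleName());
            }
        }
        return violations;
    }
}
```

A batch or on-commit mode could reuse the same check and differ only in when it is invoked, which is the kind of extension point providers would need.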
> > >
> > > ### Dynamic schema updates from data writes
> > >
> > > It would be useful to auto-update the schema graph when data writes
> > > introduce new labels or properties (e.g. addV("newLabel") automatically
> > > creates a VertexType). Keeping the schema exactly in-sync with such
> > > operations may introduce too much overhead for many purposes. We should
> > > provide appropriate hooks for providers to implement such behaviour if
> > > desired, or to help providers aggregate changes and perform incremental
> > > batch updates to the schema.
> > >
> > > ### Transactions
> > >
> > > The schema graph will need to be transactional if the data
> > >
> > > ### File IO
> > >
> > > It is often useful to persist and load schemas to/from files. This
> > > capability should be built into the GraphSchema class via simple
> store()
> > > and load() methods, using a custom compact JSON representation of the
> > > schema. The specifics of this format are deferred to later discussion.
> > >
> > > GraphSchema exposes file IO directly:
> > > - store(OutputStream): serialize the schema to a compact JSON
> > > representation
> > > - load(InputStream): deserialize a schema from JSON and merge it into
> the
> > > current schema graph
> > >
> > > Schema file IO should be implemented across all GLVs.
> > >
> > > ## Reference Implementation
> > >
> > > TinkerGraph serves as the reference implementation:
> > >
> > > - TinkerGraphSchema extends TinkerGraph implements GraphSchema
> > > - TinkerVertexType extends TinkerVertex implements VertexType
> > > - TinkerPropertyType extends TinkerVertex implements PropertyType
> > > - TinkerEdgeType extends TinkerVertex implements EdgeType
> > > - Recursion guard prevents schema-of-schema (TinkerGraphSchema
> overrides
> > > initSchema())
> > >
> > >
> > > Please let me know any thoughts you may have on the approach. I intend
> to
> > > move this into a proposal PR soon, unless there are any major
> disagreements
> > > over the design.
> > >
> > > Thanks,
> > > Cole
> > >
> >
>
