Hi Josh,

I like the idea of parameterized types. That sounds great if providers are free to plug in a rich type system if they choose, or to drop in a simple enum if they don't want added complexity.

Thanks,
Cole

On 2026/04/11 17:04:07 Joshua Shinavier wrote:
> Hi Cole,
>
> I agree with keeping things as simple as possible. The schema types I shared are parameterized by a *T* type for ids and property values. This keeps the model flexible: providers can plug in whatever they need for T -- whether something as complex as Hydra's Type type (which does include complex record and union types, among other things), or as simple as a provider-specific enum with STRING, INT, LONG, etc. Now, in TinkerPop we could keep the types parameterized (which is what I recommend), or we could replace T with a TinkerPop-specific type system, like an enum for DataTypeFeatures <https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/ext/org/apache/tinkerpop/features/DataTypeFeatures.html> (which is based on Gremlin's Graph.Features <https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/structure/Graph.Features.html> interface). Does that address the concern?
>
> Josh
>
> On Fri, Apr 10, 2026 at 2:52 PM Cole Greer <[email protected]> wrote:
>
> > Hi Josh,
> >
> > I agree that the property graph model from Hydra should adapt well for our purposes in TinkerPop. My main concern regarding schema types is that providers aren't unnecessarily burdened when trying to map types from an existing proprietary schema into our new interfaces. I like the idea of our schema types supporting complex composite and union types; however, my primary goal is to keep this as simple for providers to adopt as possible.
> >
> > Thanks,
> > Cole
> >
> > On 2026/04/03 03:31:12 Joshua Shinavier wrote:
> > > Hi Cole. This looks good to me.
> > > With respect to the schema types, would you please review the property graph model I showed a couple of meetings ago -- here <https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/package-summary.html> -- and let me know if you have any feedback. The types of interest start with GraphSchema <https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/GraphSchema.html>. Ignore Hydra-specific details like PersistentMap/ConsList/Name, etc. If the structure is agreeable, I will map the types into a format suitable for use in your PR, in an org.apache.tinkerpop namespace. There is an associated JSON format for interchange of GraphSchema and any other type we define in this way.
> > >
> > > Josh
> > >
> > > On Thu, Apr 2, 2026 at 7:17 PM Cole Greer via dev <[email protected]> wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > The topic of Graph Schema has been discussed extensively in recent TinkerPop Gatherings, and the following proposal has emerged from these gatherings. I believe it is now ready for broad consideration and discussion. I’ve done my best to incorporate initial feedback from Josh, Pieter, Valentyn, Stephen, Kris and others into this proposal; however, I won’t claim that it accurately represents the views of anyone other than myself at this time. This is a broad topic and I’m deliberately excluding critical topics to focus this thread on standardizing interfaces for Gremlin users and providers to interact with schema (see assumptions for more details).
> > > >
> > > > ## Overview
> > > >
> > > > This proposal introduces graph schema interfaces for TinkerPop: a way to define vertex types, edge types, and property types as a meta-graph that is itself traversable with Gremlin.
> > > > The schema describes the structure of a data graph: what kinds of vertices and edges exist, what properties they carry, and how they connect.
> > > >
> > > > ## Assumptions
> > > >
> > > > - Type keys are element labels: there is a 1-to-1 mapping between a label and a type definition. A vertex labeled "person" corresponds to exactly one VertexType, and an edge labeled "knows" corresponds to exactly one EdgeType.
> > > > - Java classes are used as a type system: This proposal uses Java classes to define property type constraints. This is intended as a placeholder to be replaced by a proper type system to be defined via a later discussion.
> > > > - This proposal makes very little consideration of if/when/where/how validation and enforcement of schema takes place. I believe it is important for us to ship something which is flexible and useful to providers out of the box as well as leaving space for providers to plug in existing implementations or build their own if they desire. I’ve left this out of scope for this proposal to focus first on interfaces which give providers the appropriate access to schema.
> > > >
> > > > ## Design Points
> > > >
> > > > ### 1. Schema-as-Graph
> > > >
> > > > `GraphSchema extends Graph`. Providers implement a familiar interface, and users traverse the schema with schema.traversal(). This avoids inventing a parallel API surface. The schema is just another graph.
> > > >
> > > > A data graph exposes its schema via Graph.schema(), which returns the GraphSchema instance. Providers that don't support schema throw UnsupportedOperationException by default.
> > > >
> > > > ### 2. All type definitions are vertices
> > > >
> > > > VertexType, EdgeType, and PropertyType are all vertices in the schema meta-graph.
> > > >
> > > > - A VertexType vertex represents a vertex label definition (e.g. "person", "software").
> > > > - An EdgeType vertex represents an edge label definition (e.g. "knows", "created"). Even though it describes edges in the data graph, it is itself a vertex in the schema graph, connected to its endpoint VertexType vertices via from/to edges.
> > > > - A PropertyType vertex represents a property on a type, connected to its parent type vertex via a "hasProperty" edge.
> > > >
> > > > Property definitions are independent per type; there is no sharing across types.
> > > >
> > > > Schema graph example for the classic TinkerPop modern graph:
> > > > ```
> > > > (person:vertexType) --hasProperty--> (name:propertyType)
> > > > (person:vertexType) --hasProperty--> (age:propertyType)
> > > > (software:vertexType) --hasProperty--> (name:propertyType)
> > > > (software:vertexType) --hasProperty--> (lang:propertyType)
> > > > (knows:edgeType) --from--> (person:vertexType)
> > > > (knows:edgeType) --to--> (person:vertexType)
> > > > (knows:edgeType) --hasProperty--> (weight:propertyType)
> > > > (created:edgeType) --from--> (person:vertexType)
> > > > (created:edgeType) --to--> (software:vertexType)
> > > > (created:edgeType) --hasProperty--> (weight:propertyType)
> > > > ```
> > > >
> > > > ### 3. Constraints are properties on type vertices
> > > >
> > > > Rather than a fixed constraint taxonomy, constraints are regular properties on type vertices, keyed by string via constraint(key, value). This keeps the model extensible such that providers can define their own constraints without changes to the core API.
> > > >
> > > > Constraints can be added to VertexType, EdgeType, and PropertyType vertices directly. The most common constraints such as property types and required properties would apply to PropertyTypes, while edge multiplicity constraints (e.g.
> > > > one-to-many, one-to-one) are naturally expressed as constraints on the EdgeType itself rather than on any property.
> > > >
> > > > While constraint keys are arbitrary strings and providers are free to implement any constraints they like, TinkerPop should standardize a set of core constraint keys representing the most common constraints. Examples include "type", "required", "unique", "minValue", "maxValue", etc. Providers that support equivalent constraints are encouraged to follow these conventional names for interoperability.
> > > >
> > > > Non-core constraints (custom to a provider) are encouraged to follow a namespaced key convention to avoid collisions, e.g. "tinkergraph:notNull". Core constraint keys are unnamespaced.
> > > >
> > > > ### 4. Schema traversal steps in core Gremlin
> > > >
> > > > New steps for schema manipulation live directly in GraphTraversal/GraphTraversalSource, not in a separate DSL:
> > > >
> > > > - addVType(label): creates a VertexType vertex
> > > > - addEType(label): creates an EdgeType vertex
> > > > - propertyType(name): creates a PropertyType vertex and connects it via hasProperty
> > > > - constraint(key, value): adds a constraint property to the current type vertex
> > > >
> > > > Example: defining a vertex type with properties:
> > > > ```
> > > > schema.traversal().addVType("person")
> > > >     .propertyType("name").constraint("type", String.class).constraint("required", true).constraint("unique", true)
> > > >     .propertyType("age").constraint("type", Integer.class)
> > > > ```
> > > >
> > > > Example: defining an edge type with endpoint types and a property:
> > > > ```
> > > > schema.traversal().addEType("knows")
> > > >     .from("person").to("person")
> > > >     .propertyType("weight").constraint("type", Double.class)
> > > > ```
> > > >
> > > > This mirrors the addE().from().to() pattern from the data graph.
> > > > Here from() and to() take vertex type labels (strings) and create from/to edges in the schema graph connecting the EdgeType to the referenced VertexType vertices.
> > > >
> > > > ### 5. Convenience methods for direct access
> > > >
> > > > The schema-as-graph model is the source of truth, but traversing it for simple lookups isn’t always convenient. Direct methods provide compact access:
> > > >
> > > > GraphSchema methods:
> > > > - vertexTypes() → Collection<VertexType>
> > > > - vertexType(String label) → Optional<VertexType>
> > > > - edgeTypes() → Collection<EdgeType>
> > > > - edgeType(String label) → Optional<EdgeType>
> > > > - addVertexType(String label) → VertexType
> > > > - addEdgeType(String label) → EdgeType
> > > > - store(OutputStream): serialize the schema to a compact JSON representation
> > > > - load(InputStream): deserialize and merge a schema from JSON into this schema graph
> > > >
> > > > EdgeType methods:
> > > > - fromVertexTypes() → Collection<VertexType>
> > > > - toVertexTypes() → Collection<VertexType>
> > > >
> > > > Example:
> > > > ```
> > > > GraphSchema schema = graph.schema();
> > > >
> > > > // Look up a vertex type
> > > > VertexType person = schema.vertexType("person").orElseThrow();
> > > >
> > > > // Inspect its properties
> > > > for (PropertyType pd : person.propertyTypes()) {
> > > >     System.out.println(pd.name() + " : " + pd.constraint("type"));
> > > > }
> > > >
> > > > // Look up an edge type and its connectivity
> > > > EdgeType knows = schema.edgeType("knows").orElseThrow();
> > > > Collection<VertexType> fromTypes = knows.fromVertexTypes();
> > > > Collection<VertexType> toTypes = knows.toVertexTypes();
> > > > ```
> > > >
> > > > ### 6. Cross-graph jumps
> > > >
> > > > Two steps bridge the data graph and schema graph:
> > > >
> > > > - type(): from a data traversal, jump to the element's type definition in the schema graph.
> > > > - instances(): from a schema traversal, jump to all matching elements in the data graph.
> > > >
> > > > These compose for round-trip traversals:
> > > > ```
> > > > // Get the type definition for "person" vertices
> > > > g.V().hasLabel("person").type()
> > > >
> > > > // Get all instances of a schema type
> > > > schema.traversal().vertexType("person").instances()
> > > >
> > > > // Round-trip: find marko's type, then get all instances of that type
> > > > g.V().has("person", "name", "marko").type().instances()
> > > > ```
> > > >
> > > > ### 7. Schema restriction strategy
> > > >
> > > > There are some steps we will want to restrict in both the data graph and the schema graph. addVType() wouldn’t make sense in the data graph, nor would addV() be sensible in the schema graph. A TraversalStrategy can restrict schema traversals to a safe subset of Gremlin steps (allowlist-based). This prevents accidentally running data element insertions, OLAP computations, complex control flow, or side-effect steps against the schema graph. The strategy should be auto-registered when traversing a GraphSchema instance.
> > > >
> > > > The exact allowlist should be a topic for later discussion.
> > > >
> > > > ### 8. Instance counts on type vertices
> > > >
> > > > VertexType.instanceCount() and EdgeType.instanceCount() return the count of data graph elements matching each type. This is a method rather than a property on the type vertex, keeping the schema graph definitional (not statistical) and giving providers full implementation flexibility.
> > > >
> > > > Approximate counts are likely acceptable and preferable for performance in most cases.
> > > > However, TinkerPop should not stand in the way of providers that prefer exact counts, and should ensure that appropriate hooks are in place in reference implementations so that providers can maintain exact counts if they so desire.
> > > >
> > > > Transactional implications need additional consideration. Maintaining accurate counts across concurrent writes, rollbacks, and transaction isolation levels adds significant complexity. This interacts with the broader schema transactions question (see transactions below) and should be addressed alongside it.
> > > >
> > > > ### 9. GLV Support
> > > >
> > > > Each GLV (Python, JavaScript, .NET, Go) needs:
> > > >
> > > > - Schema data classes: Parallel classes to the 4 core Java interfaces, following the same pattern as existing Vertex and Edge classes. These are data containers representing schema objects returned from the server:
> > > >   - GraphSchema: holds collections of VertexTypes and EdgeTypes
> > > >   - VertexType: label, full constraints map, and collection of PropertyTypes
> > > >   - EdgeType: label, full constraints map, from/to VertexType references (same pattern as Edge.outV/Edge.inV), and collection of PropertyTypes
> > > >   - PropertyType: name and full constraints map (including data type as a constraint)
> > > > - All new Gremlin steps are supported from each GLV
> > > >
> > > > ## Future Questions
> > > >
> > > > ### Schema validation
> > > >
> > > > Providers will need lots of flexibility regarding validation modes. Some providers may choose to have write-time validation for all inserts, others may choose to validate an entire graph against a schema as a batch job, while others may choose to validate on-commit.
> > > > For our purposes, we need to provide a viable reference implementation, as well as ensure sufficient extension points exist for providers to fulfill their needs.
> > > >
> > > > ### Dynamic schema updates from data writes
> > > >
> > > > It would be useful to auto-update the schema graph when data writes introduce new labels or properties (e.g. addV("newLabel") automatically creates a VertexType). Keeping the schema exactly in-sync with such operations may introduce too much overhead for many purposes. We should provide appropriate hooks for providers to implement such behaviour if desired, or to help providers aggregate changes and perform incremental batch updates to the schema.
> > > >
> > > > ### Transactions
> > > >
> > > > The schema graph will need to be transactional if the data
> > > >
> > > > ### File IO
> > > >
> > > > It is often useful to persist and load schemas to/from files. This capability should be built into the GraphSchema class via simple store() and load() methods, using a custom compact JSON representation of the schema. The specifics of this format are deferred to later discussion.
> > > >
> > > > GraphSchema exposes file IO directly:
> > > > - store(OutputStream): serialize the schema to a compact JSON representation
> > > > - load(InputStream): deserialize a schema from JSON and merge it into the current schema graph
> > > >
> > > > Schema file IO should be implemented across all GLVs.
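
For the write-time validation mode mentioned under Schema validation, here is a minimal sketch of what a check against the proposed core constraint keys might look like. It assumes constraints are held as plain string-keyed maps and handles only "type" (a Java class, per the placeholder type system) and "required". The class ConstraintCheckSketch and its methods are hypothetical names for illustration only; they are not TinkerPop API and not part of the proposal. The property types mirror the "person" example from design point 4.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names are part of TinkerPop or the proposal. */
public class ConstraintCheckSketch {

    /**
     * Checks one element's properties against per-property constraint maps.
     * Only the "type" (a Java Class) and "required" (a Boolean) keys are handled.
     */
    public static List<String> violations(Map<String, Map<String, Object>> propertyTypes,
                                          Map<String, Object> properties) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> entry : propertyTypes.entrySet()) {
            String name = entry.getKey();
            Map<String, Object> constraints = entry.getValue();
            Object value = properties.get(name);
            if (value == null) {
                // An absent property only matters when it is marked required.
                if (Boolean.TRUE.equals(constraints.get("required"))) {
                    problems.add("missing required property: " + name);
                }
                continue;
            }
            Object type = constraints.get("type");
            if (type instanceof Class && !((Class<?>) type).isInstance(value)) {
                problems.add("wrong type for property: " + name);
            }
        }
        return problems;
    }

    /** Constraint maps for the "person" vertex type ("unique" is omitted here). */
    public static Map<String, Map<String, Object>> personPropertyTypes() {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("type", String.class);
        name.put("required", true);
        Map<String, Object> age = new LinkedHashMap<>();
        age.put("type", Integer.class);
        Map<String, Map<String, Object>> types = new LinkedHashMap<>();
        types.put("name", name);
        types.put("age", age);
        return types;
    }

    public static void main(String[] args) {
        Map<String, Object> marko = new LinkedHashMap<>();
        marko.put("name", "marko");
        marko.put("age", 29);
        System.out.println(violations(personPropertyTypes(), marko)); // prints []

        Map<String, Object> broken = new LinkedHashMap<>();
        broken.put("age", "twenty-nine"); // wrong type, and "name" is missing
        System.out.println(violations(personPropertyTypes(), broken));
    }
}
```

A batch or on-commit validator could presumably run the same check over many elements at once; which mode a provider picks is left open by the proposal.
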
> > > >
> > > > ## Reference Implementation
> > > >
> > > > TinkerGraph serves as the reference implementation:
> > > >
> > > > - TinkerGraphSchema extends TinkerGraph implements GraphSchema
> > > > - TinkerVertexType extends TinkerVertex implements VertexType
> > > > - TinkerPropertyType extends TinkerVertex implements PropertyType
> > > > - TinkerEdgeType extends TinkerVertex implements EdgeType
> > > > - Recursion guard prevents schema-of-schema (TinkerGraphSchema overrides initSchema())
> > > >
> > > > Please let me know any thoughts you may have on the approach. I intend to move this into a proposal PR soon, unless there are any major disagreements over the design.
> > > >
> > > > Thanks,
> > > > Cole
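
To illustrate how the convenience methods in design point 5 could be derived from the schema-as-graph model, here is a small plain-Java sketch. It stores the modern-graph schema from design point 2 as labeled edges and answers fromVertexTypes()/toVertexTypes()-style lookups by following them. SchemaGraphSketch, SchemaEdge, and out() are hypothetical names, not TinkerPop API. Vertices are identified by bare name for brevity, even though the proposal keeps each PropertyType as a distinct vertex per owning type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; these names are not part of TinkerPop or the proposal. */
public class SchemaGraphSketch {

    /** One labeled edge of the schema meta-graph, e.g. knows --from--> person. */
    public static final class SchemaEdge {
        final String out;
        final String label;
        final String in;

        SchemaEdge(String out, String label, String in) {
            this.out = out;
            this.label = label;
            this.in = in;
        }
    }

    /** The edges of the modern-graph schema example from design point 2. */
    public static List<SchemaEdge> modernSchema() {
        return Arrays.asList(
            new SchemaEdge("person", "hasProperty", "name"),
            new SchemaEdge("person", "hasProperty", "age"),
            new SchemaEdge("software", "hasProperty", "name"),
            new SchemaEdge("software", "hasProperty", "lang"),
            new SchemaEdge("knows", "from", "person"),
            new SchemaEdge("knows", "to", "person"),
            new SchemaEdge("knows", "hasProperty", "weight"),
            new SchemaEdge("created", "from", "person"),
            new SchemaEdge("created", "to", "software"),
            new SchemaEdge("created", "hasProperty", "weight"));
    }

    /** Follows edges with the given label out of one type vertex. */
    public static List<String> out(List<SchemaEdge> edges, String typeName, String edgeLabel) {
        List<String> targets = new ArrayList<>();
        for (SchemaEdge edge : edges) {
            if (edge.out.equals(typeName) && edge.label.equals(edgeLabel)) {
                targets.add(edge.in);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<SchemaEdge> schema = modernSchema();
        System.out.println(out(schema, "created", "from"));       // [person]
        System.out.println(out(schema, "created", "to"));         // [software]
        System.out.println(out(schema, "person", "hasProperty")); // [name, age]
    }
}
```

A TinkerGraph-backed implementation would presumably run the equivalent Gremlin traversal over the schema graph rather than scanning a list, but the relationship between the two access styles is the same.
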
