Hi Cole,

I agree with keeping things as simple as possible. The schema types I shared are parameterized by a *T* type for ids and property values. This keeps the model flexible: providers can plug in whatever they need for T, whether something as complex as Hydra's Type type (which does include complex record and union types, among other things) or as simple as a provider-specific enum with STRING, INT, LONG, etc.

Now, in TinkerPop we could keep the types parameterized (which is what I recommend), or we could replace T with a TinkerPop-specific type system, such as an enum for DataTypeFeatures <https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/ext/org/apache/tinkerpop/features/DataTypeFeatures.html> (which is based on Gremlin's Graph.Features <https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/structure/Graph.Features.html> interface).

Does that address the concern?
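
To make the parameterization concrete, here is a minimal sketch in Java (16 or later, for records). It is not the Hydra model or a proposed TinkerPop API: `PropertyTypeDef`, `SimpleType`, `RichType`, `Literal`, `RecordOf`, and `describe` are all hypothetical names invented for illustration. It only shows that one generic schema class can carry either a simple provider enum or a richer type representation as T.

```java
import java.util.List;

public class ParameterizedSchemaSketch {

    // Hypothetical provider-specific type system: a plain enum.
    enum SimpleType { STRING, INT, LONG }

    // Hypothetical richer type system with record types,
    // standing in for something like Hydra's Type.
    interface RichType {}
    record Literal(String name) implements RichType {}
    record RecordOf(List<String> fieldNames) implements RichType {}

    // A schema element parameterized by T, the provider's notion of a type.
    record PropertyTypeDef<T>(String key, T valueType, boolean required) {}

    // Works for any T, so schema tooling need not know the provider's type system.
    static <T> String describe(PropertyTypeDef<T> p) {
        return p.key() + " : " + p.valueType() + (p.required() ? " (required)" : "");
    }

    public static void main(String[] args) {
        // Simple provider: T is an enum.
        PropertyTypeDef<SimpleType> age =
                new PropertyTypeDef<>("age", SimpleType.INT, false);

        // Richer provider: T supports composite types.
        PropertyTypeDef<RichType> address =
                new PropertyTypeDef<>("address", new RecordOf(List.of("street", "city")), true);

        System.out.println(describe(age));
        System.out.println(describe(address));
    }
}
```

Replacing T with a fixed TinkerPop enum would amount to pinning `T` to something like `SimpleType` in this sketch, which is simpler but rules out the second usage.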
Josh

On Fri, Apr 10, 2026 at 2:52 PM Cole Greer <[email protected]> wrote:

> Hi Josh,
>
> I agree that the property graph model from Hydra should adapt well for our purposes in TinkerPop. My main concern regarding schema types is that providers aren't unnecessarily burdened when trying to map types from an existing proprietary schema into our new interfaces. I like the idea of our schema types supporting complex composite and union types; however, my primary goal is to keep this as simple for providers to adopt as possible.
>
> Thanks,
> Cole
>
> On 2026/04/03 03:31:12 Joshua Shinavier wrote:
> > Hi Cole. This looks good to me. With respect to the schema types, would you please review the property graph model I showed a couple of meetings ago -- here <https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/package-summary.html> -- and let me know if you have any feedback. The types of interest start with GraphSchema <https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/GraphSchema.html>. Ignore Hydra-specific details like PersistentMap/ConsList/Name, etc. If the structure is agreeable, I will map the types into a format suitable for use in your PR, in an org.apache.tinkerpop namespace. There is an associated JSON format for interchange of GraphSchema and any other type we define in this way.
> >
> > Josh
> >
> > On Thu, Apr 2, 2026 at 7:17 PM Cole Greer via dev <[email protected]> wrote:
> >
> > > Hi Everyone,
> > >
> > > The topic of Graph Schema has been discussed extensively in recent TinkerPop Gatherings, and the following proposal has emerged from these gatherings. I believe it is now ready for broad consideration and discussion. I’ve done my best to incorporate initial feedback from Josh, Pieter, Valentyn, Stephen, Kris and others into this proposal; however, I won’t claim that it accurately represents the views of anyone other than myself at this time. This is a broad topic and I’m deliberately excluding critical topics to focus this thread on standardizing interfaces for Gremlin users and providers to interact with schema (see assumptions for more details).
> > >
> > > ## Overview
> > >
> > > This proposal introduces graph schema interfaces for TinkerPop: a way to define vertex types, edge types, and property types as a meta-graph that is itself traversable with Gremlin. The schema describes the structure of a data graph: what kinds of vertices and edges exist, what properties they carry, and how they connect.
> > >
> > > ## Assumptions
> > >
> > > - Type keys are element labels: there is a 1-to-1 mapping between a label and a type definition. A vertex labeled "person" corresponds to exactly one VertexType, and an edge labeled "knows" corresponds to exactly one EdgeType.
> > > - Java classes are used as a type system: This proposal uses Java classes to define property type constraints. This is intended as a placeholder to be replaced by a proper type system to be defined via a later discussion.
> > > - This proposal makes very little consideration of if/when/where/how validation and enforcement of schema takes place. I believe it is important for us to ship something which is flexible and useful to providers out of the box, as well as leaving space for providers to plug in existing implementations or build their own if they desire. I’ve left this out of scope for this proposal to focus first on interfaces which give providers the appropriate access to schema.
> > >
> > > ## Design Points
> > >
> > > ### 1. Schema-as-Graph
> > >
> > > `GraphSchema extends Graph`. Providers implement a familiar interface, and users traverse the schema with schema.traversal(). This avoids inventing a parallel API surface. The schema is just another graph.
> > >
> > > A data graph exposes its schema via Graph.schema(), which returns the GraphSchema instance. Providers that don't support schema throw UnsupportedOperationException by default.
> > >
> > > ### 2. All type definitions are vertices
> > >
> > > VertexType, EdgeType, and PropertyType are all vertices in the schema meta-graph.
> > >
> > > - A VertexType vertex represents a vertex label definition (e.g. "person", "software").
> > > - An EdgeType vertex represents an edge label definition (e.g. "knows", "created"). Even though it describes edges in the data graph, it is itself a vertex in the schema graph, connected to its endpoint VertexType vertices via from/to edges.
> > > - A PropertyType vertex represents a property on a type, connected to its parent type vertex via a "hasProperty" edge.
> > >
> > > Property definitions are independent per type; there is no sharing across types.
> > >
> > > Schema graph example for the classic TinkerPop modern graph:
> > > ```
> > > (person:vertexType) --hasProperty--> (name:propertyType)
> > > (person:vertexType) --hasProperty--> (age:propertyType)
> > > (software:vertexType) --hasProperty--> (name:propertyType)
> > > (software:vertexType) --hasProperty--> (lang:propertyType)
> > > (knows:edgeType) --from--> (person:vertexType)
> > > (knows:edgeType) --to--> (person:vertexType)
> > > (knows:edgeType) --hasProperty--> (weight:propertyType)
> > > (created:edgeType) --from--> (person:vertexType)
> > > (created:edgeType) --to--> (software:vertexType)
> > > (created:edgeType) --hasProperty--> (weight:propertyType)
> > > ```
> > >
> > > ### 3. Constraints are properties on type vertices
> > >
> > > Rather than a fixed constraint taxonomy, constraints are regular properties on type vertices, keyed by string via constraint(key, value). This keeps the model extensible such that providers can define their own constraints without changes to the core API.
> > >
> > > Constraints can be added to VertexType, EdgeType, and PropertyType vertices directly. The most common constraints, such as property types and required properties, would apply to PropertyTypes, while edge multiplicity constraints (e.g. one-to-many, one-to-one) are naturally expressed as constraints on the EdgeType itself rather than on any property.
> > >
> > > While constraint keys are arbitrary strings and providers are free to implement any constraints they like, TinkerPop should standardize a set of core constraint keys representing the most common constraints. Examples include "type", "required", "unique", "minValue", "maxValue", etc. Providers that support equivalent constraints are encouraged to follow these conventional names for interoperability.
> > >
> > > Non-core constraints (custom to a provider) are encouraged to follow a namespaced key convention to avoid collisions, e.g. "tinkergraph:notNull". Core constraint keys are unnamespaced.
> > >
> > > ### 4. Schema traversal steps in core Gremlin
> > >
> > > New steps for schema manipulation live directly in GraphTraversal/GraphTraversalSource, not in a separate DSL:
> > >
> > > - addVType(label): creates a VertexType vertex
> > > - addEType(label): creates an EdgeType vertex
> > > - propertyType(name): creates a PropertyType vertex and connects it via hasProperty
> > > - constraint(key, value): adds a constraint property to the current type vertex
> > >
> > > Example: defining a vertex type with properties:
> > > ```
> > > schema.traversal().addVType("person")
> > >   .propertyType("name").constraint("type", String.class).constraint("required", true).constraint("unique", true)
> > >   .propertyType("age").constraint("type", Integer.class)
> > > ```
> > >
> > > Example: defining an edge type with endpoint types and a property:
> > > ```
> > > schema.traversal().addEType("knows")
> > >   .from("person").to("person")
> > >   .propertyType("weight").constraint("type", Double.class)
> > > ```
> > >
> > > This mirrors the addE().from().to() pattern from the data graph. Here from() and to() take vertex type labels (strings) and create from/to edges in the schema graph connecting the EdgeType to the referenced VertexType vertices.
> > >
> > > ### 5. Convenience methods for direct access
> > >
> > > The schema-as-graph model is the source of truth, but traversing it for simple lookups isn’t always convenient. Direct methods provide compact access:
> > >
> > > GraphSchema methods:
> > > - vertexTypes() → Collection<VertexType>
> > > - vertexType(String label) → Optional<VertexType>
> > > - edgeTypes() → Collection<EdgeType>
> > > - edgeType(String label) → Optional<EdgeType>
> > > - addVertexType(String label) → VertexType
> > > - addEdgeType(String label) → EdgeType
> > > - store(OutputStream): serialize the schema to a compact JSON representation
> > > - load(InputStream): deserialize and merge a schema from JSON into this schema graph
> > >
> > > EdgeType methods:
> > > - fromVertexTypes() → Collection<VertexType>
> > > - toVertexTypes() → Collection<VertexType>
> > >
> > > Example:
> > > ```
> > > GraphSchema schema = graph.schema();
> > >
> > > // Look up a vertex type
> > > VertexType person = schema.vertexType("person").orElseThrow();
> > >
> > > // Inspect its properties
> > > for (PropertyType pd : person.propertyTypes()) {
> > >     System.out.println(pd.name() + " : " + pd.constraint("type"));
> > > }
> > >
> > > // Look up an edge type and its connectivity
> > > EdgeType knows = schema.edgeType("knows").orElseThrow();
> > > Collection<VertexType> fromTypes = knows.fromVertexTypes();
> > > Collection<VertexType> toTypes = knows.toVertexTypes();
> > > ```
> > >
> > > ### 6. Cross-graph jumps
> > >
> > > Two steps bridge the data graph and schema graph:
> > >
> > > - type(): from a data traversal, jump to the element's type definition in the schema graph.
> > > - instances(): from a schema traversal, jump to all matching elements in the data graph.
> > >
> > > These compose for round-trip traversals:
> > > ```
> > > // Get the type definition for "person" vertices
> > > g.V().hasLabel("person").type()
> > >
> > > // Get all instances of a schema type
> > > schema.traversal().vertexType("person").instances()
> > >
> > > // Round-trip: find marko's type, then get all instances of that type
> > > g.V().has("person", "name", "marko").type().instances()
> > > ```
> > >
> > > ### 7. Schema restriction strategy
> > >
> > > There are some steps we will want to restrict in both the data graph and the schema graph. addVType() wouldn’t make sense in the data graph, nor would addV() be sensible in the schema graph. A TraversalStrategy can restrict schema traversals to a safe subset of Gremlin steps (allowlist-based). This prevents accidentally running data element insertions, OLAP computations, complex control flow, or side-effect steps against the schema graph. The strategy should be auto-registered when traversing a GraphSchema instance.
> > >
> > > The exact allowlist should be a topic for later discussion.
> > >
> > > ### 8. Instance counts on type vertices
> > >
> > > VertexType.instanceCount() and EdgeType.instanceCount() return the count of data graph elements matching each type. This is a method rather than a property on the type vertex, keeping the schema graph definitional (not statistical) and giving providers full implementation flexibility.
> > >
> > > Approximate counts are likely acceptable and preferable for performance in most cases. However, TinkerPop should not stand in the way of providers that prefer exact counts, and should ensure that appropriate hooks are in place in reference implementations so that providers can maintain exact counts if they so desire.
> > >
> > > Transactional implications need additional consideration. Maintaining accurate counts across concurrent writes, rollbacks, and transaction isolation levels adds significant complexity. This interacts with the broader schema transactions question (see transactions below) and should be addressed alongside it.
> > >
> > > ### 9. GLV Support
> > >
> > > Each GLV (Python, JavaScript, .NET, Go) needs:
> > >
> > > - Schema data classes: Parallel classes to the 4 core Java interfaces, following the same pattern as existing Vertex and Edge classes. These are data containers representing schema objects returned from the server:
> > >   - GraphSchema: holds collections of VertexTypes and EdgeTypes
> > >   - VertexType: label, full constraints map, and collection of PropertyTypes
> > >   - EdgeType: label, full constraints map, from/to VertexType references (same pattern as Edge.outV/Edge.inV), and collection of PropertyTypes
> > >   - PropertyType: name and full constraints map (including data type as a constraint)
> > > - All new Gremlin steps are supported from each GLV
> > >
> > > ## Future Questions
> > >
> > > ### Schema validation
> > >
> > > Providers will need lots of flexibility regarding validation modes. Some providers may choose to have write-time validation for all inserts, others may choose to validate an entire graph against a schema as a batch job, while others may choose to validate on-commit. For our purposes, we need to provide a viable reference implementation, as well as ensuring sufficient extension points exist for providers to fulfill their needs.
> > >
> > > ### Dynamic schema updates from data writes
> > >
> > > It would be useful to auto-update the schema graph when data writes introduce new labels or properties (e.g. addV("newLabel") automatically creates a VertexType). Keeping the schema exactly in-sync with such operations may introduce too much overhead for many purposes. We should provide appropriate hooks for providers to implement such behaviour if desired, or to help providers aggregate changes and perform incremental batch updates to the schema.
> > >
> > > ### Transactions
> > >
> > > The schema graph will need to be transactional if the data
> > >
> > > ### File IO
> > >
> > > It is often useful to persist and load schemas to/from files. This capability should be built into the GraphSchema class via simple store() and load() methods, using a custom compact JSON representation of the schema. The specifics of this format are deferred to later discussion.
> > >
> > > GraphSchema exposes file IO directly:
> > > - store(OutputStream): serialize the schema to a compact JSON representation
> > > - load(InputStream): deserialize a schema from JSON and merge it into the current schema graph
> > >
> > > Schema file IO should be implemented across all GLVs.
> > >
> > > ## Reference Implementation
> > >
> > > TinkerGraph serves as the reference implementation:
> > >
> > > - TinkerGraphSchema extends TinkerGraph implements GraphSchema
> > > - TinkerVertexType extends TinkerVertex implements VertexType
> > > - TinkerPropertyType extends TinkerVertex implements PropertyType
> > > - TinkerEdgeType extends TinkerVertex implements EdgeType
> > > - Recursion guard prevents schema-of-schema (TinkerGraphSchema overrides initSchema())
> > >
> > > Please let me know any thoughts you may have on the approach. I intend to move this into a proposal PR soon, unless there are any major disagreements over the design.
> > >
> > > Thanks,
> > > Cole
