Hi all,
I'd like to propose expanding label support in TinkerPop 4.0 to allow multiple
mutable labels on vertices and edges.
Background:
When TinkerPop was first developed, the property graph model did not widely
support anything other than a single immutable label. Since then, multi-label
support has become more common, and mutable labels and unlabeled elements are
increasingly supported across graph databases. The GQL standard defines both vertices and
edges as having zero or more labels.
Supporting multi-label and mutable labels fits relatively well into Gremlin
syntax and semantics. The notion of "no label" is more nuanced. It makes sense
for analytics use cases where algorithms don't care about element
classification, but less so for transactional cases where labels serve as
schema anchors. Rather than dictating one behavior, this proposal leaves
providers free to configure how much multi-label support they offer, if any.
Proposal:
The goal is to make multi-label opt-in for providers, with configuration over
which label cardinalities and element types to support. Providers that wish to
remain single-label can do so without breaking changes.
For TinkerPop's reference implementation, I propose:
- Vertices: 0..N label support (with 0..1, 1..1 and 1..N as configurable
options)
- Edges: 0..1 label support initially, with the foundation in place for N
labels later
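
To make the cardinality options concrete, here is a minimal sketch of how they
could be modeled. The names LabelCardinality, allows() and
ReferenceLabelDefaults are hypothetical and only illustrate the idea; the
actual configuration surface would come out of the design proposal.

```java
// Hypothetical names throughout; this is not an existing TinkerPop API.
enum LabelCardinality {
    ZERO_OR_ONE(0, 1),                   // 0..1
    EXACTLY_ONE(1, 1),                   // 1..1
    ONE_OR_MORE(1, Integer.MAX_VALUE),   // 1..N
    ZERO_OR_MORE(0, Integer.MAX_VALUE);  // 0..N

    private final int min;
    private final int max;

    LabelCardinality(int min, int max) {
        this.min = min;
        this.max = max;
    }

    /** True when an element carrying labelCount labels satisfies this option. */
    boolean allows(int labelCount) {
        return labelCount >= min && labelCount <= max;
    }
}

// The defaults proposed above for the reference implementation.
final class ReferenceLabelDefaults {
    static final LabelCardinality VERTEX = LabelCardinality.ZERO_OR_MORE;
    static final LabelCardinality EDGE = LabelCardinality.ZERO_OR_ONE;

    private ReferenceLabelDefaults() {}
}
```

A provider that wants to stay single-label would, in this sketch, simply
select EXACTLY_ONE for both element types.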
Key structural changes:
- Element.label() is deprecated in favor of Set<String> labels(). The label()
method returns the first label for backward compatibility. The default labels()
implementation in Element delegates to Collections.singleton(label()), so
existing providers work without changes.
- Serialization uses List<String> for all label fields in GraphBinary V4 and
GraphSON V4. The wire format is already list-based; this change populates the
list fully rather than always writing a singleton.
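
The delegation described in the first bullet can be sketched roughly as
follows. LabeledElement is a hypothetical stand-in for Element that models only
the label methods, and LegacyVertex and MultiLabelVertex are illustrative
classes, not real provider code. Using an insertion-ordered set to decide which
label is "first" is my assumption for this sketch.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for Element; only the label methods are modeled.
interface LabeledElement {
    // Legacy single-label accessor; it would be marked deprecated under the proposal.
    String label();

    // Default implementation: wrap the legacy label in a singleton set,
    // so providers that only implement label() keep working.
    default Set<String> labels() {
        return Collections.singleton(label());
    }
}

// A provider that has not adopted multi-label: it implements label() only.
final class LegacyVertex implements LabeledElement {
    private final String label;

    LegacyVertex(String label) {
        this.label = label;
    }

    @Override
    public String label() {
        return label;
    }
}

// A provider that opts in: it overrides labels() and derives label() from it.
final class MultiLabelVertex implements LabeledElement {
    // Assumption: insertion order defines which label counts as "first".
    private final Set<String> labels;

    MultiLabelVertex(String first, String... rest) {
        Set<String> ordered = new LinkedHashSet<>();
        ordered.add(first);
        Collections.addAll(ordered, rest);
        this.labels = Collections.unmodifiableSet(ordered);
    }

    @Override
    public String label() {
        return labels.iterator().next();
    }

    @Override
    public Set<String> labels() {
        return labels;
    }
}
```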
New steps:
- addLabel(String, String...), dropLabel(String, String...), and dropLabels()
for mutating labels, plus labels() for streaming all labels of an element.
- A with('multilabel') configuration for valueMap()/elementMap(): single-label
output remains the default, and multi-label output is returned only when
configured.
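
As a rough illustration of the intended effect of the new label steps, here is
a plain-Java model of a single element's label set. MutableLabels is a
hypothetical class and is not the Gremlin API; it only mirrors the proposed
signatures. Details such as how dropping an absent label behaves are my
assumptions and are open for discussion.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical model of the proposed steps' effect on one element's labels.
final class MutableLabels {
    private final Set<String> labels = new LinkedHashSet<>();

    // Mirrors addLabel(String, String...): at least one label is required.
    // Adding a label that is already present leaves the set unchanged.
    MutableLabels addLabel(String label, String... more) {
        labels.add(label);
        labels.addAll(Arrays.asList(more));
        return this;
    }

    // Mirrors dropLabel(String, String...): removes only the named labels.
    // Assumption: dropping a label that is absent is a silent no-op.
    MutableLabels dropLabel(String label, String... more) {
        labels.remove(label);
        labels.removeAll(Arrays.asList(more));
        return this;
    }

    // Mirrors dropLabels(): removes every label from the element.
    MutableLabels dropLabels() {
        labels.clear();
        return this;
    }

    // Mirrors labels(): emits one item per label.
    Stream<String> labels() {
        return labels.stream();
    }
}
```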
While multiple labels add some complexity to TinkerPop's model, this opens the
door for providers who want to expand their database models and moves toward
interoperability with GQL's label semantics.
I plan to draft a PR soon with a design proposal and initial implementation.
The goal is to include some level of multi-label support for the upcoming beta
release, setting a good foundation for 4.0.0 GA feedback.
Please share any thoughts, concerns, or questions in this thread.
Thank you,
Yang