Re: [Discuss] Type system in TinkerPop

Joshua Shinavier Sun, 28 Jan 2024 09:40:30 -0800

Hi Valentin,

I agree with the sentiment, and I have a solution you might be interested
in. You might be able to grok the property graph validation test cases here
<https://github.com/CategoricalData/hydra/blob/main/hydra-java/src/test/java/hydra/langs/tinkerpop/ValidationTest.java>.
If you have ever heard me talking about algebraic property graphs (paper
<https://arxiv.org/abs/1909.04881>), this is a special case of that type
system, implemented for the JVM using Hydra's LambdaGraph data model. Also
check out the typed property graph model here
<https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/main/haskell/Hydra/Sources/Tier4/Langs/Tinkerpop/PropertyGraph.hs>;
I think you can see past the unfamiliar syntax to understand the notion of
property graph data / type conformance which is enforced:

   - A *property* is a string-valued key together with a property value.
   Properties are validated against *property types*, which are
   string-valued keys together with some primitive data type. Primitive data
   types and values are parameterized so that different applications can use
   their own. Property types also have a built-in optionality or requiredness
   parameter.
   - A *vertex* has a string-valued label, an id and a key/value map of
   property keys to values. Vertices are validated against *vertex types*,
   which mirror the structure of a vertex: a vertex type has a label, an id
   type, and a list of property types.
   - An *edge* has a string-valued label, an id, an out-vertex label, an
   in-vertex label, and a map of property keys to values. Edges are validated
   against *edge types*, which mirror the structure of an edge: an edge
   type has a label, an id type, an out-vertex label, an in-vertex label, and
   a list of property types.
   - A *graph* is a map of ids to vertices, together with a map of ids to
   edges (i.e. vertex ids and edge ids are unique in the graph; the
   uniqueness constraint on edge ids is relaxed for graphs which do not care
   about edge ids). Graphs are validated against *graph schemas*, which map
   vertex labels to vertex types, and edge labels to edge types.
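
The rules in the list above can be sketched in a few lines of Java. This is
only an illustration, not Hydra's actual API: the class name
PropertyGraphSketch, the Prim enum, the conforms function and the helpers
sampleSchema and personGraph are all made up for this example, and the
primitive types and values (here a two-element enum and plain Java objects)
are type parameters precisely because each application supplies its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative sketch only; all names here are hypothetical, not Hydra's API.
class PropertyGraphSketch {
    // T = primitive type, V = primitive value; both are chosen by the application.
    record PropertyType<T>(String key, T valueType, boolean required) {}
    record VertexType<T>(String label, T idType, List<PropertyType<T>> properties) {}
    record EdgeType<T>(String label, T idType, String out, String in, List<PropertyType<T>> properties) {}
    record Vertex<V>(String label, V id, Map<String, V> properties) {}
    record Edge<V>(String label, V id, String out, String in, Map<String, V> properties) {}
    // Keying by id makes vertex ids and edge ids unique by construction.
    record Graph<V>(Map<V, Vertex<V>> vertices, Map<V, Edge<V>> edges) {}
    record Schema<T>(Map<String, VertexType<T>> vertexTypes, Map<String, EdgeType<T>> edgeTypes) {}

    // Check a property map against the property types of a vertex or edge type.
    static <T, V> void checkProperties(String where, List<PropertyType<T>> types, Map<String, V> props,
                                       BiPredicate<T, V> conforms, List<String> errors) {
        for (PropertyType<T> pt : types) {
            V value = props.get(pt.key());
            if (value == null) {
                if (pt.required()) errors.add(where + ": missing required property '" + pt.key() + "'");
            } else if (!conforms.test(pt.valueType(), value)) {
                errors.add(where + ": wrong type for property '" + pt.key() + "'");
            }
        }
    }

    // Validate a graph against a schema; an empty result means the graph conforms.
    static <T, V> List<String> validate(Schema<T> schema, Graph<V> graph, BiPredicate<T, V> conforms) {
        List<String> errors = new ArrayList<>();
        for (Vertex<V> v : graph.vertices().values()) {
            String where = "vertex " + v.id();
            VertexType<T> vt = schema.vertexTypes().get(v.label());
            if (vt == null) { errors.add(where + ": unknown label '" + v.label() + "'"); continue; }
            if (!conforms.test(vt.idType(), v.id())) errors.add(where + ": wrong id type");
            checkProperties(where, vt.properties(), v.properties(), conforms, errors);
        }
        for (Edge<V> e : graph.edges().values()) {
            String where = "edge " + e.id();
            EdgeType<T> et = schema.edgeTypes().get(e.label());
            if (et == null) { errors.add(where + ": unknown label '" + e.label() + "'"); continue; }
            if (!conforms.test(et.idType(), e.id())) errors.add(where + ": wrong id type");
            if (!et.out().equals(e.out())) errors.add(where + ": wrong out-vertex label");
            if (!et.in().equals(e.in())) errors.add(where + ": wrong in-vertex label");
            checkProperties(where, et.properties(), e.properties(), conforms, errors);
        }
        return errors;
    }

    // Hypothetical primitive types and the matching conformance check on plain Java values.
    enum Prim { INT, STRING }
    static boolean conforms(Prim t, Object v) {
        return t == Prim.INT ? v instanceof Integer : v instanceof String;
    }

    // Example schema: Person vertices with a required name and an optional nick.
    static Schema<Prim> sampleSchema() {
        VertexType<Prim> person = new VertexType<Prim>("Person", Prim.INT, List.of(
                new PropertyType<Prim>("name", Prim.STRING, true),
                new PropertyType<Prim>("nick", Prim.STRING, false)));
        EdgeType<Prim> knows = new EdgeType<Prim>("knows", Prim.INT, "Person", "Person", List.of());
        return new Schema<Prim>(Map.of("Person", person), Map.of("knows", knows));
    }

    // Example graph holding a single Person vertex and no edges.
    static Graph<Object> personGraph(Object id, Map<String, Object> props) {
        Vertex<Object> v = new Vertex<Object>("Person", id, props);
        return new Graph<Object>(Map.of(id, v), Map.of());
    }

    public static void main(String[] args) {
        Schema<Prim> schema = sampleSchema();
        // prints []
        System.out.println(validate(schema, personGraph(1, Map.of("name", "Alice")), PropertyGraphSketch::conforms));
        // prints [vertex 2: missing required property 'name']
        System.out.println(validate(schema, personGraph(2, Map.of()), PropertyGraphSketch::conforms));
    }
}
```

Passing the conformance check as a function is what keeps the validator
agnostic to the primitive types: an application built on a different set of
literals only swaps out Prim and conforms.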

As I mentioned, this approach is agnostic to the actual set of primitive
values and types which an application uses. In my team's work at LinkedIn
and Microsoft, we have been using Hydra's built-in Literal
<https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/main/haskell/Hydra/Sources/Core.hs#L202>
and LiteralType
<https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/main/haskell/Hydra/Sources/Core.hs#L217>.
There is no integration with Gremlin yet. For Gremlin, it would be worthwhile to
standardize on a set of primitive types which is well aligned with the JVM
types we use in practice. I believe there was a thread about this, with an
associated proposal, a year or three ago -- I can't find it at the moment.
If anyone remembers, please post a link. A couple of other recent threads
on types for TinkerPop are this one
<https://lists.apache.org/thread/jgm39jyof9zosohmwn2kpg8md2vk7g73> and this
one <https://lists.apache.org/thread/fh1vjdv7m8wbox0obfd3p4t05k3c9zwc>.

Best,

Josh

On Tue, Jan 23, 2024 at 3:26 PM Valentin Kagamlyk <
[email protected]> wrote:

> Hi all,
>
> In an embedded graph it is currently technically possible to use any JVM
> type, but for network transmission the set is more limited. There are also
> some differences between GLVs, for example number handling in Python and
> JavaScript.
>
> There are 2 main categories of Gremlin types:
> - types needed to transfer data over the wire, for example Graph Elements,
> lots of enums and other utility types like ByteCode and Bindings.
> - types designed to store data as values in a graph, like labels, property
> values, etc. This includes most simple types like numbers, strings, dates,
> and some composite types like collections (List, Set, Map).
> Some types, like Integer, are used everywhere.
>
> Restricting values to a limited set of allowed types helps to decouple
> Gremlin from its JVM foundation, and allows us to ensure that value types
> are handled consistently across all Gremlin language variants.
>
> I would like to discuss whether it makes sense to put such restrictions on
> element property values.
> If yes, which types should be allowed as element property values?
>
