FYI, I have extended the example to use the Protobuf-like solution for domain-specific objects which I mentioned above. The overall flow of the example is this:
- The client creates and populates a TinkerGraph instance. One of the properties has the key "livesIn" and a value which is an instance of a domain-specific BoundingBox class.
- The client encodes the graph to an instance of the Thrift-generated Graph class. The BoundingBox is serialized using a JSON-based encoder which has been added to an encoder registry that is shared between client and server.
- The Thrift-generated code sends the encoded graph across the wire to the server, which receives it again as an instance of the Thrift-generated Graph class.
- The server decodes the graph to a new instance of TinkerGraph. The serialized bounding box is deserialized to an instance of the domain-specific BoundingBox class, and becomes a property value in the server's graph.
- The server prints out some info and writes the received graph to disk as a GraphSON file so we can see that it is true to the client's original graph.

Note: I'm stretching the notion of "serialized" values somewhat in that, in these graphs, a serialized value is a record with two fields (or an object with two member variables): the encoded value itself (in this case, a JSON blob), and a type identifier.

Josh

On Wed, Jul 7, 2021 at 6:51 AM Joshua Shinavier <j...@fortytwo.net> wrote:

> Hi Stephen,
>
> Good questions. Let's elevate this discussion (about the specifics of graphs and traversal results over Thrift) to the dev list. See inline.
>
> On Wed, Jul 7, 2021 at 5:08 AM Stephen Mallette <spmalle...@gmail.com> wrote:
>
>> So, what happens if a returned Vertex contained a ByteBuffer or InetAddress as a property value? I assume the thrift definition has to be adjusted to include those types if you expect them in the results?
>
> What you see in the diff, currently, captures the types specifically mentioned in Graph.Features (see graph_features.yaml). In order to support other types natively, we should update Graph.Features in parallel.
> Byte arrays can be captured using Thrift's binary type. Domain-specific types like InetAddress probably should not be built in, just as specific element labels and property keys are not built in at this level. However, that is not the only possible answer. Certain very common types like IP addresses, dates and intervals, units of measurement, etc. *could* be built into the type system, but IMO probably shouldn't. Instead, we should give users a way of encoding and decoding domain-specific objects using a handful of atomic types. InetAddress in this case is encoded either as a string or a struct.
>
>> How would provider specific types (like a Point or special instances of P in JanusGraph) fit into something like this - how would providers (or users) extend on our thrift definitions?
>
> Point is definitely a domain-specific type which you would not see at this level of schema. Maybe I can illustrate encoding and decoding domain-specific types in the branch; using the current simple type system, you could turn the Point into a map with three keys, like "latitude", "longitude" and "type". When receiving a map with "type" equal to "Point", you turn it back into a native Point object. We could also use a strategy similar to Protobuf's Any type, where we send a struct with two fields over the wire: one field provides the data of the Point, and the other field provides a URL which specifies the type, i.e. how the object should be decoded. It is probably worthwhile to add a "record" type variant to Graph.Features in any case.
>
>> I think that the idea of having a more strict definition on the types Gremlin supports is starting to materialize given the constraints on serializable types of GraphSON and then further restricted in GraphBinary.
>> We actually have a list of types that haven't changed much in years at this point:
>>
>> https://tinkerpop.apache.org/docs/3.5.0/dev/io/
>
> We might want to go through this list with a fine-toothed comb (i.e. we probably don't want both a Date atomic type and a Timestamp type unless they have different precision/granularity, in which case I would make that explicit in the name of the type, e.g. UnixTimeSeconds vs. UnixTimeMillis).
>
>> I think we could actually even limit them further and then the dream would be to prevent them from being so JVM specific.
>
> Yes, I would argue for limiting them to very domain-independent atomic types, probably excluding the timestamp type(s) as well as UUID and Class. However, as I say it's possible to include a few specialized types if the user demand is really high. It's just more stuff which needs to be implemented in each Gremlin language variant.
>
>> It would be nice to elevate the discussion of supported types out of serialization and into the Gremlin language layer itself, which would then in turn drive serialization discussions.
>
> That's where I see this going. The specification of Gremlin traversal structure in YAML (already illustrated in the branch) translates neatly into traversals over the wire using Thrift. To that and the basic graph structure specification, we need a specification for other kinds of objects which appear in traversal results, such as paths.
>
> Josh
>
> [original message clipped]
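
For readers who want something concrete, the two-field "serialized value" described at the top of this message (an encoded JSON blob plus a type identifier, produced by an encoder registry shared between client and server) can be sketched roughly as follows. This is only an illustration in Python, not the code in the branch: the BoundingBox fields, the EncoderRegistry and SerializedValue names, and the "example.org/BoundingBox" type identifier are all assumptions made up for the sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class BoundingBox:
    # Hypothetical fields; the real domain-specific class is not shown in the thread.
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


@dataclass(frozen=True)
class SerializedValue:
    # The record with two fields that travels over the wire.
    type_id: str  # tells the receiver which decoder to use
    encoded: str  # the encoded value itself (here, a JSON blob)


class EncoderRegistry:
    """Hypothetical registry; client and server must register the same entries."""

    def __init__(self) -> None:
        self._encoders: Dict[type, Tuple[str, Callable[[Any], str]]] = {}
        self._decoders: Dict[str, Callable[[str], Any]] = {}

    def register(self, cls: type, type_id: str,
                 encode: Callable[[Any], str],
                 decode: Callable[[str], Any]) -> None:
        self._encoders[cls] = (type_id, encode)
        self._decoders[type_id] = decode

    def encode(self, value: Any) -> SerializedValue:
        # Look up the encoder by the value's class and attach the type identifier.
        type_id, encode = self._encoders[type(value)]
        return SerializedValue(type_id, encode(value))

    def decode(self, value: SerializedValue) -> Any:
        # Look up the decoder by the type identifier carried with the value.
        return self._decoders[value.type_id](value.encoded)


registry = EncoderRegistry()
registry.register(
    BoundingBox,
    "example.org/BoundingBox",  # assumed identifier, in the spirit of Protobuf's Any
    lambda box: json.dumps(asdict(box), sort_keys=True),
    lambda blob: BoundingBox(**json.loads(blob)),
)

# Client side: encode the property value before it goes into the Thrift struct.
box = BoundingBox(37.7, -122.5, 37.8, -122.3)
wire = registry.encode(box)

# Server side: decode it back into a native domain object.
restored = registry.decode(wire)
```

Keeping the type identifier next to the blob means the generic graph schema never has to know about BoundingBox; a receiver without a matching decoder could still pass the record through untouched.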