Responding to Marko and Kevin...

Marko wrote:
> SIDENOTE: This serves as a foundation for when we move to GraphSON 2.0. In
> terms of numbers, I think, unfortunately, we have to stick with int32, int64,
> float, double, etc. given graph database providers and their type systems.
> Its not about the Gremlin traversal API, its more about provider schemas.
> has(“someNumber”,12L) vs. has(“someNumber”,12).

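For anyone who hasn't hit this: the 12L vs. 12 distinction in that last line is the same one plain Java makes between boxed Long and Integer. A minimal sketch of that idiom follows. It is plain Java only, not Titan or TinkerPop code, and the class and method names (NumericEquality, strictEquals, looseEquals) are made up for illustration.

```java
// Illustration only: plain Java, not Titan or TinkerPop code.
// NumericEquality, strictEquals and looseEquals are hypothetical names.
final class NumericEquality {

    // Boxed equals() checks the runtime type before the value,
    // so a Long and an Integer holding 12 are not equal.
    static boolean strictEquals(Object a, Object b) {
        return a.equals(b);
    }

    // A looser comparison for integral values: compare as long,
    // ignoring which boxed type carried the number.
    static boolean looseEquals(Number a, Number b) {
        return a.longValue() == b.longValue();
    }

    public static void main(String[] args) {
        System.out.println(strictEquals(12L, 12)); // Long vs Integer
        System.out.println(looseEquals(12L, 12));  // same numeric value
    }
}
```

The first comparison is false and the second is true, which is roughly the gap between matching on the Java object and matching on the value.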
I call the above behavior a bug or a peculiarity of Titan; it clings to a Java object idiom. On the other hand, DSE graph exhibits the expected behavior (as do IBM Graph and Neo4j). I know of no other query language that behaves like this, e.g. SQL, CassandraQL, JPQL, JOOQ (the gremlin of sql). Typically the underlying driver/provider does the "right" thing (or doesn't).

Again, take UUID in gremlin: I can pass a string. The underlying driver seems to convert it to a UUID; I don't have to provide a UUID object. This seems inconsistent. Either it's doing strong typing or it's not. Which is it?

IMO, the query language should be abstracted from the storage schema. And I think this is where we have the impedance mismatch in this thread. In addition to being a query language, gremlin is really acting like an Object Graph Mapper (like an ORM). It's playing two roles. So I'm also arguing that it should have a single responsibility.

Yes, I've said this before. But maybe it changes things too drastically. Maybe there are aspects of gremlin that actually require strong typing. I don't know. I haven't run into them.

On to the next item...

Kevin wrote:
>> Correct, these types weren't relevant... I only wanted to show you the
>> format...
> However, I don't manage to understand the structure behind the format you
> suggest, and I don't manage to establish a clear explicit representation in
> my mind, regarding the example you provided in the TP-1274 PR. Could you
> please give an example of how you would imagine the serialized JSON of :
> - an example list of typed values, like List<UUID>
> - an example list of typed and untyped values, like a list with UUIDs and
> booleans
> - an example map of typed and untyped values
>
> How would you define that format in a general way ? Like what I did when
> saying
> "- untyped : value
> - typed : {"@type", "typeName", "value" : value}"
>
> Just trying your point better.
> Also what are the downsides you see with the format suggested above ?

The original format was in a list. I must have missed where you accepted this format. In any case, like I originally stated, if you want strong-typing, then _everything_ must be an _object_.

Here's an example of non-typed:
https://gist.github.com/robertdale/02931f5633be55a59c13bca3b0e58655
- native json only

Here's strongly typed:
https://gist.github.com/robertdale/6c074b165a72efee701e26f851f8b68a
- set (as an object), list (as an object), mixed-type lists, etc

Let me add that while there's no strict definition of schemaless, it was not necessarily intended to include having mixed data types for a single field. This is a really bad idea. Experts warn against this. Most NoSQL databases don't even support this. You will probably die if you use it. The default behavior for DSE graph, IBM graph, and even Titan is to create the schema based on the first type inserted. It will complain if any subsequent type is different.

Also, schemaless doesn't mean without any schema. While not having to define a schema up-front during a quickstart or early development makes life easier, no one doing any serious work or going to production goes without a schema. Again, see DSE graph, IBM graph, Titan, etc.

Let's take a look at DSE graph types [1]. They are a subset of cassandra data types. What's really interesting is that they are all represented in some simple form, string or integer literals (and bool), except for Geo, and even that can be expressed as some form of arrays. So blob, inet, uuid, even timestamp are all queried as strings!

Also look at other APIs and you'll see the use of JSON without strong-typing for non-domain and/or scalar types in IBM graph, Elasticsearch, Solr, and just about every other REST API out there. Types beyond JSON's weak typing are settled by the backing schema (southbound) or by the OGM (northbound).

Additionally, VertexProperty returns only Object. I still have to know what the underlying type is.
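
To make that last point concrete, here is a minimal Java sketch of the two situations, assuming the value comes back typed as Object in both. The class and method names (PropertyValues, byCast, byConversion) are hypothetical, and plain Object variables stand in for whatever a property lookup would return; no TinkerPop classes are used.

```java
import java.util.UUID;

// Hypothetical helper, not TinkerPop code: plain Object values stand in
// for what a property lookup would hand back.
final class PropertyValues {

    // Strongly typed payload: the value is already a UUID, the caller casts.
    static UUID byCast(Object value) {
        return (UUID) value;
    }

    // Weakly typed payload: the value arrived as a string, the caller converts.
    static UUID byConversion(Object value) {
        return UUID.fromString(value.toString());
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        Object typed = id;              // stand-in for a strongly typed value
        Object untyped = id.toString(); // stand-in for the same value as a JSON string
        // Either way the caller had to know the target type was UUID.
        System.out.println(byCast(typed).equals(byConversion(untyped)));
    }
}
```

Both paths end with the same UUID, and both require the caller to know up front that a UUID is what the property holds.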
What difference does it make if I cast (strong-typed) or convert (weak-typed)? I still have to do something in order for it to be usable in Java. Maybe I'm just missing something... But at the end of the day, I would prefer consistency over either strong or weak typing. :-)

Finally, I would still consider promoting spatial shapes to a first-class entity in gremlin and including GeoJSON for serialization. This may be a separate effort.

1. https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/reference/refDSEGraphDataTypes.html

--
Robert Dale