Re: [DISCUSS] New IO format for GLVs/Gremlin Server

gallardo.kev...@gmail.com Mon, 18 Jul 2016 03:30:11 -0700


On 2016-07-15 21:32 (+0100), Robert Dale <robd...@gmail.com> wrote: 
> Responding to Marko and Kevin...
> 
> Marko wrote:
> > SIDENOTE: This serves as a foundation for when we move to GraphSON 2.0. In 
> > terms of numbers, I think, unfortunately, we have to stick with int32, 
> > int64, float, double, etc. given graph database providers and their type 
> > systems. It's not about the Gremlin traversal API, it's more about provider
> > schemas: has("someNumber",12L) vs. has("someNumber",12).
> 
> I call the above behavior a bug or a peculiarity of Titan; it clings
> to a Java object idiom. On the other hand, DSE graph exhibits expected
> behavior (as do IBM Graph and Neo4j).  I know of no other query
> language that behaves like this - e.g. SQL, CassandraQL, JPQL, jOOQ
> (the Gremlin of SQL).  Typically the underlying driver/provider does
> the "right" thing (or doesn't).  Again, take UUID in gremlin, I can
> pass a string.  The underlying driver seems to convert it to a UUID; I
> don't have to provide a UUID object.  This seems inconsistent.
> Either it's doing strong typing or not.  Which is it??
> 
> IMO, the query language should be abstracted from the storage schema.
> And I think this is where we have the impedance mismatch in this
> thread.  In addition to being a query language, Gremlin is really
> acting as an Object Graph Mapper (like an ORM).  It's playing two
> roles. So I'm also arguing that it should have a single
> responsibility. Yes, I've said this before. But maybe it changes
> things too drastically.  Maybe there are aspects of gremlin that
> actually require strong typing. I don't know. I haven't run into them.
> On to the next item...
> 
> Kevin wrote:
> >> Correct, these types weren't relevant... I only wanted to show you the 
> >> format...
> > However, I don't quite understand the structure behind the format you
> > suggest, and I can't form a clear, explicit picture of it from the
> > example you provided in the TP-1274 PR. Could you please give an
> > example of how you would imagine the serialized JSON of:
> > - an example list of typed values, like List<UUID>
> > - an example list of typed and untyped values, like a list with UUIDs and 
> > booleans
> > - an example map of typed and untyped values
> >
> > How would you define that format in a general way? Like I did when
> > saying
> > "- untyped : value
> > - typed : {"@type" : "typeName", "value" : value}"
> >
> > Just trying to understand your point better.
> > Also, what downsides do you see in the format suggested above?
> 
> The original format was in a list. I must have missed where you
> accepted this format. In any case, like I originally stated, if you
> want strong-typing, then _everything_ must be an _object_.
> 
> Here's an example of non-typed:
> https://gist.github.com/robertdale/02931f5633be55a59c13bca3b0e58655
> - native json only
> 
> Here's strongly typed:
> https://gist.github.com/robertdale/6c074b165a72efee701e26f851f8b68a
> - set (as an object), list (as an object), mixed-type lists, etc
>


OK, glad to see your revised version of the format is exactly the same as the one
I defined initially. I think we're on the same page now, except for one thing: the
type information for vertex seems inconsistent with the rest. If, as you say,
"everything is an object", then it would look like this:
https://gist.github.com/newkek/2d748dc59029f01af18b2a0e80494a31
However, to me strong typing does not necessarily mean there needs to be type
metadata if the type is already properly handled by JSON. E.g. I don't see the
need to add type information for data like booleans. There is no possible
ambiguity.
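A minimal sketch of the selective typing described above, written in Python only for brevity (it is not a reference implementation of either proposal). It assumes the wrapper keys "@type" and "value" from the format definition quoted earlier in the thread, and hypothetical type names ("Int32", "Int64", "Double", "UUID", "List", "Map", ...) that the thread has not settled. The `type_everything` flag switches to the "everything is an object" variant for comparison.

```python
import json
import uuid

# Bounds used to choose between the hypothetical "Int32" and "Int64" tags.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def typed(type_name, value):
    """Wrap a value as {"@type": ..., "value": ...}.

    All type names passed in below are hypothetical placeholders.
    """
    return {"@type": type_name, "value": value}


def encode(value, type_everything=False):
    """Turn a Python value into a JSON-ready structure.

    type_everything=False: str, bool and None stay native JSON; only types
    JSON cannot represent unambiguously get a wrapper.
    type_everything=True: every non-null value is wrapped.
    """
    if value is None:
        return None  # JSON null has no type to preserve
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return typed("Boolean", value) if type_everything else value
    if isinstance(value, str):
        return typed("String", value) if type_everything else value
    if isinstance(value, int):
        # JSON numbers carry no width, so int32 vs int64 is always tagged.
        # (Values beyond the int64 range are not handled in this sketch.)
        name = "Int32" if INT32_MIN <= value <= INT32_MAX else "Int64"
        return typed(name, value)
    if isinstance(value, float):
        return typed("Double", value)
    if isinstance(value, uuid.UUID):
        return typed("UUID", str(value))
    if isinstance(value, list):
        items = [encode(item, type_everything) for item in value]
        return typed("List", items) if type_everything else items
    if isinstance(value, dict):
        # Map keys are assumed to be strings in this sketch.
        entries = {key: encode(val, type_everything) for key, val in value.items()}
        return typed("Map", entries) if type_everything else entries
    raise TypeError(f"unsupported type: {type(value).__name__}")


mixed = ["marko", True, 12, uuid.UUID("4c0f4d73-6b4c-4d0a-9a6e-1f2e3d4c5b6a")]
print(json.dumps(encode(mixed)))
print(json.dumps(encode(mixed, type_everything=True)))
```

In the default mode the only values left untyped are those JSON itself represents without ambiguity; integers are always tagged because a JSON number does not say whether it was an int32 or an int64.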

> Let me add that while there's no strict definition of schemaless, it
> was not necessarily intended to include having mixed data types for a
> single field. This is a really bad idea. Experts warn against this.
> Most NoSQL databases don't even support this. You will probably die if
> you use it. The default behavior for DSE graph, IBM graph, and even
> Titan is to create the schema based on the first type inserted.  It
> will complain if any subsequent type is different.

No, in DSE Graph the schema has to be defined upfront and does not depend on
the first element inserted. But I'm not the best person to talk about that, and
I'm not sure this is the right place for it.

However, concerning mixed typed/untyped values, I am not so much concerned with
what the graph provider would do as with what the protocol can handle. Hence I am
in favour of a protocol that can handle as much as possible in a consistent way,
for example collections of typed and untyped values, as is possible in
TinkerGraph. For instance, a VertexProperty can be a list of Strings and UUIDs:
the Strings don't need type information, the UUIDs do.
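To illustrate what the receiving side would do with such a mixed collection, here is a small Python decoding sketch. It assumes the same `{"@type": ..., "value": ...}` wrapper discussed in this thread and a hypothetical registry of scalar type names; it is not an existing driver API.

```python
import json
import uuid

# Hypothetical registry mapping scalar "@type" names to Python constructors.
DECODERS = {
    "UUID": uuid.UUID,
    "Int32": int,
    "Int64": int,
    "Double": float,
}


def decode(node):
    """Recursively turn parsed JSON into Python values.

    A dict with exactly the keys "@type" and "value" is treated as a typed
    wrapper; anything else is native JSON and is returned as-is.
    """
    if isinstance(node, list):
        return [decode(item) for item in node]
    if isinstance(node, dict):
        if set(node) == {"@type", "value"}:
            type_name = node["@type"]
            if type_name not in DECODERS:
                raise ValueError(f"unknown @type: {type_name}")
            return DECODERS[type_name](node["value"])
        return {key: decode(val) for key, val in node.items()}
    return node  # str, bool, number, None: native JSON, no metadata needed


# A list property mixing untyped Strings with a typed UUID.
payload = ('["name-a", {"@type": "UUID", '
           '"value": "4c0f4d73-6b4c-4d0a-9a6e-1f2e3d4c5b6a"}, "name-b"]')
values = decode(json.loads(payload))
print([type(v).__name__ for v in values])  # ['str', 'UUID', 'str']
```

One design consequence worth noting: a user map that happens to have exactly the keys "@type" and "value" would be misread as a wrapper. The "everything is an object" variant avoids that particular ambiguity by wrapping maps as well.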

> 
> Also, schemaless doesn't mean without any schema. While not having to
> define a schema up-front during a quickstart or early development
> makes life easier, no one doing any serious work or going to
> production goes without a schema.  Again, see DSE graph, IBM graph,
> Titan, etc.
> 
> Let's take a look at DSE graph types [1]. They are a subset of
> Cassandra data types. What's really interesting about that is that
> they are all represented in some simple form - string or integer
> literals (and bool) - except for Geo, but even that can be expressed
> as some form of array. So blob, inet, uuid, even timestamp are all queried as
> strings!
> 
> Also look at other APIs and you'll see the use of JSON without
> strong-typing for non-domain and/or scalar types in IBM graph,
> Elasticsearch, Solr, and just about every other REST API out there.
> Types beyond JSON's weak typing are settled by the backing
> schema (southbound) or by the OGM (northbound).  Additionally,
> VertexProperty returns only Object. I still have to know what the
> underlying type is. What difference does it make if I cast
> (strong-typed) or convert (weak-typed)? I still have to do something in
> order for it to be usable in java.  Maybe I'm just missing
> something...
> 
> But at the end of the day, I care more about consistency than about
> whether it's strong or weak typing.  :-)
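The cast-versus-convert question above can be made concrete with a short Python sketch. All names are hypothetical, and the `{"@type": ..., "value": ...}` wrapper follows the format discussed in this thread. Both paths end up with the same Python objects; what differs is where the type knowledge lives: in a schema the client must already have (`USER_SCHEMA` here), or in the payload itself.

```python
import json
import uuid


def identity(value):
    return value


def convert_weak(props, schema):
    """Weak typing: plain JSON values, converted with out-of-band schema knowledge."""
    return {key: schema.get(key, identity)(val) for key, val in props.items()}


def cast_strong(props):
    """Strong typing: the payload carries the type, so no schema is needed."""
    result = {}
    for key, val in props.items():
        if isinstance(val, dict) and val.get("@type") == "UUID":
            result[key] = uuid.UUID(val["value"])
        else:
            result[key] = val
    return result


# Hypothetical schema, standing in for the backing store or an OGM.
USER_SCHEMA = {"userId": uuid.UUID}

uid = "4c0f4d73-6b4c-4d0a-9a6e-1f2e3d4c5b6a"
weak = json.loads('{"userId": "%s", "active": true}' % uid)
strong = json.loads('{"userId": {"@type": "UUID", "value": "%s"}, "active": true}' % uid)

# Both paths yield {"userId": UUID(...), "active": True}.
assert convert_weak(weak, USER_SCHEMA) == cast_strong(strong)
```

Either way the client does some work; the trade-off is whether decoding depends on schema information that travels outside the message.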

If you want extreme consistency for EVERY value, would you then also include the
type for Strings? No, that doesn't make sense in JSON, right? And if it doesn't make
sense for a String, IMO it doesn't make sense for a boolean either.
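The argument above can be checked mechanically: a value needs no type metadata exactly when it survives an untyped JSON round trip with its type intact. A minimal Python sketch follows; the helper name is hypothetical, and non-JSON types are stringified via `default=str`, as an untyped format would have to do.

```python
import json
import uuid


def survives_plain_json(value):
    """True if value comes back from an untyped JSON round trip
    as the same Python type and value."""
    restored = json.loads(json.dumps(value, default=str))
    return type(restored) is type(value) and restored == value


u = uuid.UUID("4c0f4d73-6b4c-4d0a-9a6e-1f2e3d4c5b6a")
print(survives_plain_json("marko"))  # True: JSON strings need no tag
print(survives_plain_json(True))     # True: JSON true/false are unambiguous
print(survives_plain_json(u))        # False: the UUID comes back as a str

# Integer width is the other ambiguity: a JVM int 12 and long 12L both
# serialize to the JSON text 12. Python ints have no fixed width, so this
# check cannot demonstrate that case.
```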

> 
> Finally, I still would consider promoting spatial shapes to a
> first-class entity in gremlin and include GeoJSON for serialization.
> This may be a separate effort.
> 
> 1. https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/reference/refDSEGraphDataTypes.html
> 
> -- 
> Robert Dale
>
