Re: [DISCUSS] New IO format for GLVs/Gremlin Server

Robert Dale Fri, 15 Jul 2016 13:34:20 -0700

Responding to Marko and Kevin...

Marko wrote:
> SIDENOTE: This serves as a foundation for when we move to GraphSON 2.0. In 
> terms of numbers, I think, unfortunately, we have to stick with int32, int64, 
> float, double, etc. given graph database providers and their type systems. 
> Its not about the Gremlin traversal API, its more about provider schemas. 
> has(“someNumber”,12L) vs. has(“someNumber”,12).

I call the above behavior a bug or a peculiarity of Titan; it clings
to a Java object idiom. On the other hand, DSE Graph exhibits the
expected behavior (as do IBM Graph and Neo4j).  I know of no other
query language that behaves like this - e.g. SQL, CassandraQL, JPQL,
JOOQ (the gremlin of sql).  Typically the underlying driver/provider
does the "right" thing (or doesn't).  Again, take UUID in gremlin: I
can pass a string.  The underlying driver seems to convert it to a
UUID; I don't have to provide a UUID object.  This seems inconsistent.
Either it's doing strong typing or not.  Which is it??

IMO, the query language should be abstracted from the storage schema.
And I think this is where we have the impedance mismatch in this
thread.  What gremlin is really acting like, in addition to a query
language, is an Object Graph Mapper (like an ORM).  It's playing two
roles. So I'm also arguing that it should have a single
responsibility. Yes, I've said this before. But maybe it changes
things too drastically.  Maybe there are aspects of gremlin that
actually require strong typing. I don't know. I haven't run into them.
On to the next item...

Kevin wrote:
>> Correct, these types weren't relevant... I only wanted to show you the 
>> format...
> However, I don't manage to understand the structure behind the format you 
> suggest, and I don't manage to establish a clear explicit representation in 
> my mind, regarding the example you provided in the TP-1274 PR. Could you 
> please give an example of how you would imagine the serialized JSON of :
> - an example list of typed values, like List<UUID>
> - an example list of typed and untyped values, like a list with UUIDs and 
> booleans
> - an example map of typed and untyped values
>
> How would you define that format in a general way ? Like what I did when 
> saying
> "- untyped : value
> - typed : {"@type", "typeName", "value" : value}"
>
> Just trying your point better.
> Also what are the downsides you see with the format suggested above ?

The original format was in a list. I must have missed where you
accepted this format. In any case, like I originally stated, if you
want strong-typing, then _everything_ must be an _object_.

Here's an example of non-typed:
https://gist.github.com/robertdale/02931f5633be55a59c13bca3b0e58655
- native json only

Here's strongly typed:
https://gist.github.com/robertdale/6c074b165a72efee701e26f851f8b68a
- set (as an object), list (as an object), mixed-type lists, etc
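
As a rough illustration of the "everything is an object" idea (this is
not the content of the gists above, only a sketch), the snippet below
wraps every value in the {"@type": ..., "value": ...} shape quoted
earlier in the thread. The function name to_typed and the type names
("int64", "uuid", "list", ...) are assumptions made for the example.

```python
import json
import uuid


def to_typed(value):
    """Wrap a Python value as {"@type": ..., "value": ...}, recursively."""
    # bool must be tested before int because bool is a subclass of int
    if isinstance(value, bool):
        return {"@type": "boolean", "value": value}
    if isinstance(value, int):
        return {"@type": "int64", "value": value}
    if isinstance(value, float):
        return {"@type": "double", "value": value}
    if isinstance(value, str):
        return {"@type": "string", "value": value}
    if isinstance(value, uuid.UUID):
        return {"@type": "uuid", "value": str(value)}
    if isinstance(value, (set, frozenset)):
        # sort by JSON text so the serialized output is deterministic
        items = sorted((to_typed(v) for v in value), key=json.dumps)
        return {"@type": "set", "value": items}
    if isinstance(value, (list, tuple)):
        return {"@type": "list", "value": [to_typed(v) for v in value]}
    if isinstance(value, dict):
        # map keys are assumed to be strings in this sketch
        return {"@type": "map",
                "value": {k: to_typed(v) for k, v in value.items()}}
    raise TypeError(f"no type name assumed for {type(value).__name__}")


# A mixed list: one UUID and one boolean
mixed = [uuid.UUID("123e4567-e89b-12d3-a456-426614174000"), True]

typed_json = json.dumps(to_typed(mixed))                # every value is an object
untyped_json = json.dumps([str(mixed[0]), mixed[1]])    # native JSON only

print(typed_json)
print(untyped_json)
```

In the untyped form the UUID is just a string, so the reader needs the
schema (or an OGM) to know what it is; in the typed form even the
boolean and the list itself carry a type wrapper.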

Let me add that while there's no strict definition of schemaless, it
was not necessarily intended to include having mixed data types for a
single field. This is a really bad idea. Experts warn against this.
Most NoSQL databases don't even support this. You will probably die if
you use it. The default behavior for DSE Graph, IBM Graph, and even
Titan is to create the schema based on the first type inserted.  It
will complain if any subsequent type is different.
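
A toy model of that "first type wins" behavior, for illustration only.
It does not reflect how any of these products implement it; the class
name PropertySchema and its check method are hypothetical.

```python
class PropertySchema:
    """Toy schema: the first value seen for a key fixes that key's type."""

    def __init__(self):
        self._types = {}  # property key -> Python type fixed by first insert

    def check(self, key, value):
        # setdefault records the type on first insert, returns it afterwards
        expected = self._types.setdefault(key, type(value))
        if type(value) is not expected:
            raise TypeError(
                f"{key!r} was created as {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        return value


schema = PropertySchema()
schema.check("someNumber", 12)            # first insert: type becomes int

try:
    schema.check("someNumber", "twelve")  # different type: complains
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```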

Also, schemaless doesn't mean without any schema. While not having to
define a schema up-front during a quickstart or early development
makes life easier, no one doing any serious work or going to
production goes without a schema.  Again, see DSE Graph, IBM Graph,
Titan, etc.

Let's take a look at DSE Graph types [1]. They are a subset of
Cassandra data types. What's really interesting is that they are all
represented in some simple form - string or integer literals (and
bool) - except for Geo, and even that can be expressed as some form of
arrays. So blob, inet, uuid, even timestamp are all queried as
strings!
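
To make that concrete, here is a minimal sketch of letting the backing
schema settle the types while the wire format stays plain JSON strings.
The SCHEMA table, the property names, and the exact string forms (hex
for blob, ISO 8601 for timestamp) are assumptions for the example, not
DSE Graph's actual literal syntax.

```python
import ipaddress
import uuid
from datetime import datetime

# Hypothetical schema: property key -> converter from the string literal
SCHEMA = {
    "id": uuid.UUID,                      # uuid
    "host": ipaddress.ip_address,         # inet
    "created": datetime.fromisoformat,    # timestamp
    "payload": bytes.fromhex,             # blob
}


def convert(record):
    """Apply the schema's converter to each value; unknown keys pass through."""
    return {k: SCHEMA.get(k, lambda v: v)(v) for k, v in record.items()}


row = convert({
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "host": "10.0.0.1",
    "created": "2016-07-15T13:34:20",
    "payload": "cafe",
    "name": "marko",
})

for key, value in row.items():
    print(key, type(value).__name__)
```

The caller only ever writes strings; the conversion happens on the
provider side, where the schema is known.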

Also look at other APIs and you'll see the use of JSON without
strong-typing for non-domain and/or scalar types in IBM Graph,
Elasticsearch, Solr, and just about every other REST API out there.
Types beyond JSON's own weak typing are settled by the backing
schema (southbound) or by the OGM (northbound).  Additionally,
VertexProperty returns only Object. I still have to know what the
underlying type is. What difference does it make if I cast
(strong-typed) or convert (weak-typed)? I still have to do something in
order for it to be usable in Java.  Maybe I'm just missing
something...

But at the end of the day, I care more about consistency than about
whether the typing is strong or weak.  :-)

Finally, I still would consider promoting spatial shapes to a
first-class entity in gremlin and include GeoJSON for serialization.
This may be a separate effort.

[1] https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/reference/refDSEGraphDataTypes.html

-- 
Robert Dale
