Re: [DISCUSS] New IO format for GLVs/Gremlin Server

Stephen Mallette Wed, 13 Jul 2016 11:15:44 -0700

> First, is there a wiki that we can keep updated with decisions or at least
> decision points? I know there's an old wiki, but is there/will there be a
> new wiki?


No - we don't have a wiki. Design decisions tend to get trapped in the
mailing list (or JIRA) which isn't so good. Maybe that's a separate
discussion.

> Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> properties. It treats all types, primitive or object, from byte to long,
> double, float as numbers.

Perhaps we could take a stronger stance on this in the test cases? Does
anyone know what graphs this would impact besides Titan and TinkerGraph (I
suspect DSE Graph, but not 100% sure)?
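
For reference, here is a minimal Java sketch of the distinction such test
cases would need to pin down: boxed equality is type-sensitive, while a
value-based comparison is not. The names NumberCompare, sameNumber and
toBigDecimal are made up for illustration and are not TinkerPop API; how a
given graph actually coerces numbers is up to the provider.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class NumberCompare {

    // Hypothetical helper: compare two numbers by value, ignoring boxed type.
    static boolean sameNumber(Number a, Number b) {
        return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0;
    }

    // Widen any common Number to BigDecimal.
    // Note: NaN and infinite floating point values are not handled here
    // (BigDecimal.valueOf(double) throws NumberFormatException for them).
    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Float || n instanceof Double) {
            return BigDecimal.valueOf(n.doubleValue());
        }
        return BigDecimal.valueOf(n.longValue()); // Byte, Short, Integer, Long
    }

    public static void main(String[] args) {
        Object asInt = 1;      // boxed as Integer
        Object asLong = 1L;    // boxed as Long
        Object asFloat = 100f; // boxed as Float
        Object asDouble = 100d; // boxed as Double

        // Boxed equals() also compares the class, so these are "different".
        System.out.println(asInt.equals(asLong));     // false
        System.out.println(asFloat.equals(asDouble)); // false

        // Value-based comparison treats them as the same number.
        System.out.println(sameNumber(1, 1L));        // true
        System.out.println(sameNumber(100f, 100d));   // true
    }
}
```

A comparison like this is only one possible policy. For instance, 0.1f and
0.1d would still compare as different here because they really are different
values once widened to double.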



On Wed, Jul 13, 2016 at 1:49 PM, Robert Dale <[email protected]> wrote:

> First, is there a wiki that we can keep updated with decisions or at
> least decision points? I know there's an old wiki, but is there/will
> there be a new wiki?
>
> Stephen, IMO, that's still bad behavior. That says to me a number is
> not a number.  But, yes, schemaless does allow one to put crap in and
> get crap out. So designers should be aware of these types of pitfalls.
> Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> properties. It treats all types, primitive or object, from byte to
> long, double, float as numbers.  This is pretty standard behavior in
> SQL, JDBC drivers, and other NoSQL technologies.
>
>
>
> On Wed, Jul 13, 2016 at 11:30 AM, Stephen Mallette <[email protected]>
> wrote:
> > Marko, the namespacing idea seems smart.
> >
> > Robert, I think other graphs have similar behavior to TinkerGraph's
> > default. In Titan, the absence of a schema (default, obviously) produces
> > this:
> >
> > gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
> > ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
> > gremlin> graph.addVertex("n",100D)
> > ==>v[4288]
> > gremlin> graph.traversal().V().has('n',100f)
> > gremlin> graph.traversal().V().has('n',100d)
> > ==>v[4288]
> >
> > This kind of problem has caused trouble for years and years in TinkerPop
> > and allowing the type to be embedded seemed like a good solution. Of
> > course, you bring up a good point about javascript - to this point we've
> > relied on JS devs to conform to java/groovy types by forcing conversion in
> > their gremlin scripts or configuring their graphs to avoid use of types
> > that would produce these kinds of ambiguous results.
> >
> >
> >
> > On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale <[email protected]> wrote:
> >
> >> And just to be clear, I'm not necessarily disagreeing. But I think
> >> it's important to understand where and why it's necessary.
> >>
> >> For example, if I'm writing a gremlin script (string), I don't type my
> >> input numbers.  It's rightly converted by the underlying architecture.
> >> (I'm guessing groovy which has enhanced number support).  Also, if a
> >> GLV is submitting typed numbers, how would that work? For example, in
> >> Javascript?
> >>
> >> On Wed, Jul 13, 2016 at 9:16 AM, Robert Dale <[email protected]> wrote:
> >> > Hi, Stephen.  I think that's a bad example. You may recall I brought
> >> > up that issue in the forum.  However, it's actually attributed to the
> >> > default ID manager of ANY (for historical reasons), which I think is a
> >> > really bad default (and reason) because it only leads to confusion.  Java is
> >> > one of the few, if not only, brain-damaged languages where 5 != 5 !=
> >> > 5.  In Java, number objects must be coerced into like form for
> >> > comparison. The other ID managers do this coercion.  Saner languages
> >> > do this under the covers.
> >> >
> >> > On Wed, Jul 13, 2016 at 8:56 AM, Stephen Mallette <[email protected]> wrote:
> >> >> Robert, thanks for joining this discussion.
> >> >>
> >> >>> I wonder if it even makes sense to type numbers according to their
> >> >>> memory model. As objects, Byte, Short, and Integer occupy the same
> >> >>> space. Long isn't much more.  So in Java we're not saving much space.
> >> >>> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> >> >>> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> >> >>> have this concept.  Does anything in gremlin actually require this?
> >> >>
> >> >> If the intended numeric type isn't preserved, weird things can happen with
> >> >> graphs that have a schema (like Titan/DSE). Even TinkerGraph using the
> >> >> default ID manager will not be happy if you try to do a lookup of Integer
> >> >> identifiers with a Long:
> >> >>
> >> >> gremlin> graph = TinkerFactory.createModern()
> >> >> ==>tinkergraph[vertices:6 edges:6]
> >> >> gremlin> graph.vertices(1)
> >> >> ==>v[1]
> >> >> gremlin> graph.vertices(1L)
> >> >> gremlin>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Jul 13, 2016 at 8:17 AM, Robert Dale <[email protected]> wrote:
> >> >>
> >> >>> Marko, I agree that empty object properties should not be represented.
> >> >>> I think if you saw that in an example then it was probably for
> >> >>> demonstration purposes.
> >> >>>
> >> >>> Kevin, can you expand on this comment:
> >> >>>
> >> >>> > the format you suggest would lead to the same inconsistencies as in
> >> >>> > GraphSON 1.0.
> >> >>> > Since the type is at the same level as the data itself, whether the
> >> >>> > container is an Array or an Object
> >> >>> >
> >> >>> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> >> >>>
> >> >>> What exactly are the inconsistencies?  What is the problem in
> >> >>> determining an array or object?
> >> >>> This is a natural JSON array (or list): []
> >> >>> This is a natural JSON object: {}
> >> >>>
> >> >>> Type at the object level is a common pattern and supported feature of
> >> >>> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> >> >>> 'type' at the object level. Titan supports GeoJSON currently.  I
> >> >>> wonder if it would make sense to promote geometry to gremlin.
> >> >>>
> >> >>> We should probably start documenting a table of supported types. (If
> >> >>> there is one, please provide link)
> >> >>>
> >> >>> I wonder if it even makes sense to type numbers according to their
> >> >>> memory model. As objects, Byte, Short, and Integer occupy the same
> >> >>> space. Long isn't much more.  So in Java we're not saving much space.
> >> >>> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> >> >>> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> >> >>> have this concept.  Does anything in gremlin actually require this?
> >> >>> I'm thinking that this is only going to be relevant at the domain
> >> >>> model level. This way json native numbers can be used and not need
> >> >>> typing.
> >> >>>
> >> >>> Additionally, I think that all things that will be typed should always
> >> >>> be typed. For the use cases of ingesting a saved graph from a file, it
> >> >>> can probably be assumed that the top-level objects are vertices since
> >> >>> the graph is vertex-centric and everything else follows naturally.
> >> >>> I'm not entirely sure what is required for submitting traversals to
> >> >>> gremlin server from GLV.  However, if this is used for the results
> >> >>> from gremlin server then the results could start with any one of path,
> >> >>> vertex, edge, property, vertex property, etc. So you'll need that type
> >> >>> data there.
> >> >>>
> >> >>> --
> >> >>> Robert Dale
> >> >>>
> >> >>> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez <[email protected]> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > I’m not following this PR too closely so what I might be saying is
> >> >>> > already known/argued against/etc.
> >> >>> >
> >> >>> >         1. I think we should go with Robert Dale’s proposal of int32,
> >> >>> > int64, Vertex, uuid, etc. instead of Java class names.
> >> >>> >         2. In Java we then have a Map<String,Class> for typecasting
> >> >>> > accordingly.
> >> >>> >         3. This would make GraphSON 2.0 perfect for Bytecode
> >> >>> > serialization in TINKERPOP-1278.
> >> >>> >         4. I think that if a Vertex, Edge, etc. doesn’t have properties,
> >> >>> > outV, etc. then don’t even have those fields in the representation.
> >> >>> >         5. Most of the serialization back and forth will be ReferenceXXX
> >> >>> > elements and thus, don’t create more Maps/lists for no reason; less chars.
> >> >>> >
> >> >>> > For me, my interest in this work is all about a language-agnostic way
> >> >>> > of sending Gremlin traversal bytecode between different languages. This
> >> >>> > work is exactly what I am looking for.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Marko.
> >> >>> >
> >> >>> > http://markorodriguez.com
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >> On Jul 9, 2016, at 9:48 AM, Stephen Mallette <[email protected]> wrote:
> >> >>> >>
> >> >>> >> With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
> >> >>> >> important that we have a solid, efficient, programming language neutral,
> >> >>> >> lossless serialization format. Right now that format is GraphSON and it
> >> >>> >> works for that purpose (ever more so with 2.0). Given some discussion on
> >> >>> >> the GraphSON 2.0 PR driven a bit by Robert Dale:
> >> >>> >>
> >> >>> >> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> >> >>> >>
> >> >>> >> I wonder if we shouldn't consider another IO format that has Gremlin
> >> >>> >> Server/GLVs in mind. At this point I'm not suggesting anything specific -
> >> >>> >> I'm just hanging the idea out for further discussion and brainstorming.
> >> >>> >> Thoughts?
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Robert Dale
> >> >>>
> >> >
> >> >
> >> >
> >> > --
> >> > Robert Dale
> >>
> >>
> >>
> >> --
> >> Robert Dale
> >>
>
>
>
> --
> Robert Dale
>
