Re: [DISCUSS] New IO format for GLVs/Gremlin Server

gallardo.kev...@gmail.com Fri, 15 Jul 2016 07:53:12 -0700


On 2016-07-15 14:44 (+0100), Robert Dale <robd...@gmail.com> wrote: 
> It looks to me like a self-inflicted problem because the things that
> are typed are already native to json so it's redundant.  And to go a
> step further, I wouldn't consider the types to be 'correct' because
> everything that is a HashMap is really a Vertex, Edge, or Property.
> 
> On Thu, Jul 14, 2016 at 10:03 AM, gallardo.kev...@gmail.com
> <gallardo.kev...@gmail.com> wrote:
> >
> >
> > On 2016-07-13 13:17 (+0100), Robert Dale <robd...@gmail.com> wrote:
> >> Marko, I agree that empty object properties should not be represented.
> >> I think if you saw that in an example then it was probably for
> >> demonstration purposes.
> >>
> >> Kevin, can you expand on this comment:
> >>
> >> > the format you suggest would lead to the same inconsistencies as in 
> >> > GraphSON 1.0.
> >> > Since the type is at the same level as the data itself, whether the 
> >> > container is an Array or an Object
> >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> >>
> >> What exactly are the inconsistencies?  What is the problem in
> >> determining an array or object?
> >> This is a natural JSON array (or list): []
> >> This is a natural JSON object: {}
> >>
> >> Type at the object level is a common pattern and supported feature of
> >> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> >> 'type' at the object level. Titan supports GeoJSON currently.  I
> >> wonder if it would make sense to promote geometry to gremlin.
> >>
> >
> > I probably wasn't clear enough. In my first email explaining my motivation 
> > for improving GraphSON 1.0, one of the things I pointed out was that, 
> > depending on the enclosing element (either an Array or a Map), a type is 
> > described either as an element of the Array or as a key/value pair in the 
> > Map. You can see that in the "embedded types" example of the TinkerPop 
> > docs: 
> > http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer .
> >
> > There you can see that the type "java.util.ArrayList" is a plain element 
> > of the enclosing array, whereas the "java.util.HashMap" type is a field of 
> > the enclosing Map, as in {"@class" : "java.util.HashMap", ...}. This does 
> > not seem consistent to me. Even though I know that Jackson handles it well, 
> > I think we would do better to provide an enclosing format that stays fixed 
> > whatever the enclosed data is, so that automatic type detection is easier 
> > for parsers in other libraries/languages. Does that make sense?
> >
> >> We should probably start documenting a table of supported types. (If
> >> there is one, please provide link)
> >>
> >> I wonder if it even makes sense to type numbers according to their
> >> memory model. As objects, Byte, Short, and Integer occupy the same
> >> space. Long isn't much more.  So in Java we're not saving much space.
> >> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> >> have this concept.  Does anything in gremlin actually require this?
> >> I'm thinking that this is only going to be relevant at the domain
> >> model level. This way json native numbers can be used and not need
> >> typing.
> >>
> >> Additionally, I think that all things that will be typed should always
> >> be typed. For the use cases of ingesting a saved graph from a file, it
> >> can probably be assumed that the top-level objects are vertices since
> >> the graph is vertex-centric and everything else follows naturally.
> >> I'm not entirely sure what is required for submitting traversals to
> >> gremlin server from GLV.  However, if this is used for the results
> >> from gremlin server then the results could start with any one of path,
> >> vertex, edge, property, vertex property, etc. So you'll need that type
> >> data there.
> >>
> >> --
> >> Robert Dale
> >>
> >> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez <okramma...@gmail.com> 
> >> wrote:
> >> > Hi,
> >> >
> >> > I'm not following this PR too closely, so what I say here might be 
> >> > already known/argued against/etc.
> >> >
> >> >         1. I think we should go with Robert Dale's proposal of int32, 
> >> > int64, Vertex, uuid, etc. instead of Java class names.
> >> >         2. In Java we then have a Map<String,Class> for typecasting 
> >> > accordingly.
> >> >         3. This would make GraphSON 2.0 perfect for Bytecode 
> >> > serialization in TINKERPOP-1278.
> >> >         4. I think that if a Vertex, Edge, etc. doesn't have 
> >> > properties, outV, etc. then don't even have those fields in the 
> >> > representation.
> >> >         5. Most of the serialization back and forth will be ReferenceXXX 
> >> > elements and thus, don't create more Maps/lists for no reason - 
> >> > less chars.
> >> >
> >> > For me, my interest in this work is all about a language-agnostic way 
> >> > of sending Gremlin traversal bytecode between different languages. This 
> >> > work is exactly what I am looking for.
> >> >
> >> > Thanks,
> >> > Marko.
> >> >
> >> > http://markorodriguez.com
> >> >
> >> >
> >> >
> >> >> On Jul 9, 2016, at 9:48 AM, Stephen Mallette <spmalle...@gmail.com> 
> >> >> wrote:
> >> >>
> >> >> With all the work on GLVs and the recent work on GraphSON 2.0, I think
> >> >> it's important that we have a solid, efficient, programming language
> >> >> neutral, lossless serialization format. Right now that format is
> >> >> GraphSON and it works for that purpose (ever more so with 2.0). Given
> >> >> some discussion on the GraphSON 2.0 PR driven a bit by Robert Dale:
> >> >>
> >> >> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> >> >>
> >> >> I wonder if we shouldn't consider another IO format that has Gremlin
> >> >> Server/GLVs in mind. At this point I'm not suggesting anything
> >> >> specific - I'm just hanging the idea out for further discussion and
> >> >> brain storming. Thoughts?
> >> >
> >>
> >>
> >>
> >> --
> >> Robert Dale
> 
> 
> 
> -- 
> Robert Dale
Correct, these types weren't relevant... I only wanted to show you the 
format...
However, I can't quite grasp the structure behind the format you suggest, and 
I can't form a clear, explicit picture of it from the example you provided in 
the TP-1274 PR. Could you please give an example of what you imagine the 
serialized JSON would look like for: 
- a list of typed values, like List<UUID>
- a list of typed and untyped values, like a list with UUIDs and booleans
- a map of typed and untyped values


How would you define that format in a general way? Something like what I did 
when saying:
"- untyped : value
- typed : {"@type" : "typeName", "value" : value}"

Just trying to understand your point better. 
Also, what downsides do you see with the format suggested above?
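
For reference, the "untyped / typed" rule quoted above can be sketched in a 
few lines of Python, applied to the three cases listed earlier. This is only 
an illustration: the "uuid" type name and the encode/decode helpers are 
assumptions made up for the example, not part of GraphSON or of the PR.

```python
import json
import uuid

# Hypothetical type names; the thread has not settled on a naming scheme.
TYPE_NAMES = {uuid.UUID: "uuid"}
DECODERS = {"uuid": uuid.UUID}


def encode(value):
    """untyped -> value ; typed -> {"@type": typeName, "value": value}."""
    if isinstance(value, list):
        return [encode(v) for v in value]
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    type_name = TYPE_NAMES.get(type(value))
    if type_name is not None:
        return {"@type": type_name, "value": str(value)}
    return value  # JSON-native value: bool, number, string, null


def decode(node):
    """Inverse of encode: the same check applies inside arrays and maps."""
    if isinstance(node, list):
        return [decode(n) for n in node]
    if isinstance(node, dict):
        if set(node) == {"@type", "value"} and node["@type"] in DECODERS:
            return DECODERS[node["@type"]](node["value"])
        return {k: decode(v) for k, v in node.items()}
    return node


u = uuid.UUID("12345678-1234-5678-1234-567812345678")

typed_list = encode([u, u])                     # list of typed values
mixed_list = encode([u, True])                  # typed and untyped values
mixed_map = encode({"id": u, "active": True})   # map of typed and untyped

print(json.dumps(mixed_list))
# [{"@type": "uuid", "value": "12345678-1234-5678-1234-567812345678"}, true]
```

The point of the sketch is that the wrapper looks the same whether the typed 
value sits in an array or in a map, so a parser needs only one detection rule.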
