Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-20 Thread gallardo.kev...@gmail.com


On 2016-07-19 22:28 (+0100), Marko Rodriguez <okramma...@gmail.com> wrote: 
> Hi,
> 
> However, in general we just need an “object mapper pattern.” For instance:
> 
> For any JSON object { } that has a @type field, the @type value maps to a 
> deserializer. Thus, while we need to be able to serialize/deserialize the 
> standard Vertex/Edge/VertexProperty/etc. the representation should be 
> generalized to support any registered @type.

Agreed. Given how Jackson works, we would have had no choice but to add 
deserializers for these types anyway. I had also planned to make the 
GraphSONTypeIdResolver configurable for users. That is the component that 
handles the conversion "typeID" -> "Java Class" for deserialization and 
"Java Class" -> "typeID" for serialization.
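
To make the two-way mapping concrete, here is a minimal sketch in Python. It 
is not the actual GraphSONTypeIdResolver (that one is Java, built on Jackson); 
the class name TypeIdResolver and its method names are hypothetical and only 
illustrate what "configurable for users" could look like.

```python
import uuid


class TypeIdResolver:
    """Hypothetical two-way registry: type ID <-> class."""

    def __init__(self):
        self._id_to_class = {}
        self._class_to_id = {}

    def register(self, type_id, cls):
        # Users call this to add their own types to the mapping.
        self._id_to_class[type_id] = cls
        self._class_to_id[cls] = type_id

    def type_from_id(self, type_id):
        # Used on deserialization: "typeID" -> class.
        return self._id_to_class[type_id]

    def id_from_type(self, cls):
        # Used on serialization: class -> "typeID".
        return self._class_to_id[cls]


resolver = TypeIdResolver()
resolver.register("gremlin:uuid", uuid.UUID)
```

A user with a provider-specific type would call register() once more with 
their own type ID and class, without touching the core mapping.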

> 
>   Java GraphSON serializer/deserializer registration:
>   
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/GraphSONModule.java#L129-L147
>  
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/GraphSONModule.java#L129-L147>
> 
>   Python GraphSON serializer registration:
>   
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-python/src/main/jython/gremlin_python/process/graphson.py#L122-L127
>  
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-python/src/main/jython/gremlin_python/process/graphson.py#L122-L127>
> 
> People can register more @types as needed for their graph processor’s type 
> system.
> 
> Marko.
> 
> http://markorodriguez.com
> 
> 
> 
> > On Jul 19, 2016, at 12:55 PM, Marko Rodriguez <okramma...@gmail.com> wrote:
> > 
> > We need:
> > 
> > Graph
> > Element
> > Vertex
> > Edge
> > VertexProperty
> > Property
> > Path
> > TraversalExplanation
> > TraversalMetrics
> > Traversal (i.e. Bytecode)
> > Traverser (object + bulk at minimum)
> > 
> > Marko.
> > 
> > http://markorodriguez.com
> > 
> > 
> > 
> >> On Jul 19, 2016, at 12:45 PM, Robert Dale <robd...@gmail.com> wrote:
> >> 
> >> There's also Path that can be returned from a query. It looks like
> >> GraphSON 1.0 handles this today in the REST API but it's not typed as
> >> a path.
> >> 
> >> On Tue, Jul 19, 2016 at 2:14 PM, gallardo.kev...@gmail.com
> >> <gallardo.kev...@gmail.com> wrote:
> >>> 
> >>> 
> >>> On 2016-07-19 18:02 (+0100), Robert Dale <robd...@gmail.com> wrote:
> >>>> - It seems redundant to nest a vertex or edge inside a type-value
> >>>> object and is inconsistent with a VertexProperty.
> >>>> - VertexProperty and (edge) Property are implicit types. I don't know
> >>>> if this is ok. Could they ever be used outside of their parents where
> >>>> they would need to be typed?
> >>> 
> >>> I agree with the VertexProperty remark. That's one last question I wanted 
> >>> to solve, if we go for typing Vertex and edges, do we include others? The 
> >>> full list I see then is : vertex/edge/vertexproperty/property/graph.
> >>> 
> >>> However I am not sure how useful it is to have more than Vertex and Edge. 
> >>> As, when deserializing a Vertex for example, there's no question as to 
> >>> what is in the "properties" field of the Vertex, there are necessarily 
> >>> only VertexProperties. However looking at the API, it seems like it is 
> >>> supported to write only a VertexProperty if one wants to (see 
> >>> GraphWriter.writeVertexProperty()), so in that case, to me it makes sense 
> >>> to add the types for the elements of the list I described above. @stephen 
> >>> any thoughts about that ?
> >>> 
> >>>> - Edges:
> >>>> - is in/outVLabel new? Couldn't find it in the API or any examples of 
> >>>> this.
> >>>> - why not make inV/outV have proper vertices with labels (to satisfy
> >>>> the case previous case) instead of just IDs? This would also be more
> >>>> consistent with the API.
> >>> 
> >>> I haven't touched that part, it was in the format before. I believe this 
> >>> is a question for Stephen.
> >>> 
> >>>> 

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread gallardo.kev...@gmail.com


On 2016-07-19 18:02 (+0100), Robert Dale <robd...@gmail.com> wrote: 
> - It seems redundant to nest a vertex or edge inside a type-value
> object and is inconsistent with a VertexProperty.
> - VertexProperty and (edge) Property are implicit types. I don't know
> if this is ok. Could they ever be used outside of their parents where
> they would need to be typed?

I agree with the VertexProperty remark. That's one last question I wanted to 
settle: if we go for typing Vertex and Edge, do we include others? The full 
list I see then is: vertex/edge/vertexproperty/property/graph.

However, I am not sure how useful it is to type more than Vertex and Edge. 
When deserializing a Vertex, for example, there is no question as to what is 
in its "properties" field: it can only contain VertexProperties. That said, 
looking at the API, writing a lone VertexProperty appears to be supported (see 
GraphWriter.writeVertexProperty()), so in that case it makes sense to me to 
add types for every element of the list above. @stephen any thoughts about 
that?

> - Edges:
>   - is in/outVLabel new? Couldn't find it in the API or any examples of this.
>   - why not make inV/outV have proper vertices with labels (to satisfy
> the case previous case) instead of just IDs? This would also be more
> consistent with the API.

I haven't touched that part; it was already in the format. I believe this is a 
question for Stephen.

> 
> Otherwise looks good!

Thanks for the feedback.
> 
> On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com
> <gallardo.kev...@gmail.com> wrote:
> >
> >
> > On 2016-07-15 16:25 (+0100), 
> > "gallardo.kev...@gmail.com"<gallardo.kev...@gmail.com> wrote:
> >>
> >>
> >> On 2016-07-09 16:48 (+0100), Stephen Mallette <spmalle...@gmail.com> wrote:
> >> > With all the work on GLVs and the recent work on GraphSON 2.0, I think 
> >> > it's
> >> > important that we have a solid, efficient, programming language neutral,
> >> > lossless serialization format. Right now that format is GraphSON and it
> >> > works for that purpose (ever more  so with 2.0). Given some discussion on
> >> > the GraphSON 2.0 PR driven a bit by Robert Dale:
> >> >
> >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> >> >
> >> > I wonder if we shouldn't consider another IO format that has Gremlin
> >> > Server/GLVs in mind. At this point I'm not suggesting anything specific -
> >> > I'm just hanging the idea out for further discussion and brain storming.
> >> > Thoughts?
> >> >
> >>
> >> Hey, so I'm trying to gather all infos we have here in order to prepare to 
> >> move forward with the implem of GraphSON 2.0, here's what I come up with :
> >>
> >> Things we have :
> >> - Type format.
> >> - The structure in Jackson to implement our own type format.
> >> - All non native Graph types are typed (except the domain specific types).
> >>
> >> New things we need :
> >> - Types for domain specific objects.
> >> - Types for all numeric values.
> >> - Don't serialize empty fields (outV and stuff).
> >>
> >> Things we consider changing :
> >> - Type IDs convention. Before : Java simple class names. Now : starts with 
> >> a "domain" like "gremlin" followed by the "type name", which is a 
> >> lowercased type name (like "uuid", or "float", or "vertex"). Example : 
> >> "gremlin:uuid".
> >> - Type format ?
> >>
> >> Am I missing something ?
> >>
> > Hey,
> >
> > So I've made a few changes in the code from the original GraphSON 2.0, with 
> > the objectives described above, the code is still messy but I just thought 
> > I'd share some samples to start getting into the work and gather some 
> > feedback.
> >
> > In the example I've created a TinkerGraph with 2 vertices connected by an 
> > edge. The graph is serialized as a TinkerGraph.
> > The samples are there : 
> > https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
> >
> > Any feedback appreciated.
> 
> 
> 
> -- 
> Robert Dale
> 


Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread gallardo.kev...@gmail.com


On 2016-07-19 17:47 (+0100), Stephen Mallette <spmalle...@gmail.com> wrote: 
> it should - properties are a Map of Lists of Property values.
> 
> On Tue, Jul 19, 2016 at 12:45 PM, Dylan Millikin <dylan.milli...@gmail.com>
> wrote:
> 
> > Quick question which is probably handled automatically but is this working
> > with multiple cardinalities on properties?
> >
> > On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com <
> > gallardo.kev...@gmail.com> wrote:
> >
> > >
> > >
> > > On 2016-07-15 16:25 (+0100), "gallardo.kev...@gmail.com"<
> > > gallardo.kev...@gmail.com> wrote:
> > > >
> > > >
> > > > On 2016-07-09 16:48 (+0100), Stephen Mallette <spmalle...@gmail.com>
> > > wrote:
> > > > > With all the work on GLVs and the recent work on GraphSON 2.0, I
> > think
> > > it's
> > > > > important that we have a solid, efficient, programming language
> > > neutral,
> > > > > lossless serialization format. Right now that format is GraphSON and
> > it
> > > > > works for that purpose (ever more  so with 2.0). Given some
> > discussion
> > > on
> > > > > the GraphSON 2.0 PR driven a bit by Robert Dale:
> > > > >
> > > > > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> > > > >
> > > > > I wonder if we shouldn't consider another IO format that has Gremlin
> > > > > Server/GLVs in mind. At this point I'm not suggesting anything
> > > specific -
> > > > > I'm just hanging the idea out for further discussion and brain
> > > storming.
> > > > > Thoughts?
> > > > >
> > > >
> > > > Hey, so I'm trying to gather all infos we have here in order to prepare
> > > to move forward with the implem of GraphSON 2.0, here's what I come up
> > with
> > > :
> > > >
> > > > Things we have :
> > > > - Type format.
> > > > - The structure in Jackson to implement our own type format.
> > > > - All non native Graph types are typed (except the domain specific
> > > types).
> > > >
> > > > New things we need :
> > > > - Types for domain specific objects.
> > > > - Types for all numeric values.
> > > > - Don't serialize empty fields (outV and stuff).
> > > >
> > > > Things we consider changing :
> > > > - Type IDs convention. Before : Java simple class names. Now : starts
> > > with a "domain" like "gremlin" followed by the "type name", which is a
> > > lowercased type name (like "uuid", or "float", or "vertex"). Example :
> > > "gremlin:uuid".
> > > > - Type format ?
> > > >
> > > > Am I missing something ?
> > > >
> > > Hey,
> > >
> > > So I've made a few changes in the code from the original GraphSON 2.0,
> > > with the objectives described above, the code is still messy but I just
> > > thought I'd share some samples to start getting into the work and gather
> > > some feedback.
> > >
> > > In the example I've created a TinkerGraph with 2 vertices connected by an
> > > edge. The graph is serialized as a TinkerGraph.
> > > The samples are there :
> > > https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
> > >
> > > Any feedback appreciated.
> > >
> >
> 
I confirm, I didn't change anything in that section.


Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread gallardo.kev...@gmail.com


On 2016-07-15 16:25 (+0100), 
"gallardo.kev...@gmail.com"<gallardo.kev...@gmail.com> wrote: 
> 
> 
> On 2016-07-09 16:48 (+0100), Stephen Mallette <spmalle...@gmail.com> wrote: 
> > With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
> > important that we have a solid, efficient, programming language neutral,
> > lossless serialization format. Right now that format is GraphSON and it
> > works for that purpose (ever more  so with 2.0). Given some discussion on
> > the GraphSON 2.0 PR driven a bit by Robert Dale:
> > 
> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> > 
> > I wonder if we shouldn't consider another IO format that has Gremlin
> > Server/GLVs in mind. At this point I'm not suggesting anything specific -
> > I'm just hanging the idea out for further discussion and brain storming.
> > Thoughts?
> > 
> 
> Hey, so I'm trying to gather all infos we have here in order to prepare to 
> move forward with the implem of GraphSON 2.0, here's what I come up with : 
> 
> Things we have : 
> - Type format.
> - The structure in Jackson to implement our own type format.
> - All non native Graph types are typed (except the domain specific types).
> 
> New things we need : 
> - Types for domain specific objects.
> - Types for all numeric values.
> - Don't serialize empty fields (outV and stuff).
> 
> Things we consider changing :
> - Type IDs convention. Before : Java simple class names. Now : starts with a 
> "domain" like "gremlin" followed by the "type name", which is a lowercased 
> type name (like "uuid", or "float", or "vertex"). Example : "gremlin:uuid".
> - Type format ?
> 
> Am I missing something ?
> 
Hey,

I've made a few changes to the original GraphSON 2.0 code, with the 
objectives described above. The code is still messy, but I thought I'd share 
some samples to get the work started and gather some feedback.

In the example I've created a TinkerGraph with 2 vertices connected by an edge. 
The graph is serialized as a TinkerGraph.
The samples are here: 
https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60

Any feedback is appreciated.


Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-18 Thread gallardo.kev...@gmail.com


On 2016-07-15 21:32 (+0100), Robert Dale  wrote: 
> Responding to Marko and Kevin...
> 
> Marko wrote:
> > SIDENOTE: This serves as a foundation for when we move to GraphSON 2.0. In 
> > terms of numbers, I think, unfortunately, we have to stick with int32, 
> > int64, float, double, etc. given graph database providers and their type 
> > systems. Its not about the Gremlin traversal API, its more about provider 
> > schemas. has(“someNumber”,12L) vs. has(“someNumber”,12).
> 
> I call the above behavior a bug or a peculiarity of Titan; it clings
> to a java object idiom. On the other hand, DSE graph exhibits expected
> behavior (as does IBM Graph, Neo4j.)  I know of no other query
> language that behaves like this - e.g. SQL, CassandraQL, JPQL, JOOQ
> (the gremlin of sql).  Typically the underlying driver/provider does
> the "right" thing (or doesn't).  Again, take UUID in gremlin, I can
> pass a string.  The underlying driver seems to convert it to UUID, I
> don't have to provide an UUID object.  This seems inconsistent.
> Either it's doing strong typing or not.  Which is it??
> 
> IMO, the query language should be abstracted from the storage schema.
> And I think this is where we have the impedance mismatch in this
> thread.  What gremlin is really acting like in addition to query
> language is an Object Graph Mapper (like an ORM).  It's playing two
> roles. So I'm also arguing that it should have a single
> responsibility. Yes, I've said this before. But maybe it changes
> things too drastically.  Maybe there are aspects of gremlin that
> actually require strong typing. I don't know. I haven't run into them.
> On to the next item...
> 
> Kevin wrote:
> >> Correct, these types weren't relevant... I only wanted to show you the 
> >> format...
> > However, I don't manage to understand the structure behind the format you 
> > suggest, and I don't manage to establish a clear explicit representation in 
> > my mind, regarding the example you provided in the TP-1274 PR. Could you 
> > please give an example of how you would imagine the serialized JSON of :
> > - an example list of typed values, like List
> > - an example list of typed and untyped values, like a list with UUIDs and 
> > booleans
> > - an example map of typed and untyped values
> >
> > How would you define that format in a general way ? Like what I did when 
> > saying
> > "- untyped : value
> > - typed : {"@type", "typeName", "value" : value}"
> >
> > Just trying your point better.
> > Also what are the downsides you see with the format suggested above ?
> 
> The original format was in a list. I must have missed where you
> accepted this format. In any case, like I originally stated, if you
> want strong-typing, then _everything_ must be an _object_.
> 
> Here's an example of non-typed:
> https://gist.github.com/robertdale/02931f5633be55a59c13bca3b0e58655
> - native json only
> 
> Here's strongly typed:
> https://gist.github.com/robertdale/6c074b165a72efee701e26f851f8b68a
> - set (as an object), list (as an object), mixed-type lists, etc
> 

OK, glad to see that your revised version of the format is exactly the one I 
defined initially. I think we're on the same page now. Except for one thing: 
the type information for vertex does not seem consistent with the rest. If, as 
you say, "everything is an object", then it would look like this: 
https://gist.github.com/newkek/2d748dc59029f01af18b2a0e80494a31 .
However, to me strong typing does not necessarily mean that type metadata is 
needed when the type is already properly handled by JSON. I.e. I don't see 
the need to add type information for data like booleans. There is no 
ambiguity possible.
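
As a rough illustration of that rule, the sketch below wraps only values that 
JSON cannot represent natively and leaves booleans and strings alone. It is 
written in Python for brevity and is not the GraphSON implementation; the 
function name to_wire is hypothetical. Numbers are passed through untouched 
here purely to keep the sketch short, even though typing numeric values is 
one of the open items in this thread.

```python
import json
import uuid


def to_wire(obj):
    """Wrap only the values JSON cannot represent natively."""
    if isinstance(obj, uuid.UUID):
        # Typed form: {"@type": "typeName", "value": value}
        return {"@type": "gremlin:uuid", "value": str(obj)}
    if isinstance(obj, list):
        return [to_wire(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_wire(val) for key, val in obj.items()}
    # bool, str, None (and, in this sketch, numbers) need no type metadata.
    return obj


# A mixed list: the boolean stays bare, the UUID gets a type envelope.
encoded = json.dumps(to_wire([True, uuid.UUID(int=1)]))
```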

> Let me add that while there's no strict definition of schemaless, it
> was not necessarily intended to include having mixed data types for a
> single field. This is a really bad idea. Experts warn against this.
> Most NoSQL databases don't even support this. You will probably die if
> you use it. The default behavior for DSE graph, IBM graph, and even
> Titan is to create the schema based on the first type inserted.  It
> will complain if any subsequent type is different.

No, in DSE Graph the schema has to be defined upfront and does not depend on 
the first element inserted. But I'm not the best person to talk about that, and 
I'm not sure this is the right place.

However, concerning mixed typed/non-typed values, I am less concerned about 
what the graph provider would do than about what the protocol can handle. 
Hence I am in favour of a protocol that can handle as much as possible in a 
consistent way, for example collections of typed and non-typed values, as is 
possible in a TinkerGraph. That means a VertexProperty can be a list of 
Strings and UUIDs: one doesn't need a type, the other does.
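
Reading such a mixed list back could look like the sketch below. This is an 
assumption-laden Python illustration, not the actual GraphSON reader: the 
DESERIALIZERS table and the from_wire function are hypothetical names, and the 
payload is a made-up example of a list holding one String and one UUID.

```python
import json
import uuid

# Hypothetical registry: type ID -> function that rebuilds the value.
DESERIALIZERS = {"gremlin:uuid": uuid.UUID}


def from_wire(node):
    """Rebuild typed values; leave native JSON values as they are."""
    if isinstance(node, dict) and "@type" in node:
        # Typed envelope: look up the deserializer by its type ID.
        return DESERIALIZERS[node["@type"]](from_wire(node["value"]))
    if isinstance(node, dict):
        return {key: from_wire(val) for key, val in node.items()}
    if isinstance(node, list):
        return [from_wire(item) for item in node]
    return node


payload = (
    '["marko", {"@type": "gremlin:uuid", '
    '"value": "00000000-0000-0000-0000-000000000001"}]'
)
values = from_wire(json.loads(payload))
```

The String comes back as a plain string and the UUID as a uuid.UUID, so typed 
and untyped values can sit side by side in the same collection.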

> 
> Also, schemaless doesn't mean without any schema. While not having to
> define a schema up-front during a quickstart or early development
> makes life 

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread gallardo.kev...@gmail.com


On 2016-07-09 16:48 (+0100), Stephen Mallette  wrote: 
> With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
> important that we have a solid, efficient, programming language neutral,
> lossless serialization format. Right now that format is GraphSON and it
> works for that purpose (ever more  so with 2.0). Given some discussion on
> the GraphSON 2.0 PR driven a bit by Robert Dale:
> 
> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> 
> I wonder if we shouldn't consider another IO format that has Gremlin
> Server/GLVs in mind. At this point I'm not suggesting anything specific -
> I'm just hanging the idea out for further discussion and brain storming.
> Thoughts?
> 

Hey, I'm trying to gather all the information we have here in order to prepare 
to move forward with the implementation of GraphSON 2.0. Here's what I've come 
up with:

Things we have:
- Type format.
- The structure in Jackson to implement our own type format.
- All non-native Graph types are typed (except the domain-specific types).

New things we need:
- Types for domain-specific objects.
- Types for all numeric values.
- Don't serialize empty fields (outV and so on).

Things we are considering changing:
- Type IDs convention. Before: Java simple class names. Now: starts with a 
"domain" like "gremlin" followed by the "type name", which is a lowercased type 
name (like "uuid", or "float", or "vertex"). Example: "gremlin:uuid".
- Type format?

Am I missing something?
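
Two of the items above are easy to sketch: the proposed type ID convention 
and the rule of leaving out empty fields. The Python below is only an 
illustration under stated assumptions; make_type_id and drop_empty_fields are 
hypothetical helper names, and the sample vertex dictionary is made up.

```python
def make_type_id(type_name, domain="gremlin"):
    """Build a type ID as "<domain>:<lowercased type name>"."""
    return "{}:{}".format(domain, type_name.lower())


def drop_empty_fields(element):
    """Leave out fields that are None or an empty map/list."""
    return {
        key: val
        for key, val in element.items()
        if val not in (None, {}, [])
    }


# Made-up vertex with no edges and no properties.
vertex = {"id": 1, "label": "person", "outE": {}, "properties": []}
compact = drop_empty_fields(vertex)
```

Here make_type_id("UUID") gives "gremlin:uuid", matching the example in the 
list, and compact keeps only the "id" and "label" fields.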


Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread gallardo.kev...@gmail.com


On 2016-07-15 15:52 (+0100), 
"gallardo.kev...@gmail.com"<gallardo.kev...@gmail.com> wrote: 
> 
> 
> On 2016-07-15 14:44 (+0100), Robert Dale <robd...@gmail.com> wrote: 
> > It looks to me like a self-inflicted problem because the things that
> > are typed are already native to json so it's redundant.  And to go a
> > step further, I wouldn't consider the types to be 'correct' because
> > everything that is a HashMap is really a Vertex, Edge, or Property.
> > 
> > On Thu, Jul 14, 2016 at 10:03 AM, gallardo.kev...@gmail.com
> > <gallardo.kev...@gmail.com> wrote:
> > >
> > >
> > > On 2016-07-13 13:17 (+0100), Robert Dale <robd...@gmail.com> wrote:
> > >> Marko, I agree that empty object properties should not be represented.
> > >> I think if you saw that in an example then it was probably for
> > >> demonstration purposes.
> > >>
> > >> Kevin, can you expand on this comment:
> > >>
> > >> > the format you suggest would lead to the same inconsistencies as in 
> > >> > GraphSON 1.0.
> > >> > Since the type is at the same level than the data itself, whether the 
> > >> > container is an Array or an Object
> > >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> > >>
> > >> What exactly are the inconsistencies?  What is the problem in
> > >> determining an array or object?
> > >> This is a natural JSON array (or list): []
> > >> This is a natural JSON object: {}
> > >>
> > >> Type at the object level is a common pattern and supported feature of
> > >> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> > >> 'type' at the object level. Titan supports GeoJSON currently.  I
> > >> wonder if it would make sense to promote geometry to gremlin.
> > >>
> > >
> > > I wasn't probably clear enough, in my first email exposing my motivation 
> > > to improve GraphSON 1.0, one of the things I noticed was that according 
> > > to the enclosing element (either an Array or a Map), a type will either 
> > > be described as (respectively) an element of the Array, or a key/value 
> > > pair in a Map, you can see that in the "embedded types" example of the 
> > > Tinkerpop docs : 
> > > http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer
> > >  .
> > >
> > > There you can see that the type "java.util.ArrayList" is a simple element 
> > > of the enclosing array, but the "java.util.HashMap" type is a field of 
> > > the enclosing Map as {"@class" : "java.util.HashMap", ...}. This does not 
> > > seem consistent to me and even though I know that Jackson handles it 
> > > well, it seems that we'd better provide a consistent enclosing format 
> > > that we know is fixed whatever the enclosed data is, to make the 
> > > automatic type detection for other parsers in other libraries/languages 
> > > easier. Does that make sense ?
> > >
> > >> We should probably start documenting a table of supported types. (If
> > >> there is one, please provide link)
> > >>
> > >> I wonder if it even makes sense to type numbers according to their
> > >> memory model. As objects, Byte, Short, and Integer occupy the same
> > >> space. Long isn't much more.  So in Java we're not saving much space.
> > >> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> > >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> > >> have this concept.  Does anything in gremlin actually require this?
> > >> I'm thinking that this is only going to be relevant at the domain
> > >> model level. This way json native numbers can be used and not need
> > >> typing.
> > >>
> > >> Additionally, I think that all things that will be typed should always
> > >> be typed. For the use cases of injesting a saved graph from a file, it
> > >> can probably be assumed that the top-level objects are vertices since
> > >> the graph is vertex-centric and everything else follows naturally.
> > >> I'm not entirely sure what is required for submitting traversals to
> > >> gremlin server from GLV.  However, if this is used for the results
> > >> from gremlin server then the results could start with any one of path,
> > >> vertex, edge, property, verte

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread gallardo.kev...@gmail.com


On 2016-07-15 16:07 (+0100), 
"gallardo.kev...@gmail.com"<gallardo.kev...@gmail.com> wrote: 
> 
> 
> On 2016-07-15 15:52 (+0100), 
> "gallardo.kev...@gmail.com"<gallardo.kev...@gmail.com> wrote: 
> > 
> > 
> > On 2016-07-15 14:44 (+0100), Robert Dale <robd...@gmail.com> wrote: 
> > > It looks to me like a self-inflicted problem because the things that
> > > are typed are already native to json so it's redundant.  And to go a
> > > step further, I wouldn't consider the types to be 'correct' because
> > > everything that is a HashMap is really a Vertex, Edge, or Property.
> > > 
> > > On Thu, Jul 14, 2016 at 10:03 AM, gallardo.kev...@gmail.com
> > > <gallardo.kev...@gmail.com> wrote:
> > > >
> > > >
> > > > On 2016-07-13 13:17 (+0100), Robert Dale <robd...@gmail.com> wrote:
> > > >> Marko, I agree that empty object properties should not be represented.
> > > >> I think if you saw that in an example then it was probably for
> > > >> demonstration purposes.
> > > >>
> > > >> Kevin, can you expand on this comment:
> > > >>
> > > >> > the format you suggest would lead to the same inconsistencies as in 
> > > >> > GraphSON 1.0.
> > > >> > Since the type is at the same level than the data itself, whether 
> > > >> > the container is an Array or an Object
> > > >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> > > >>
> > > >> What exactly are the inconsistencies?  What is the problem in
> > > >> determining an array or object?
> > > >> This is a natural JSON array (or list): []
> > > >> This is a natural JSON object: {}
> > > >>
> > > >> Type at the object level is a common pattern and supported feature of
> > > >> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> > > >> 'type' at the object level. Titan supports GeoJSON currently.  I
> > > >> wonder if it would make sense to promote geometry to gremlin.
> > > >>
> > > >
> > > > I wasn't probably clear enough, in my first email exposing my 
> > > > motivation to improve GraphSON 1.0, one of the things I noticed was 
> > > > that according to the enclosing element (either an Array or a Map), a 
> > > > type will either be described as (respectively) an element of the 
> > > > Array, or a key/value pair in a Map, you can see that in the "embedded 
> > > > types" example of the Tinkerpop docs : 
> > > > http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer
> > > >  .
> > > >
> > > > There you can see that the type "java.util.ArrayList" is a simple 
> > > > element of the enclosing array, but the "java.util.HashMap" type is a 
> > > > field of the enclosing Map as {"@class" : "java.util.HashMap", ...}. 
> > > > This does not seem consistent to me and even though I know that Jackson 
> > > > handles it well, it seems that we'd better provide a consistent 
> > > > enclosing format that we know is fixed whatever the enclosed data is, 
> > > > to make the automatic type detection for other parsers in other 
> > > > libraries/languages easier. Does that make sense ?
> > > >
> > > >> We should probably start documenting a table of supported types. (If
> > > >> there is one, please provide link)
> > > >>
> > > >> I wonder if it even makes sense to type numbers according to their
> > > >> memory model. As objects, Byte, Short, and Integer occupy the same
> > > >> space. Long isn't much more.  So in Java we're not saving much space.
> > > >> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> > > >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> > > >> have this concept.  Does anything in gremlin actually require this?
> > > >> I'm thinking that this is only going to be relevant at the domain
> > > >> model level. This way json native numbers can be used and not need
> > > >> typing.
> > > >>
> > > >> Additionally, I think that all things that will be typed should always
> > > >> be typed. For the use cases of injesting a saved graph from a file, it
> > 

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-14 Thread gallardo.kev...@gmail.com


On 2016-07-13 13:17 (+0100), Robert Dale  wrote: 
> Marko, I agree that empty object properties should not be represented.
> I think if you saw that in an example then it was probably for
> demonstration purposes.
> 
> Kevin, can you expand on this comment:
> 
> > the format you suggest would lead to the same inconsistencies as in 
> > GraphSON 1.0.
> > Since the type is at the same level than the data itself, whether the 
> > container is an Array or an Object
> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> 
> What exactly are the inconsistencies?  What is the problem in
> determining an array or object?
> This is a natural JSON array (or list): []
> This is a natural JSON object: {}
> 
> Type at the object level is a common pattern and supported feature of
> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> 'type' at the object level. Titan supports GeoJSON currently.  I
> wonder if it would make sense to promote geometry to gremlin.
> 

I probably wasn't clear enough. In my first email explaining my motivation to 
improve GraphSON 1.0, one of the things I noted was that, depending on the 
enclosing element (either an Array or a Map), a type is described either as 
(respectively) an element of the Array or a key/value pair in the Map. You 
can see that in the "embedded types" example of the TinkerPop docs: 
http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer . 

There you can see that the type "java.util.ArrayList" is a simple element of 
the enclosing array, but the "java.util.HashMap" type is a field of the 
enclosing Map, as in {"@class" : "java.util.HashMap", ...}. This does not seem 
consistent to me. Even though I know that Jackson handles it well, it seems 
that we'd better provide a consistent enclosing format, one that we know is 
fixed whatever the enclosed data is, to make automatic type detection easier 
for parsers in other libraries/languages. Does that make sense?
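
To show what this means for a parser written without Jackson, here is a small 
Python sketch. The two shapes handled by read_type_graphson1 follow the 
description above (type as first array element, or as an "@class" field); the 
function names are hypothetical and the uniform envelope uses the "@type" / 
"value" form discussed in this thread.

```python
def read_type_graphson1(node):
    """Type lookup depends on the container kind: two code paths."""
    if isinstance(node, list) and len(node) == 2 and isinstance(node[0], str):
        return node[0]          # type is the first element of the array
    if isinstance(node, dict) and "@class" in node:
        return node["@class"]   # type is a field mixed in with the data
    return None


def read_type_uniform(node):
    """With one fixed envelope, a single check is enough."""
    if isinstance(node, dict) and "@type" in node:
        return node["@type"]
    return None


typed_list = ["java.util.ArrayList", [1, 2]]
typed_map = {"@class": "java.util.HashMap", "name": "marko"}
plain_list = ["a", "b"]  # ordinary data, yet it looks like a typed list
envelope = {"@type": "gremlin:uuid", "value": "..."}
```

Note that read_type_graphson1(plain_list) returns "a": a parser that does not 
already know the expected Java type cannot tell a typed list from a plain 
two-element list, which is the kind of ambiguity a fixed envelope avoids.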

> We should probably start documenting a table of supported types. (If
> there is one, please provide link)
> 
> I wonder if it even makes sense to type numbers according to their
> memory model. As objects, Byte, Short, and Integer occupy the same
> space. Long isn't much more.  So in Java we're not saving much space.
> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> have this concept.  Does anything in gremlin actually require this?
> I'm thinking that this is only going to be relevant at the domain
> model level. This way json native numbers can be used and not need
> typing.
> 
> Additionally, I think that all things that will be typed should always
> be typed. For the use cases of injesting a saved graph from a file, it
> can probably be assumed that the top-level objects are vertices since
> the graph is vertex-centric and everything else follows naturally.
> I'm not entirely sure what is required for submitting traversals to
> gremlin server from GLV.  However, if this is used for the results
> from gremlin server then the results could start with any one of path,
> vertex, edge, property, vertex property, etc. So you'll need that type
> data there.
> 
> -- 
> Robert Dale
> 
> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez  wrote:
> > Hi,
> >
> > I’m not following this PR too closely so what I might be saying is a 
> > already known/argued against/etc.
> >
> > 1. I think we should go with Robert Dale’s proposal of int32, 
> > int64, Vertex, uuid, etc. instead of Java class names.
> > 2. In Java we then have a Map for typecasting 
> > accordingly.
> > 3. This would make GraphSON 2.0 perfect for Bytecode serialization 
> > in TINKERPOP-1278.
> > 4. I think that if a Vertex, Edge, etc. doesn’t have properties, 
> > outV, etc. then don’t even have those fields in the representation.
> > 5. Most of the serialization back and forth will be ReferenceXXX 
> > elements and thus, don’t create more Maps/lists for no reason. — less 
> > chars.
> >
> > For me, my interests with this work is all about a language agnostic way of 
> > sending Gremlin traversal bytecode between different languages. This work 
> > is exactly what I am looking for.
> >
> > Thanks,
> > Marko.
> >
> > http://markorodriguez.com
> >
> >
> >
> >> On Jul 9, 2016, at 9:48 AM, Stephen Mallette  wrote:
> >>
> >> With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
> >> important that we have a solid, efficient, programming language neutral,
> >> lossless serialization format. Right now that format is GraphSON and it
> >> works for that purpose (ever more  so with 2.0). Given some discussion on
> >> the GraphSON 2.0 PR driven a bit by Robert Dale:
> >>
> >> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> >>
> >> I wonder if