Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-20 Thread gallardo.kev...@gmail.com



On 2016-07-19 22:28 (+0100), Marko Rodriguez  wrote: 
> Hi,
> 
> However, in general we just need an “object mapper pattern.” For instance:
> 
> For any JSON object { } that has a @type field, the @type value maps to a 
> deserializer. Thus, while we need to be able to serialize/deserialize the 
> standard Vertex/Edge/VertexProperty/etc. the representation should be 
> generalized to support any registered @type.

Agreed. Given how Jackson works, we would have had no choice but to add 
deserializers for these types anyway. I had also planned to make the 
GraphSONTypeIdResolver configurable for users. That is the component that 
handles the conversion "typeID" -> "Java Class" for deserialization and 
"Java Class" -> "typeID" for serialization.
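
To illustrate the two-way mapping described above, here is a rough Python sketch. It is not the actual GraphSONTypeIdResolver (which is Java/Jackson code); the class name TypeIdResolver, its method names, and the "gremlin:" type IDs are assumptions made up for the example.

```python
import uuid


class TypeIdResolver:
    """Hypothetical two-way registry between type IDs and classes."""

    def __init__(self):
        self._id_to_class = {}
        self._class_to_id = {}

    def register(self, type_id, cls):
        # Users could call this to add their own domain-specific types.
        self._id_to_class[type_id] = cls
        self._class_to_id[cls] = type_id

    def id_from_value(self, value):
        # Serialization direction: class -> type ID.
        return self._class_to_id[type(value)]

    def class_from_id(self, type_id):
        # Deserialization direction: type ID -> class.
        return self._id_to_class[type_id]


resolver = TypeIdResolver()
resolver.register("gremlin:uuid", uuid.UUID)
resolver.register("gremlin:float", float)
```

Keeping both directions in one object means a user-registered type is available to the serializer and the deserializer at the same time.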

> 
>   Java GraphSON serializer/deserializer registration:
>   
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/GraphSONModule.java#L129-L147
>  
> 
> 
>   Python GraphSON serializer registration:
>   
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-python/src/main/jython/gremlin_python/process/graphson.py#L122-L127
>  
> 
> 
> People can register more @types as needed for their graph processor’s type 
> system.
> 
> Marko.
> 
> http://markorodriguez.com
> 
> 
> 
> > On Jul 19, 2016, at 12:55 PM, Marko Rodriguez  wrote:
> > 
> > We need:
> > 
> > Graph
> > Element
> > Vertex
> > Edge
> > VertexProperty
> > Property
> > Path
> > TraversalExplanation
> > TraversalMetrics
> > Traversal (i.e. Bytecode)
> > Traverser (object + bulk at minimum)
> > 
> > Marko.
> > 
> > http://markorodriguez.com
> > 
> > 
> > 
> >> On Jul 19, 2016, at 12:45 PM, Robert Dale  wrote:
> >> 
> >> There's also Path that can be returned from a query. It looks like
> >> GraphSON 1.0 handles this today in the REST API but it's not typed as
> >> a path.
> >> 
> >> On Tue, Jul 19, 2016 at 2:14 PM, gallardo.kev...@gmail.com
> >>  wrote:
> >>> 
> >>> 
> >>> On 2016-07-19 18:02 (+0100), Robert Dale  wrote:
> >>>> - It seems redundant to nest a vertex or edge inside a type-value
> >>>> object and is inconsistent with a VertexProperty.
> >>>> - VertexProperty and (edge) Property are implicit types. I don't know
> >>>> if this is ok. Could they ever be used outside of their parents where
> >>>> they would need to be typed?
> >>> 
> >>> I agree with the VertexProperty remark. That's one last question I wanted 
> >>> to solve, if we go for typing Vertex and edges, do we include others? The 
> >>> full list I see then is : vertex/edge/vertexproperty/property/graph.
> >>> 
> >>> However I am not sure how useful it is to have more than Vertex and Edge. 
> >>> As, when deserializing a Vertex for example, there's no question as to 
> >>> what is in the "properties" field of the Vertex, there are necessarily 
> >>> only VertexProperties. However looking at the API, it seems like it is 
> >>> supported to write only a VertexProperty if one wants to (see 
> >>> GraphWriter.writeVertexProperty()), so in that case, to me it makes sense 
> >>> to add the types for the elements of the list I described above. @stephen 
> >>> any thoughts about that ?
> >>> 
> >>>> - Edges:
> >>>>   - is in/outVLabel new? Couldn't find it in the API or any examples of 
> >>>> this.
> >>>>   - why not make inV/outV have proper vertices with labels (to satisfy
> >>>> the case previous case) instead of just IDs? This would also be more
> >>>> consistent with the API.
> >>> 
> >>> I haven't touched that part, it was in the format before. I believe this 
> >>> is a question for Stephen.
> >>> 
> >>>> 
> >>>> Otherwise looks good!
> >>> 
> >>> Thanks for the feedback.
> >>>> 
> >>>> On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com
> >>>> wrote:
> >>>>> 
> >>>>> 
> >>>>> On 2016-07-15 16:25 (+0100), 
> >>>>> "gallardo.kev...@gmail.com" wrote:
> >>>>>> 
> >>>>>> 
> >>>>>> On 2016-07-09 16:48 (+0100), Stephen Mallette  
> >>>>>> wrote:
> >>>>>>> With all the work on GLVs and the recent work on GraphSON 2.0, I 
> >>>>>>> think it's
> >>>>>>> important that we have a solid, efficient, programming language 
> >>>>>>> neutral,
> >>>>>>> lossless serialization format. Right now that format is GraphSON and 
> >>>>>>> it
> >>>>>>> works for that purpose (ever more so with 2.0). Given some 
> >>>>>>> discussion on
> >>>>>>> the GraphSON 2.0 PR driven a bit by Robert Dale:
> >>>>>>> 
> >>>>>>> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> >>>>>>> 
> >>>>>>> I wonder if we shouldn't consider another I

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Marko Rodriguez

Hi,

However, in general we just need an “object mapper pattern.” For instance:

For any JSON object { } that has a @type field, the @type value maps to a 
deserializer. Thus, while we need to be able to serialize/deserialize the 
standard Vertex/Edge/VertexProperty/etc. the representation should be 
generalized to support any registered @type.

Java GraphSON serializer/deserializer registration:

https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/GraphSONModule.java#L129-L147
 


Python GraphSON serializer registration:

https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-python/src/main/jython/gremlin_python/process/graphson.py#L122-L127
 


People can register more @types as needed for their graph processor’s type 
system.
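
A minimal Python sketch of the object mapper pattern described above, for illustration only. It is not the code behind the links: the "@value" key, the "gremlin:" type names, and the helpers register and from_graphson are assumptions for the example. Only the "@type" field comes from this thread.

```python
import json
import uuid

# Hypothetical registry: @type value -> function that builds an object.
DESERIALIZERS = {}


def register(type_id, fn):
    DESERIALIZERS[type_id] = fn


def from_graphson(node):
    """Recursively replace every {"@type": ..., "@value": ...} object."""
    if isinstance(node, list):
        return [from_graphson(item) for item in node]
    if isinstance(node, dict):
        if "@type" in node:
            deserialize = DESERIALIZERS[node["@type"]]
            # Nested typed values are resolved before the outer one.
            return deserialize(from_graphson(node.get("@value")))
        return {key: from_graphson(value) for key, value in node.items()}
    return node  # plain JSON scalar


register("gremlin:uuid", uuid.UUID)
register("gremlin:vertex", lambda value: ("vertex", value["id"], value["label"]))

doc = json.loads(
    '{"@type": "gremlin:vertex",'
    ' "@value": {"id": {"@type": "gremlin:uuid",'
    ' "@value": "12345678-1234-5678-1234-567812345678"}, "label": "person"}}'
)
vertex = from_graphson(doc)
```

A graph processor with its own type system would only need to call register for each extra @type; the traversal code stays unchanged.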

Marko.

http://markorodriguez.com



> On Jul 19, 2016, at 12:55 PM, Marko Rodriguez  wrote:
> 
> We need:
> 
>   Graph
>   Element
>   Vertex
>   Edge
>   VertexProperty
>   Property
>   Path
>   TraversalExplanation
>   TraversalMetrics
>   Traversal (i.e. Bytecode)
>   Traverser (object + bulk at minimum)
> 
> Marko.
> 
> http://markorodriguez.com
> 
> 
> 
>> On Jul 19, 2016, at 12:45 PM, Robert Dale  wrote:
>> 
>> There's also Path that can be returned from a query. It looks like
>> GraphSON 1.0 handles this today in the REST API but it's not typed as
>> a path.
>> 
>> On Tue, Jul 19, 2016 at 2:14 PM, gallardo.kev...@gmail.com
>>  wrote:
>>> 
>>> 
>>> On 2016-07-19 18:02 (+0100), Robert Dale  wrote:
>>>> - It seems redundant to nest a vertex or edge inside a type-value
>>>> object and is inconsistent with a VertexProperty.
>>>> - VertexProperty and (edge) Property are implicit types. I don't know
>>>> if this is ok. Could they ever be used outside of their parents where
>>>> they would need to be typed?
>>> 
>>> I agree with the VertexProperty remark. That's one last question I wanted 
>>> to solve, if we go for typing Vertex and edges, do we include others? The 
>>> full list I see then is : vertex/edge/vertexproperty/property/graph.
>>> 
>>> However I am not sure how useful it is to have more than Vertex and Edge. 
>>> As, when deserializing a Vertex for example, there's no question as to what 
>>> is in the "properties" field of the Vertex, there are necessarily only 
>>> VertexProperties. However looking at the API, it seems like it is supported 
>>> to write only a VertexProperty if one wants to (see 
>>> GraphWriter.writeVertexProperty()), so in that case, to me it makes sense 
>>> to add the types for the elements of the list I described above. @stephen 
>>> any thoughts about that ?
>>> 
>>>> - Edges:
>>>>   - is in/outVLabel new? Couldn't find it in the API or any examples of this.
>>>>   - why not make inV/outV have proper vertices with labels (to satisfy
>>>> the case previous case) instead of just IDs? This would also be more
>>>> consistent with the API.
>>> 
>>> I haven't touched that part, it was in the format before. I believe this is 
>>> a question for Stephen.
>>> 
 
>>>> Otherwise looks good!
>>> 
>>> Thanks for the feedback.
>>>> 
>>>> On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com
>>>> wrote:
>>>>> 
>>>>> 
>>>>> On 2016-07-15 16:25 (+0100), 
>>>>> "gallardo.kev...@gmail.com" wrote:
>>>>>> 
>>>>>> 
>>>>>> On 2016-07-09 16:48 (+0100), Stephen Mallette  
>>>>>> wrote:
>>>>>>> With all the work on GLVs and the recent work on GraphSON 2.0, I think 
>>>>>>> it's
>>>>>>> important that we have a solid, efficient, programming language neutral,
>>>>>>> lossless serialization format. Right now that format is GraphSON and it
>>>>>>> works for that purpose (ever more so with 2.0). Given some discussion 
>>>>>>> on
>>>>>>> the GraphSON 2.0 PR driven a bit by Robert Dale:
>>>>>>> 
>>>>>>> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
>>>>>>> 
>>>>>>> I wonder if we shouldn't consider another IO format that has Gremlin
>>>>>>> Server/GLVs in mind. At this point I'm not suggesting anything specific 
>>>>>>> -
>>>>>>> I'm just hanging the idea out for further discussion and brain storming.
>>>>>>> Thoughts?
>>>>>>> 
>>>>>> 
>>>>>> Hey, so I'm trying to gather all infos we have here in order to prepare 
>>>>>> to move forward with the implem of GraphSON 2.0, here's what I come up 
>>>>>> with :
>>>>>> 
>>>>>> Things we have :
>>>>>> - Type format.
>>>>>> - The structure in Jackson to implement our own type format.
>>>>>> - All non native Graph types are typed (except the domain specific 
>>>>>> types).
>>>>>> 
>>>>>> New things we need :
>>>>>>

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Stephen Mallette

ah - sorry - didn't follow that. that makes sense to me. inVLabel and
outVLabel are kinda awkward. +1 from me on that one.

On Tue, Jul 19, 2016 at 3:23 PM, Robert Dale  wrote:

> On Tue, Jul 19, 2016 at 3:13 PM, Stephen Mallette 
> wrote:
> >>
> >> > - VertexProperty and (edge) Property are implicit types. I don't know
> >> > if this is ok. Could they ever be used outside of their parents where
> >> > they would need to be typed?
> >>
> >> I agree with the VertexProperty remark. That's one last question I
> wanted
> >> to solve, if we go for typing Vertex and edges, do we include others?
> The
> >> full list I see then is : vertex/edge/vertexproperty/property/graph.
> >>
> >> However I am not sure how useful it is to have more than Vertex and
> Edge.
> >> As, when deserializing a Vertex for example, there's no question as to
> what
> >> is in the "properties" field of the Vertex, there are necessarily only
> >> VertexProperties. However looking at the API, it seems like it is
> supported
> >> to write only a VertexProperty if one wants to (see
> >> GraphWriter.writeVertexProperty()), so in that case, to me it makes
> sense
> >> to add the types for the elements of the list I described above.
> @stephen
> >> any thoughts about that ?
> >
> >
> > I guess we should type them to be consistent and because they might
> return
> > independently of a Vertex/Edge as Robert suggests.
> >
> >> - Edges:
> >> >   - is in/outVLabel new? Couldn't find it in the API or any examples
> of
> >> this.
> >> >   - why not make inV/outV have proper vertices with labels (to satisfy
> >> > the case previous case) instead of just IDs? This would also be more
> >> > consistent with the API.
> >>
> >> I haven't touched that part, it was in the format before. I believe this
> >> is a question for Stephen.
> >
> >
> > Returning a "proper" vertex for inV/outV would be nice but it's
> potentially
> > forcing the underlying graph database to pull a lot of data when the user
> > only requested an edge to be returned. I don't think we should go that
> far.
>
> By "proper" I meant an object (type: vertex) that would have the data
> that's already available - label, id.  No extra trips to the db. Just
> more intuitive packaging of that data.
>
> --
> Robert Dale
>

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Robert Dale

On Tue, Jul 19, 2016 at 3:13 PM, Stephen Mallette  wrote:
>>
>> > - VertexProperty and (edge) Property are implicit types. I don't know
>> > if this is ok. Could they ever be used outside of their parents where
>> > they would need to be typed?
>>
>> I agree with the VertexProperty remark. That's one last question I wanted
>> to solve, if we go for typing Vertex and edges, do we include others? The
>> full list I see then is : vertex/edge/vertexproperty/property/graph.
>>
>> However I am not sure how useful it is to have more than Vertex and Edge.
>> As, when deserializing a Vertex for example, there's no question as to what
>> is in the "properties" field of the Vertex, there are necessarily only
>> VertexProperties. However looking at the API, it seems like it is supported
>> to write only a VertexProperty if one wants to (see
>> GraphWriter.writeVertexProperty()), so in that case, to me it makes sense
>> to add the types for the elements of the list I described above. @stephen
>> any thoughts about that ?
>
>
> I guess we should type them to be consistent and because they might return
> independently of a Vertex/Edge as Robert suggests.
>
>> - Edges:
>> >   - is in/outVLabel new? Couldn't find it in the API or any examples of
>> this.
>> >   - why not make inV/outV have proper vertices with labels (to satisfy
>> > the case previous case) instead of just IDs? This would also be more
>> > consistent with the API.
>>
>> I haven't touched that part, it was in the format before. I believe this
>> is a question for Stephen.
>
>
> Returning a "proper" vertex for inV/outV would be nice but it's potentially
> forcing the underlying graph database to pull a lot of data when the user
> only requested an edge to be returned. I don't think we should go that far.

By "proper" I meant an object (type: vertex) that would have the data
that's already available - label, id.  No extra trips to the db. Just
more intuitive packaging of that data.
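
To make that concrete, a small Python sketch of what such an edge payload might look like. The "@type"/"@value" keys, the "gremlin:" names, and the helper functions are assumptions for illustration, not the format under review.

```python
import json


def typed(type_name, value):
    # Hypothetical wrapper; the "@type"/"@value" keys are assumed here.
    return {"@type": type_name, "@value": value}


def edge_to_json(edge_id, label, out_id, out_label, in_id, in_label):
    # inV/outV carry only what the edge already knows: id and label.
    # No vertex properties are included, so no extra lookup is needed.
    return typed("gremlin:edge", {
        "id": edge_id,
        "label": label,
        "outV": typed("gremlin:vertex", {"id": out_id, "label": out_label}),
        "inV": typed("gremlin:vertex", {"id": in_id, "label": in_label}),
    })


payload = edge_to_json(7, "knows", 1, "person", 2, "person")
print(json.dumps(payload, indent=2))
```

This would replace the flat inV/inVLabel/outV/outVLabel fields with two small vertex objects, using the same data.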

-- 
Robert Dale

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Stephen Mallette

>
> > - VertexProperty and (edge) Property are implicit types. I don't know
> > if this is ok. Could they ever be used outside of their parents where
> > they would need to be typed?
>
> I agree with the VertexProperty remark. That's one last question I wanted
> to solve, if we go for typing Vertex and edges, do we include others? The
> full list I see then is : vertex/edge/vertexproperty/property/graph.
>
> However I am not sure how useful it is to have more than Vertex and Edge.
> As, when deserializing a Vertex for example, there's no question as to what
> is in the "properties" field of the Vertex, there are necessarily only
> VertexProperties. However looking at the API, it seems like it is supported
> to write only a VertexProperty if one wants to (see
> GraphWriter.writeVertexProperty()), so in that case, to me it makes sense
> to add the types for the elements of the list I described above. @stephen
> any thoughts about that ?


I guess we should type them to be consistent and because they might return
independently of a Vertex/Edge as Robert suggests.

> - Edges:
> >   - is in/outVLabel new? Couldn't find it in the API or any examples of
> this.
> >   - why not make inV/outV have proper vertices with labels (to satisfy
> > the case previous case) instead of just IDs? This would also be more
> > consistent with the API.
>
> I haven't touched that part, it was in the format before. I believe this
> is a question for Stephen.


Returning a "proper" vertex for inV/outV would be nice but it's potentially
forcing the underlying graph database to pull a lot of data when the user
only requested an edge to be returned. I don't think we should go that far.


On Tue, Jul 19, 2016 at 2:14 PM, gallardo.kev...@gmail.com <
gallardo.kev...@gmail.com> wrote:

>
>
> On 2016-07-19 18:02 (+0100), Robert Dale  wrote:
> > - It seems redundant to nest a vertex or edge inside a type-value
> > object and is inconsistent with a VertexProperty.
> > - VertexProperty and (edge) Property are implicit types. I don't know
> > if this is ok. Could they ever be used outside of their parents where
> > they would need to be typed?
>
> I agree with the VertexProperty remark. That's one last question I wanted
> to solve, if we go for typing Vertex and edges, do we include others? The
> full list I see then is : vertex/edge/vertexproperty/property/graph.
>
> However I am not sure how useful it is to have more than Vertex and Edge.
> As, when deserializing a Vertex for example, there's no question as to what
> is in the "properties" field of the Vertex, there are necessarily only
> VertexProperties. However looking at the API, it seems like it is supported
> to write only a VertexProperty if one wants to (see
> GraphWriter.writeVertexProperty()), so in that case, to me it makes sense
> to add the types for the elements of the list I described above. @stephen
> any thoughts about that ?
>
> > - Edges:
> >   - is in/outVLabel new? Couldn't find it in the API or any examples of
> this.
> >   - why not make inV/outV have proper vertices with labels (to satisfy
> > the case previous case) instead of just IDs? This would also be more
> > consistent with the API.
>
> I haven't touched that part, it was in the format before. I believe this
> is a question for Stephen.
>
> >
> > Otherwise looks good!
>
> Thanks for the feedback.
> >
> > On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com
> >  wrote:
> > >
> > >
> > > On 2016-07-15 16:25 (+0100), "gallardo.kev...@gmail.com"<
> gallardo.kev...@gmail.com> wrote:
> > >>
> > >>
> > >> On 2016-07-09 16:48 (+0100), Stephen Mallette 
> wrote:
> > >> > With all the work on GLVs and the recent work on GraphSON 2.0, I
> think it's
> > >> > important that we have a solid, efficient, programming language
> neutral,
> > >> > lossless serialization format. Right now that format is GraphSON
> and it
> > >> > works for that purpose (ever more  so with 2.0). Given some
> discussion on
> > >> > the GraphSON 2.0 PR driven a bit by Robert Dale:
> > >> >
> > >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> > >> >
> > >> > I wonder if we shouldn't consider another IO format that has Gremlin
> > >> > Server/GLVs in mind. At this point I'm not suggesting anything
> specific -
> > >> > I'm just hanging the idea out for further discussion and brain
> storming.
> > >> > Thoughts?
> > >> >
> > >>
> > >> Hey, so I'm trying to gather all infos we have here in order to
> prepare to move forward with the implem of GraphSON 2.0, here's what I come
> up with :
> > >>
> > >> Things we have :
> > >> - Type format.
> > >> - The structure in Jackson to implement our own type format.
> > >> - All non native Graph types are typed (except the domain specific
> types).
> > >>
> > >> New things we need :
> > >> - Types for domain specific objects.
> > >> - Types for all numeric values.
> > >> - Don't serialize empty fields (outV and stuff).
> > >>
> > >> Things we consider changing :
> > >> - Type IDs con

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Marko Rodriguez

We need:

Graph
Element
Vertex
Edge
VertexProperty
Property
Path
TraversalExplanation
TraversalMetrics
Traversal (i.e. Bytecode)
Traverser (object + bulk at minimum)

Marko.

http://markorodriguez.com



> On Jul 19, 2016, at 12:45 PM, Robert Dale  wrote:
> 
> There's also Path that can be returned from a query. It looks like
> GraphSON 1.0 handles this today in the REST API but it's not typed as
> a path.
> 
> On Tue, Jul 19, 2016 at 2:14 PM, gallardo.kev...@gmail.com
>  wrote:
>> 
>> 
>> On 2016-07-19 18:02 (+0100), Robert Dale  wrote:
>>> - It seems redundant to nest a vertex or edge inside a type-value
>>> object and is inconsistent with a VertexProperty.
>>> - VertexProperty and (edge) Property are implicit types. I don't know
>>> if this is ok. Could they ever be used outside of their parents where
>>> they would need to be typed?
>> 
>> I agree with the VertexProperty remark. That's one last question I wanted to 
>> solve, if we go for typing Vertex and edges, do we include others? The full 
>> list I see then is : vertex/edge/vertexproperty/property/graph.
>> 
>> However I am not sure how useful it is to have more than Vertex and Edge. 
>> As, when deserializing a Vertex for example, there's no question as to what 
>> is in the "properties" field of the Vertex, there are necessarily only 
>> VertexProperties. However looking at the API, it seems like it is supported 
>> to write only a VertexProperty if one wants to (see 
>> GraphWriter.writeVertexProperty()), so in that case, to me it makes sense to 
>> add the types for the elements of the list I described above. @stephen any 
>> thoughts about that ?
>> 
>>> - Edges:
>>>  - is in/outVLabel new? Couldn't find it in the API or any examples of this.
>>>  - why not make inV/outV have proper vertices with labels (to satisfy
>>> the case previous case) instead of just IDs? This would also be more
>>> consistent with the API.
>> 
>> I haven't touched that part, it was in the format before. I believe this is 
>> a question for Stephen.
>> 
>>> 
>>> Otherwise looks good!
>> 
>> Thanks for the feedback.
>>> 
>>> On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com
>>>  wrote:
 
 
>>>> On 2016-07-15 16:25 (+0100), 
>>>> "gallardo.kev...@gmail.com" wrote:
>>>>> 
>>>>> 
>>>>> On 2016-07-09 16:48 (+0100), Stephen Mallette  
>>>>> wrote:
>>>>>> With all the work on GLVs and the recent work on GraphSON 2.0, I think 
>>>>>> it's
>>>>>> important that we have a solid, efficient, programming language neutral,
>>>>>> lossless serialization format. Right now that format is GraphSON and it
>>>>>> works for that purpose (ever more so with 2.0). Given some discussion on
>>>>>> the GraphSON 2.0 PR driven a bit by Robert Dale:
>>>>>> 
>>>>>> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
>>>>>> 
>>>>>> I wonder if we shouldn't consider another IO format that has Gremlin
>>>>>> Server/GLVs in mind. At this point I'm not suggesting anything specific -
>>>>>> I'm just hanging the idea out for further discussion and brain storming.
>>>>>> Thoughts?
>>>>>> 
>>>>> 
>>>>> Hey, so I'm trying to gather all infos we have here in order to prepare 
>>>>> to move forward with the implem of GraphSON 2.0, here's what I come up 
>>>>> with :
>>>>> 
>>>>> Things we have :
>>>>> - Type format.
>>>>> - The structure in Jackson to implement our own type format.
>>>>> - All non native Graph types are typed (except the domain specific types).
>>>>> 
>>>>> New things we need :
>>>>> - Types for domain specific objects.
>>>>> - Types for all numeric values.
>>>>> - Don't serialize empty fields (outV and stuff).
>>>>> 
>>>>> Things we consider changing :
>>>>> - Type IDs convention. Before : Java simple class names. Now : starts 
>>>>> with a "domain" like "gremlin" followed by the "type name", which is a 
>>>>> lowercased type name (like "uuid", or "float", or "vertex"). Example : 
>>>>> "gremlin:uuid".
>>>>> - Type format ?
>>>>> 
>>>>> Am I missing something ?
>>>>> 
>>>> Hey,
>>>> 
>>>> So I've made a few changes in the code from the original GraphSON 2.0, 
>>>> with the objectives described above, the code is still messy but I just 
>>>> thought I'd share some samples to start getting into the work and gather 
>>>> some feedback.
>>>> 
>>>> In the example I've created a TinkerGraph with 2 vertices connected by an 
>>>> edge. The graph is serialized as a TinkerGraph.
>>>> The samples are there : 
>>>> https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
>>>> 
>>>> Any feedback appreciated.
>>> 
>>> 
>>> 
>>> --
>>> Robert Dale
>>> 
> 
> 
> 
> -- 
> Robert Dale

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Robert Dale

There's also Path that can be returned from a query. It looks like
GraphSON 1.0 handles this today in the REST API but it's not typed as
a path.

On Tue, Jul 19, 2016 at 2:14 PM, gallardo.kev...@gmail.com
 wrote:
>
>
> On 2016-07-19 18:02 (+0100), Robert Dale  wrote:
>> - It seems redundant to nest a vertex or edge inside a type-value
>> object and is inconsistent with a VertexProperty.
>> - VertexProperty and (edge) Property are implicit types. I don't know
>> if this is ok. Could they ever be used outside of their parents where
>> they would need to be typed?
>
> I agree with the VertexProperty remark. That's one last question I wanted to 
> solve, if we go for typing Vertex and edges, do we include others? The full 
> list I see then is : vertex/edge/vertexproperty/property/graph.
>
> However I am not sure how useful it is to have more than Vertex and Edge. As, 
> when deserializing a Vertex for example, there's no question as to what is in 
> the "properties" field of the Vertex, there are necessarily only 
> VertexProperties. However looking at the API, it seems like it is supported 
> to write only a VertexProperty if one wants to (see 
> GraphWriter.writeVertexProperty()), so in that case, to me it makes sense to 
> add the types for the elements of the list I described above. @stephen any 
> thoughts about that ?
>
>> - Edges:
>>   - is in/outVLabel new? Couldn't find it in the API or any examples of this.
>>   - why not make inV/outV have proper vertices with labels (to satisfy
>> the case previous case) instead of just IDs? This would also be more
>> consistent with the API.
>
> I haven't touched that part, it was in the format before. I believe this is a 
> question for Stephen.
>
>>
>> Otherwise looks good!
>
> Thanks for the feedback.
>>
>> On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com
>>  wrote:
>> >
>> >
>> > On 2016-07-15 16:25 (+0100), 
>> > "gallardo.kev...@gmail.com" wrote:
>> >>
>> >>
>> >> On 2016-07-09 16:48 (+0100), Stephen Mallette  
>> >> wrote:
>> >> > With all the work on GLVs and the recent work on GraphSON 2.0, I think 
>> >> > it's
>> >> > important that we have a solid, efficient, programming language neutral,
>> >> > lossless serialization format. Right now that format is GraphSON and it
>> >> > works for that purpose (ever more  so with 2.0). Given some discussion 
>> >> > on
>> >> > the GraphSON 2.0 PR driven a bit by Robert Dale:
>> >> >
>> >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
>> >> >
>> >> > I wonder if we shouldn't consider another IO format that has Gremlin
>> >> > Server/GLVs in mind. At this point I'm not suggesting anything specific 
>> >> > -
>> >> > I'm just hanging the idea out for further discussion and brain storming.
>> >> > Thoughts?
>> >> >
>> >>
>> >> Hey, so I'm trying to gather all infos we have here in order to prepare 
>> >> to move forward with the implem of GraphSON 2.0, here's what I come up 
>> >> with :
>> >>
>> >> Things we have :
>> >> - Type format.
>> >> - The structure in Jackson to implement our own type format.
>> >> - All non native Graph types are typed (except the domain specific types).
>> >>
>> >> New things we need :
>> >> - Types for domain specific objects.
>> >> - Types for all numeric values.
>> >> - Don't serialize empty fields (outV and stuff).
>> >>
>> >> Things we consider changing :
>> >> - Type IDs convention. Before : Java simple class names. Now : starts 
>> >> with a "domain" like "gremlin" followed by the "type name", which is a 
>> >> lowercased type name (like "uuid", or "float", or "vertex"). Example : 
>> >> "gremlin:uuid".
>> >> - Type format ?
>> >>
>> >> Am I missing something ?
>> >>
>> > Hey,
>> >
>> > So I've made a few changes in the code from the original GraphSON 2.0, 
>> > with the objectives described above, the code is still messy but I just 
>> > thought I'd share some samples to start getting into the work and gather 
>> > some feedback.
>> >
>> > In the example I've created a TinkerGraph with 2 vertices connected by an 
>> > edge. The graph is serialized as a TinkerGraph.
>> > The samples are there : 
>> > https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
>> >
>> > Any feedback appreciated.
>>
>>
>>
>> --
>> Robert Dale
>>



-- 
Robert Dale

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread gallardo.kev...@gmail.com



On 2016-07-19 18:02 (+0100), Robert Dale  wrote: 
> - It seems redundant to nest a vertex or edge inside a type-value
> object and is inconsistent with a VertexProperty.
> - VertexProperty and (edge) Property are implicit types. I don't know
> if this is ok. Could they ever be used outside of their parents where
> they would need to be typed?

I agree with the VertexProperty remark. That's one last question I wanted to 
solve, if we go for typing Vertex and edges, do we include others? The full 
list I see then is : vertex/edge/vertexproperty/property/graph.

However I am not sure how useful it is to have more than Vertex and Edge. As, 
when deserializing a Vertex for example, there's no question as to what is in 
the "properties" field of the Vertex, there are necessarily only 
VertexProperties. However looking at the API, it seems like it is supported to 
write only a VertexProperty if one wants to (see 
GraphWriter.writeVertexProperty()), so in that case, to me it makes sense to 
add the types for the elements of the list I described above. @stephen any 
thoughts about that ?

> - Edges:
>   - is in/outVLabel new? Couldn't find it in the API or any examples of this.
>   - why not make inV/outV have proper vertices with labels (to satisfy
> the case previous case) instead of just IDs? This would also be more
> consistent with the API.

I haven't touched that part, it was in the format before. I believe this is a 
question for Stephen.

> 
> Otherwise looks good!

Thanks for the feedback.
> 
> On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com
>  wrote:
> >
> >
> > On 2016-07-15 16:25 (+0100), 
> > "gallardo.kev...@gmail.com" wrote:
> >>
> >>
> >> On 2016-07-09 16:48 (+0100), Stephen Mallette  wrote:
> >> > With all the work on GLVs and the recent work on GraphSON 2.0, I think 
> >> > it's
> >> > important that we have a solid, efficient, programming language neutral,
> >> > lossless serialization format. Right now that format is GraphSON and it
> >> > works for that purpose (ever more  so with 2.0). Given some discussion on
> >> > the GraphSON 2.0 PR driven a bit by Robert Dale:
> >> >
> >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> >> >
> >> > I wonder if we shouldn't consider another IO format that has Gremlin
> >> > Server/GLVs in mind. At this point I'm not suggesting anything specific -
> >> > I'm just hanging the idea out for further discussion and brain storming.
> >> > Thoughts?
> >> >
> >>
> >> Hey, so I'm trying to gather all infos we have here in order to prepare to 
> >> move forward with the implem of GraphSON 2.0, here's what I come up with :
> >>
> >> Things we have :
> >> - Type format.
> >> - The structure in Jackson to implement our own type format.
> >> - All non native Graph types are typed (except the domain specific types).
> >>
> >> New things we need :
> >> - Types for domain specific objects.
> >> - Types for all numeric values.
> >> - Don't serialize empty fields (outV and stuff).
> >>
> >> Things we consider changing :
> >> - Type IDs convention. Before : Java simple class names. Now : starts with 
> >> a "domain" like "gremlin" followed by the "type name", which is a 
> >> lowercased type name (like "uuid", or "float", or "vertex"). Example : 
> >> "gremlin:uuid".
> >> - Type format ?
> >>
> >> Am I missing something ?
> >>
> > Hey,
> >
> > So I've made a few changes in the code from the original GraphSON 2.0, with 
> > the objectives described above, the code is still messy but I just thought 
> > I'd share some samples to start getting into the work and gather some 
> > feedback.
> >
> > In the example I've created a TinkerGraph with 2 vertices connected by an 
> > edge. The graph is serialized as a TinkerGraph.
> > The samples are there : 
> > https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
> >
> > Any feedback appreciated.
> 
> 
> 
> -- 
> Robert Dale
>

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread gallardo.kev...@gmail.com



On 2016-07-19 17:47 (+0100), Stephen Mallette  wrote: 
> it should - properties are a Map of Lists of Property values.
> 
> On Tue, Jul 19, 2016 at 12:45 PM, Dylan Millikin 
> wrote:
> 
> > Quick question which is probably handled automatically but is this working
> > with multiple cardinalities on properties?
> >
> > On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com <
> > gallardo.kev...@gmail.com> wrote:
> >
> > >
> > >
> > > On 2016-07-15 16:25 (+0100), "gallardo.kev...@gmail.com"<
> > > gallardo.kev...@gmail.com> wrote:
> > > >
> > > >
> > > > On 2016-07-09 16:48 (+0100), Stephen Mallette 
> > > wrote:
> > > > > With all the work on GLVs and the recent work on GraphSON 2.0, I
> > think
> > > it's
> > > > > important that we have a solid, efficient, programming language
> > > neutral,
> > > > > lossless serialization format. Right now that format is GraphSON and
> > it
> > > > > works for that purpose (ever more  so with 2.0). Given some
> > discussion
> > > on
> > > > > the GraphSON 2.0 PR driven a bit by Robert Dale:
> > > > >
> > > > > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> > > > >
> > > > > I wonder if we shouldn't consider another IO format that has Gremlin
> > > > > Server/GLVs in mind. At this point I'm not suggesting anything
> > > specific -
> > > > > I'm just hanging the idea out for further discussion and brain
> > > storming.
> > > > > Thoughts?
> > > > >
> > > >
> > > > Hey, so I'm trying to gather all infos we have here in order to prepare
> > > to move forward with the implem of GraphSON 2.0, here's what I come up
> > with
> > > :
> > > >
> > > > Things we have :
> > > > - Type format.
> > > > - The structure in Jackson to implement our own type format.
> > > > - All non native Graph types are typed (except the domain specific
> > > types).
> > > >
> > > > New things we need :
> > > > - Types for domain specific objects.
> > > > - Types for all numeric values.
> > > > - Don't serialize empty fields (outV and stuff).
> > > >
> > > > Things we consider changing :
> > > > - Type IDs convention. Before : Java simple class names. Now : starts
> > > with a "domain" like "gremlin" followed by the "type name", which is a
> > > lowercased type name (like "uuid", or "float", or "vertex"). Example :
> > > "gremlin:uuid".
> > > > - Type format ?
> > > >
> > > > Am I missing something ?
> > > >
> > > Hey,
> > >
> > > So I've made a few changes in the code from the original GraphSON 2.0,
> > > with the objectives described above, the code is still messy but I just
> > > thought I'd share some samples to start getting into the work and gather
> > > some feedback.
> > >
> > > In the example I've created a TinkerGraph with 2 vertices connected by an
> > > edge. The graph is serialized as a TinkerGraph.
> > > The samples are there :
> > > https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
> > >
> > > Any feedback appreciated.
> > >
> >
> 
I confirm, I didn't change anything in that section.

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Robert Dale

- It seems redundant to nest a vertex or edge inside a type-value
object and is inconsistent with a VertexProperty.
- VertexProperty and (edge) Property are implicit types. I don't know
if this is ok. Could they ever be used outside of their parents where
they would need to be typed?
- Edges:
  - is in/outVLabel new? Couldn't find it in the API or any examples of this.
  - why not make inV/outV have proper vertices with labels (to satisfy
the previous case) instead of just IDs? This would also be more
consistent with the API.

Otherwise looks good!

On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com
 wrote:
>
>
> On 2016-07-15 16:25 (+0100), 
> "gallardo.kev...@gmail.com" wrote:
>>
>>
>> On 2016-07-09 16:48 (+0100), Stephen Mallette  wrote:
>> > With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
>> > important that we have a solid, efficient, programming language neutral,
>> > lossless serialization format. Right now that format is GraphSON and it
>> > works for that purpose (ever more  so with 2.0). Given some discussion on
>> > the GraphSON 2.0 PR driven a bit by Robert Dale:
>> >
>> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
>> >
>> > I wonder if we shouldn't consider another IO format that has Gremlin
>> > Server/GLVs in mind. At this point I'm not suggesting anything specific -
>> > I'm just hanging the idea out for further discussion and brain storming.
>> > Thoughts?
>> >
>>
>> Hey, so I'm trying to gather all infos we have here in order to prepare to 
>> move forward with the implem of GraphSON 2.0, here's what I come up with :
>>
>> Things we have :
>> - Type format.
>> - The structure in Jackson to implement our own type format.
>> - All non native Graph types are typed (except the domain specific types).
>>
>> New things we need :
>> - Types for domain specific objects.
>> - Types for all numeric values.
>> - Don't serialize empty fields (outV and stuff).
>>
>> Things we consider changing :
>> - Type IDs convention. Before : Java simple class names. Now : starts with a 
>> "domain" like "gremlin" followed by the "type name", which is a lowercased 
>> type name (like "uuid", or "float", or "vertex"). Example : "gremlin:uuid".
>> - Type format ?
>>
>> Am I missing something ?
>>
> Hey,
>
> So I've made a few changes in the code from the original GraphSON 2.0, with 
> the objectives described above, the code is still messy but I just thought 
> I'd share some samples to start getting into the work and gather some 
> feedback.
>
> In the example I've created a TinkerGraph with 2 vertices connected by an 
> edge. The graph is serialized as a TinkerGraph.
> The samples are there : 
> https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
>
> Any feedback appreciated.



-- 
Robert Dale

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Stephen Mallette

it should - properties are a Map of Lists of Property values.
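
As an illustration of the "Map of Lists" shape, here is a minimal Python sketch. It is not the GraphSON writer itself: the helper name `group_properties` is hypothetical, and the fields of each property entry in real GraphSON output may differ from the single "value" field used here.

```python
from collections import defaultdict


def group_properties(vertex_properties):
    """Group (key, value) pairs into a map of lists, one list per key."""
    grouped = defaultdict(list)
    for key, value in vertex_properties:
        # Each entry is its own object, so several values can share one key.
        grouped[key].append({"value": value})
    return dict(grouped)


# Two "name" values on the same vertex: list cardinality on one key.
props = [("name", "marko"), ("name", "marko a. rodriguez"), ("age", 29)]
print(group_properties(props))
```

Because every key maps to a list, single and multiple cardinalities share the same structure and a reader does not need a special case for either.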

On Tue, Jul 19, 2016 at 12:45 PM, Dylan Millikin 
wrote:

> Quick question which is probably handled automatically but is this working
> with multiple cardinalities on properties?
>
> On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com <
> gallardo.kev...@gmail.com> wrote:
>
> >
> >
> > On 2016-07-15 16:25 (+0100), "gallardo.kev...@gmail.com"<
> > gallardo.kev...@gmail.com> wrote:
> > >
> > >
> > > On 2016-07-09 16:48 (+0100), Stephen Mallette 
> > wrote:
> > > > With all the work on GLVs and the recent work on GraphSON 2.0, I
> think
> > it's
> > > > important that we have a solid, efficient, programming language
> > neutral,
> > > > lossless serialization format. Right now that format is GraphSON and
> it
> > > > works for that purpose (ever more  so with 2.0). Given some
> discussion
> > on
> > > > the GraphSON 2.0 PR driven a bit by Robert Dale:
> > > >
> > > > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> > > >
> > > > I wonder if we shouldn't consider another IO format that has Gremlin
> > > > Server/GLVs in mind. At this point I'm not suggesting anything
> > specific -
> > > > I'm just hanging the idea out for further discussion and brain
> > storming.
> > > > Thoughts?
> > > >
> > >
> > > Hey, so I'm trying to gather all infos we have here in order to prepare
> > to move forward with the implem of GraphSON 2.0, here's what I come up
> with
> > :
> > >
> > > Things we have :
> > > - Type format.
> > > - The structure in Jackson to implement our own type format.
> > > - All non native Graph types are typed (except the domain specific
> > types).
> > >
> > > New things we need :
> > > - Types for domain specific objects.
> > > - Types for all numeric values.
> > > - Don't serialize empty fields (outV and stuff).
> > >
> > > Things we consider changing :
> > > - Type IDs convention. Before : Java simple class names. Now : starts
> > with a "domain" like "gremlin" followed by the "type name", which is a
> > lowercased type name (like "uuid", or "float", or "vertex"). Example :
> > "gremlin:uuid".
> > > - Type format ?
> > >
> > > Am I missing something ?
> > >
> > Hey,
> >
> > So I've made a few changes in the code from the original GraphSON 2.0,
> > with the objectives described above, the code is still messy but I just
> > thought I'd share some samples to start getting into the work and gather
> > some feedback.
> >
> > In the example I've created a TinkerGraph with 2 vertices connected by an
> > edge. The graph is serialized as a TinkerGraph.
> > The samples are there :
> > https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
> >
> > Any feedback appreciated.
> >
>

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread Dylan Millikin

Quick question which is probably handled automatically but is this working
with multiple cardinalities on properties?

On Tue, Jul 19, 2016 at 12:05 PM, gallardo.kev...@gmail.com <
gallardo.kev...@gmail.com> wrote:

>
>
> On 2016-07-15 16:25 (+0100), "gallardo.kev...@gmail.com"<
> gallardo.kev...@gmail.com> wrote:
> >
> >
> > On 2016-07-09 16:48 (+0100), Stephen Mallette 
> wrote:
> > > With all the work on GLVs and the recent work on GraphSON 2.0, I think
> it's
> > > important that we have a solid, efficient, programming language
> neutral,
> > > lossless serialization format. Right now that format is GraphSON and it
> > > works for that purpose (ever more  so with 2.0). Given some discussion
> on
> > > the GraphSON 2.0 PR driven a bit by Robert Dale:
> > >
> > > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> > >
> > > I wonder if we shouldn't consider another IO format that has Gremlin
> > > Server/GLVs in mind. At this point I'm not suggesting anything
> specific -
> > > I'm just hanging the idea out for further discussion and brain
> storming.
> > > Thoughts?
> > >
> >
> > Hey, so I'm trying to gather all infos we have here in order to prepare
> to move forward with the implem of GraphSON 2.0, here's what I come up with
> :
> >
> > Things we have :
> > - Type format.
> > - The structure in Jackson to implement our own type format.
> > - All non native Graph types are typed (except the domain specific
> types).
> >
> > New things we need :
> > - Types for domain specific objects.
> > - Types for all numeric values.
> > - Don't serialize empty fields (outV and stuff).
> >
> > Things we consider changing :
> > - Type IDs convention. Before : Java simple class names. Now : starts
> with a "domain" like "gremlin" followed by the "type name", which is a
> lowercased type name (like "uuid", or "float", or "vertex"). Example :
> "gremlin:uuid".
> > - Type format ?
> >
> > Am I missing something ?
> >
> Hey,
>
> So I've made a few changes in the code from the original GraphSON 2.0,
> with the objectives described above, the code is still messy but I just
> thought I'd share some samples to start getting into the work and gather
> some feedback.
>
> In the example I've created a TinkerGraph with 2 vertices connected by an
> edge. The graph is serialized as a TinkerGraph.
> The samples are there :
> https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60
>
> Any feedback appreciated.
>

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-19 Thread gallardo.kev...@gmail.com



On 2016-07-15 16:25 (+0100), 
"gallardo.kev...@gmail.com" wrote: 
> 
> 
> On 2016-07-09 16:48 (+0100), Stephen Mallette  wrote: 
> > With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
> > important that we have a solid, efficient, programming language neutral,
> > lossless serialization format. Right now that format is GraphSON and it
> > works for that purpose (ever more  so with 2.0). Given some discussion on
> > the GraphSON 2.0 PR driven a bit by Robert Dale:
> > 
> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> > 
> > I wonder if we shouldn't consider another IO format that has Gremlin
> > Server/GLVs in mind. At this point I'm not suggesting anything specific -
> > I'm just hanging the idea out for further discussion and brain storming.
> > Thoughts?
> > 
> 
> Hey, so I'm trying to gather all infos we have here in order to prepare to 
> move forward with the implem of GraphSON 2.0, here's what I come up with : 
> 
> Things we have : 
> - Type format.
> - The structure in Jackson to implement our own type format.
> - All non native Graph types are typed (except the domain specific types).
> 
> New things we need : 
> - Types for domain specific objects.
> - Types for all numeric values.
> - Don't serialize empty fields (outV and stuff).
> 
> Things we consider changing :
> - Type IDs convention. Before : Java simple class names. Now : starts with a 
> "domain" like "gremlin" followed by the "type name", which is a lowercased 
> type name (like "uuid", or "float", or "vertex"). Example : "gremlin:uuid".
> - Type format ?
> 
> Am I missing something ?
> 
Hey,

So I've made a few changes to the code from the original GraphSON 2.0, with the 
objectives described above. The code is still messy, but I thought I'd 
share some samples to start getting into the work and gather some feedback.

In the example I've created a TinkerGraph with 2 vertices connected by an edge. 
The graph is serialized as a TinkerGraph.
The samples are there : 
https://gist.github.com/newkek/97da94342bc32e571cf4a0ba1018df60

Any feedback appreciated.

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-18 Thread Robert Dale

On Mon, Jul 18, 2016 at 6:29 AM, gallardo.kev...@gmail.com
 wrote:
>
> On 2016-07-15 21:32 (+0100), Robert Dale  wrote:
>> Responding to Marko and Kevin...
[...]
>>
>> Kevin wrote:
>> >> Correct, these types weren't relevant... I only wanted to show you the 
>> >> format...
>> > However, I don't manage to understand the structure behind the format you 
>> > suggest, and I don't manage to establish a clear explicit representation 
>> > in my mind, regarding the example you provided in the TP-1274 PR. Could 
>> > you please give an example of how you would imagine the serialized JSON of 
>> > :
>> > - an example list of typed values, like List
>> > - an example list of typed and untyped values, like a list with UUIDs and 
>> > booleans
>> > - an example map of typed and untyped values
>> >
>> > How would you define that format in a general way ? Like what I did when 
>> > saying
>> > "- untyped : value
>> > - typed : {"@type" : "typeName", "value" : value}"
>> >
>> > Just trying to understand your point better.
>> > Also what are the downsides you see with the format suggested above ?
>>
>> The original format was in a list. I must have missed where you
>> accepted this format. In any case, like I originally stated, if you
>> want strong-typing, then _everything_ must be an _object_.
>>
>> Here's an example of non-typed:
>> https://gist.github.com/robertdale/02931f5633be55a59c13bca3b0e58655
>> - native json only
>>
>> Here's strongly typed:
>> https://gist.github.com/robertdale/6c074b165a72efee701e26f851f8b68a
>> - set (as an object), list (as an object), mixed-type lists, etc
>>
>
> OK, glad to see your revised version of the format is the exact same I 
> defined initially. I think we're on the same page here now. Except one thing, 
> it seems like the type information for vertex is not consistent with the 
> rest, if as you say if "everything is an object", then it would be like this 
> : https://gist.github.com/newkek/2d748dc59029f01af18b2a0e80494a31 .
> However, strong typing does not necessarily mean to me that there needs to be 
> a type metadata if the type is already properly handled by JSON. I.e. I don't 
> see the necessity to add type information for data like boolean. There is no 
> ambiguity possible.

Vertex (etc.) is already an object, so no, it doesn't need to be nested
inside another object. The "type, value" pattern is primarily for
scalars but can also be used to differentiate collections - sets,
lists, arrays, etc. I got carried away with typing on boolean.  I
think the last item I disagree on is having a default type for
integers. I think they should all be typed. Otherwise, I agree we're
on the same page.
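
A minimal Python sketch of what "all integers typed" plus typed collections could look like. The type IDs "gremlin:int32", "gremlin:int64" and "gremlin:set" and the helper names are assumptions made for illustration; they are not taken from the actual GraphSON 2.0 code.

```python
# Hypothetical type IDs mapped to their inclusive value ranges.
INT_RANGES = {
    "gremlin:int32": (-2**31, 2**31 - 1),
    "gremlin:int64": (-2**63, 2**63 - 1),
}


def typed_int(value, type_name):
    """Wrap an integer in a type/value object after a range check."""
    low, high = INT_RANGES[type_name]
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {type_name}")
    return {"@type": type_name, "value": value}


def typed_set(items):
    """Represent a set as a typed object holding a sorted JSON array."""
    return {"@type": "gremlin:set", "value": sorted(items)}


print(typed_int(12, "gremlin:int32"))
print(typed_int(12, "gremlin:int64"))
print(typed_set({3, 1, 2}))
```

With every integer wrapped this way, 12 as an int32 and 12 as an int64 stay distinguishable on the wire, which is the has("someNumber",12L) vs. has("someNumber",12) case raised earlier in the thread.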

[...]
> No, in DSE Graph, the schema has to be defined upfront and does not depend on 
> the first element inserted. But I'm not the best person to talk about that 
> and I'm not sure this is the right place..

Specifically, "developer" mode allows this. "Production" mode requires schema.

-- 
Robert Dale

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-18 Thread gallardo.kev...@gmail.com



On 2016-07-15 21:32 (+0100), Robert Dale  wrote: 
> Responding to Marko and Kevin...
> 
> Marko wrote:
> > SIDENOTE: This serves as a foundation for when we move to GraphSON 2.0. In 
> > terms of numbers, I think, unfortunately, we have to stick with int32, 
> > int64, float, double, etc. given graph database providers and their type 
> > systems. Its not about the Gremlin traversal API, its more about provider 
> > schemas. has(“someNumber”,12L) vs. has(“someNumber”,12).
> 
> I call the above behavior a bug or a peculiarity of Titan; it clings
> to a java object idiom. On the other hand, DSE graph exhibits expected
> behavior (as does IBM Graph, Neo4j.)  I know of no other query
> language that behaves like this - e.g. SQL, CassandraQL, JPQL, JOOQ
> (the gremlin of sql).  Typically the underlying driver/provider does
> the "right" thing (or doesn't).  Again, take UUID in gremlin, I can
> pass a string.  The underlying driver seems to convert it to UUID, I
> don't have to provide an UUID object.  This seems inconsistent.
> Either it's doing strong typing or not.  Which is it??
> 
> IMO, the query language should be abstracted from the storage schema.
> And I think this is where we have the impedance mismatch in this
> thread.  What gremlin is really acting like in addition to query
> language is an Object Graph Mapper (like an ORM).  It's playing two
> roles. So I'm also arguing that it should have a single
> responsibility. Yes, I've said this before. But maybe it changes
> things too drastically.  Maybe there are aspects of gremlin that
> actually require strong typing. I don't know. I haven't run into them.
> On to the next item...
> 
> Kevin wrote:
> >> Correct, these types weren't relevant... I only wanted to show you the 
> >> format...
> > However, I don't manage to understand the structure behind the format you 
> > suggest, and I don't manage to establish a clear explicit representation in 
> > my mind, regarding the example you provided in the TP-1274 PR. Could you 
> > please give an example of how you would imagine the serialized JSON of :
> > - an example list of typed values, like List
> > - an example list of typed and untyped values, like a list with UUIDs and 
> > booleans
> > - an example map of typed and untyped values
> >
> > How would you define that format in a general way ? Like what I did when 
> > saying
> > "- untyped : value
> > - typed : {"@type" : "typeName", "value" : value}"
> >
> > Just trying to understand your point better.
> > Also what are the downsides you see with the format suggested above ?
> 
> The original format was in a list. I must have missed where you
> accepted this format. In any case, like I originally stated, if you
> want strong-typing, then _everything_ must be an _object_.
> 
> Here's an example of non-typed:
> https://gist.github.com/robertdale/02931f5633be55a59c13bca3b0e58655
> - native json only
> 
> Here's strongly typed:
> https://gist.github.com/robertdale/6c074b165a72efee701e26f851f8b68a
> - set (as an object), list (as an object), mixed-type lists, etc
> 

OK, glad to see your revised version of the format is exactly the same as the 
one I defined initially. I think we're on the same page here now. Except for 
one thing: the type information for vertex does not seem consistent with the 
rest. If, as you say, "everything is an object", then it would look like this: 
https://gist.github.com/newkek/2d748dc59029f01af18b2a0e80494a31 .
However, to me strong typing does not necessarily mean that type metadata is 
needed when the type is already properly handled by JSON. I.e. I don't see 
the need to add type information for data like boolean. There is no 
ambiguity possible.

> Let me add that while there's no strict definition of schemaless, it
> was not necessarily intended to include having mixed data types for a
> single field. This is a really bad idea. Experts warn against this.
> Most NoSQL databases don't even support this. You will probably die if
> you use it. The default behavior for DSE graph, IBM graph, and even
> Titan is to create the schema based on the first type inserted.  It
> will complain if any subsequent type is different.

No, in DSE Graph, the schema has to be defined upfront and does not depend on 
the first element inserted. But I'm not the best person to talk about that and 
I'm not sure this is the right place..

However, concerning mixed typed/non-typed values, I am less concerned about 
what the Graph provider would do than about what the protocol can handle. Hence 
I am in favour of a protocol that can handle as much as possible in a 
consistent way, for example collections of typed and non-typed values, as is 
possible in a TinkerGraph. This means a VertexProperty can be a list of Strings 
and UUIDs: one doesn't need a type, the other does.
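
To illustrate reading such a mixed collection, here is a minimal Python sketch. The `DESERIALIZERS` registry and the `from_graphson` function are hypothetical names, and "gremlin:uuid" follows the type ID convention proposed in this thread; this is not the actual GraphSON reader.

```python
import uuid

# Hypothetical registry: type ID -> function that rebuilds the value.
DESERIALIZERS = {"gremlin:uuid": uuid.UUID}


def from_graphson(node):
    """Rebuild typed values; leave untyped JSON values untouched."""
    if isinstance(node, dict):
        if "@type" in node:
            # Typed value: look up the deserializer registered for its type ID.
            return DESERIALIZERS[node["@type"]](from_graphson(node["value"]))
        return {key: from_graphson(item) for key, item in node.items()}
    if isinstance(node, list):
        return [from_graphson(item) for item in node]
    return node  # plain string, boolean, number or null


# A list mixing an untyped String with a typed UUID.
mixed = ["marko", {"@type": "gremlin:uuid",
                   "value": "12345678-1234-5678-1234-567812345678"}]
print(from_graphson(mixed))
```

The reader only has to check for the "@type" key, whatever the enclosing container is, so typed and untyped values can sit side by side in the same list or map.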

> 
> Also, schemaless doesn't mean without any schema. While not having to
> define a schema up-front during a quickstart or early development
> makes life easier, no one doing a

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread Robert Dale

Responding to Marko and Kevin...

Marko wrote:
> SIDENOTE: This serves as a foundation for when we move to GraphSON 2.0. In 
> terms of numbers, I think, unfortunately, we have to stick with int32, int64, 
> float, double, etc. given graph database providers and their type systems. 
> Its not about the Gremlin traversal API, its more about provider schemas. 
> has(“someNumber”,12L) vs. has(“someNumber”,12).

I call the above behavior a bug or a peculiarity of Titan; it clings
to a java object idiom. On the other hand, DSE graph exhibits expected
behavior (as does IBM Graph, Neo4j.)  I know of no other query
language that behaves like this - e.g. SQL, CassandraQL, JPQL, JOOQ
(the gremlin of sql).  Typically the underlying driver/provider does
the "right" thing (or doesn't).  Again, take UUID in gremlin, I can
pass a string.  The underlying driver seems to convert it to UUID, I
don't have to provide a UUID object.  This seems inconsistent.
Either it's doing strong typing or not.  Which is it??

IMO, the query language should be abstracted from the storage schema.
And I think this is where we have the impedance mismatch in this
thread.  What gremlin is really acting like in addition to query
language is an Object Graph Mapper (like an ORM).  It's playing two
roles. So I'm also arguing that it should have a single
responsibility. Yes, I've said this before. But maybe it changes
things too drastically.  Maybe there are aspects of gremlin that
actually require strong typing. I don't know. I haven't run into them.
On to the next item...

Kevin wrote:
>> Correct, these types weren't relevant... I only wanted to show you the 
>> format...
> However, I don't manage to understand the structure behind the format you 
> suggest, and I don't manage to establish a clear explicit representation in 
> my mind, regarding the example you provided in the TP-1274 PR. Could you 
> please give an example of how you would imagine the serialized JSON of :
> - an example list of typed values, like List
> - an example list of typed and untyped values, like a list with UUIDs and 
> booleans
> - an example map of typed and untyped values
>
> How would you define that format in a general way ? Like what I did when 
> saying
> "- untyped : value
> - typed : {"@type" : "typeName", "value" : value}"
>
> Just trying to understand your point better.
> Also what are the downsides you see with the format suggested above ?

The original format was in a list. I must have missed where you
accepted this format. In any case, like I originally stated, if you
want strong-typing, then _everything_ must be an _object_.

Here's an example of non-typed:
https://gist.github.com/robertdale/02931f5633be55a59c13bca3b0e58655
- native json only

Here's strongly typed:
https://gist.github.com/robertdale/6c074b165a72efee701e26f851f8b68a
- set (as an object), list (as an object), mixed-type lists, etc

Let me add that while there's no strict definition of schemaless, it
was not necessarily intended to include having mixed data types for a
single field. This is a really bad idea. Experts warn against this.
Most NoSQL databases don't even support this. You will probably die if
you use it. The default behavior for DSE graph, IBM graph, and even
Titan is to create the schema based on the first type inserted.  It
will complain if any subsequent type is different.

Also, schemaless doesn't mean without any schema. While not having to
define a schema up-front during a quickstart or early development
makes life easier, no one doing any serious work or going to
production goes without a schema.  Again, see DSE graph, IBM graph,
Titan, etc.

Let's take a look at DSE graph types [1]. They are a subset of
cassandra data types. What's really interesting about that is that
they are all represented in some simple form - string or integer
literals (and bool) - except for Geo, but even that can be in some
form of arrays. So blob, inet, uuid, even timestamp are all queried as
strings!

Also look at other APIs and you'll see the use of JSON without
strong-typing for non-domain and/or scalar types in IBM graph,
Elasticsearch, Solr, and just about every other REST API out there.
Types other than the weak-typing in JSON are settled by the backing
schema (southbound) or by the OGM (northbound).  Additionally,
VertexProperty returns only Object. I still have to know what the
underlying type is. What difference does it make if I cast
(strong-typed) or convert (weak-type)? I still have to do something in
order for it to be usable in java.  Maybe I'm just missing
something...

But at the end of the day, I would prefer consistency over whether
strong or weak typing.  :-)

Finally, I still would consider promoting spatial shapes to a
first-class entity in gremlin and include GeoJSON for serialization.
This may be a separate effort.

1. 
https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/reference/refDSEGraphDataTypes.html

-- 
Robert Dale

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread gallardo.kev...@gmail.com



On 2016-07-09 16:48 (+0100), Stephen Mallette  wrote: 
> With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
> important that we have a solid, efficient, programming language neutral,
> lossless serialization format. Right now that format is GraphSON and it
> works for that purpose (ever more  so with 2.0). Given some discussion on
> the GraphSON 2.0 PR driven a bit by Robert Dale:
> 
> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> 
> I wonder if we shouldn't consider another IO format that has Gremlin
> Server/GLVs in mind. At this point I'm not suggesting anything specific -
> I'm just hanging the idea out for further discussion and brain storming.
> Thoughts?
> 

Hey, so I'm trying to gather all the information we have here in order to 
prepare to move forward with the implementation of GraphSON 2.0. Here's what I 
came up with: 

Things we have : 
- Type format.
- The structure in Jackson to implement our own type format.
- All non native Graph types are typed (except the domain specific types).

New things we need : 
- Types for domain specific objects.
- Types for all numeric values.
- Don't serialize empty fields (outV and stuff).

Things we consider changing :
- Type IDs convention. Before : Java simple class names. Now : starts with a 
"domain" like "gremlin" followed by the "type name", which is a lowercased type 
name (like "uuid", or "float", or "vertex"). Example : "gremlin:uuid".
- Type format ?

Am I missing something ?
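
To make the proposed type ID convention and the "don't serialize empty fields" item concrete, here is a minimal Python sketch. The helper names `type_id` and `drop_empty_fields` are hypothetical; only the "domain:lowercased type name" convention and the idea of skipping empty fields come from the list above.

```python
def type_id(simple_class_name, domain="gremlin"):
    """Build a type ID such as "gremlin:uuid" from a simple class name."""
    return f"{domain}:{simple_class_name.lower()}"


def drop_empty_fields(element):
    """Return a copy of a serialized element without None or empty fields."""
    return {k: v for k, v in element.items() if v not in (None, {}, [], "")}


print(type_id("UUID"))
print(type_id("Vertex"))
print(drop_empty_fields({"id": 1, "label": "person", "outV": None, "properties": {}}))
```

Keeping the domain as a separate argument would let a provider use its own prefix for domain specific types, which is one possible reading of the first "new things we need" item.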

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread Marko Rodriguez

Hello,

> How would you define that format in a general way ? Like what I did when 
> saying 
> "- untyped : value
> - typed : {"@type" : "typeName", "value" : value}"
> 
> Just trying to understand your point better. 
> Also what are the downsides you see with the format suggested above ?

This makes sense to me.

Thus, Vertex becomes {@type=vertex, …}.

If you want to use JSON types, don’t {@type=} them, else, you can do 
{@type=int32}
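
A minimal Python sketch of that rule on the writing side: values JSON already expresses are emitted as-is, everything else is wrapped in a type/value object. The function name `to_graphson` and the "gremlin:uuid" type ID are assumptions for illustration, not the actual serializer.

```python
import json
import uuid


def to_graphson(value):
    """Wrap values JSON cannot express natively; pass native ones through."""
    if isinstance(value, uuid.UUID):
        return {"@type": "gremlin:uuid", "value": str(value)}
    if isinstance(value, dict):
        return {key: to_graphson(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_graphson(item) for item in value]
    return value  # str, bool, None and numbers stay as plain JSON


example = {"name": "marko", "active": True,
           "uid": uuid.UUID("12345678-1234-5678-1234-567812345678")}
print(json.dumps(to_graphson(example)))
```

Numbers are left untyped in this sketch; wrapping them as {@type=int32} would be one more branch in the same function.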

Marko.

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread gallardo.kev...@gmail.com



On 2016-07-15 15:52 (+0100), 
"gallardo.kev...@gmail.com" wrote: 
> 
> 
> On 2016-07-15 14:44 (+0100), Robert Dale  wrote: 
> > It looks to me like a self-inflicted problem because the things that
> > are typed are already native to json so it's redundant.  And to go a
> > step further, I wouldn't consider the types to be 'correct' because
> > everything that is a HashMap is really a Vertex, Edge, or Property.
> > 
> > On Thu, Jul 14, 2016 at 10:03 AM, gallardo.kev...@gmail.com
> >  wrote:
> > >
> > >
> > > On 2016-07-13 13:17 (+0100), Robert Dale  wrote:
> > >> Marko, I agree that empty object properties should not be represented.
> > >> I think if you saw that in an example then it was probably for
> > >> demonstration purposes.
> > >>
> > >> Kevin, can you expand on this comment:
> > >>
> > >> > the format you suggest would lead to the same inconsistencies as in 
> > >> > GraphSON 1.0.
> > >> > Since the type is at the same level than the data itself, whether the 
> > >> > container is an Array or an Object
> > >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> > >>
> > >> What exactly are the inconsistencies?  What is the problem in
> > >> determining an array or object?
> > >> This is a natural JSON array (or list): []
> > >> This is a natural JSON object: {}
> > >>
> > >> Type at the object level is a common pattern and supported feature of
> > >> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> > >> 'type' at the object level. Titan supports GeoJSON currently.  I
> > >> wonder if it would make sense to promote geometry to gremlin.
> > >>
> > >
> > > I probably wasn't clear enough. In my first email explaining my 
> > > motivation to improve GraphSON 1.0, one of the things I noticed was 
> > > that, depending on the enclosing element (either an Array or a Map), a 
> > > type is described either as an element of the Array or as a key/value 
> > > pair in the Map. You can see that in the "embedded types" example of the 
> > > TinkerPop docs: 
> > > http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer
> > >  .
> > >
> > > There you can see that the type "java.util.ArrayList" is a simple element 
> > > of the enclosing array, but the "java.util.HashMap" type is a field of 
> > > the enclosing Map as {"@class" : "java.util.HashMap", ...}. This does not 
> > > seem consistent to me and even though I know that Jackson handles it 
> > > well, it seems that we'd better provide a consistent enclosing format 
> > > that we know is fixed whatever the enclosed data is, to make the 
> > > automatic type detection for other parsers in other libraries/languages 
> > > easier. Does that make sense ?
> > >
> > >> We should probably start documenting a table of supported types. (If
> > >> there is one, please provide link)
> > >>
> > >> I wonder if it even makes sense to type numbers according to their
> > >> memory model. As objects, Byte, Short, and Integer occupy the same
> > >> space. Long isn't much more.  So in Java we're not saving much space.
> > >> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> > >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> > >> have this concept.  Does anything in gremlin actually require this?
> > >> I'm thinking that this is only going to be relevant at the domain
> > >> model level. This way json native numbers can be used and not need
> > >> typing.
> > >>
> > >> Additionally, I think that all things that will be typed should always
> > >> be typed. For the use cases of ingesting a saved graph from a file, it
> > >> can probably be assumed that the top-level objects are vertices since
> > >> the graph is vertex-centric and everything else follows naturally.
> > >> I'm not entirely sure what is required for submitting traversals to
> > >> gremlin server from GLV.  However, if this is used for the results
> > >> from gremlin server then the results could start with any one of path,
> > >> vertex, edge, property, vertex property, etc. So you'll need that type
> > >> data there.
> > >>
> > >> --
> > >> Robert Dale
> > >>
> > >> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez  
> > >> wrote:
> > >> > Hi,
> > >> >
> > > >> > I'm not following this PR too closely so what I might be saying 
> > > >> > is already known/argued against/etc.
> > > >> >
> > > >> > 1. I think we should go with Robert Dale's proposal of 
> > > >> > int32, int64, Vertex, uuid, etc. instead of Java class names.
> > > >> > 2. In Java we then have a Map for typecasting 
> > > >> > accordingly.
> > > >> > 3. This would make GraphSON 2.0 perfect for Bytecode 
> > > >> > serialization in TINKERPOP-1278.
> > > >> > 4. I think that if a Vertex, Edge, etc. doesn't have 
> > > >> > properties, outV, etc. then don't even have those fields in the 
> > > >> > representation.
> > >> > 5. Most of the serialization back and forth will be 
> > >> > ReferenceXXX eleme

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread gallardo.kev...@gmail.com



On 2016-07-15 16:07 (+0100), 
"gallardo.kev...@gmail.com" wrote: 
> 
> 
> On 2016-07-15 15:52 (+0100), 
> "gallardo.kev...@gmail.com" wrote: 
> > 
> > 
> > On 2016-07-15 14:44 (+0100), Robert Dale  wrote: 
> > > It looks to me like a self-inflicted problem because the things that
> > > are typed are already native to json so it's redundant.  And to go a
> > > step further, I wouldn't consider the types to be 'correct' because
> > > everything that is a HashMap is really a Vertex, Edge, or Property.
> > > 
> > > On Thu, Jul 14, 2016 at 10:03 AM, gallardo.kev...@gmail.com
> > >  wrote:
> > > >
> > > >
> > > > On 2016-07-13 13:17 (+0100), Robert Dale  wrote:
> > > >> Marko, I agree that empty object properties should not be represented.
> > > >> I think if you saw that in an example then it was probably for
> > > >> demonstration purposes.
> > > >>
> > > >> Kevin, can you expand on this comment:
> > > >>
> > > >> > the format you suggest would lead to the same inconsistencies as in 
> > > >> > GraphSON 1.0.
> > > >> > Since the type is at the same level than the data itself, whether 
> > > >> > the container is an Array or an Object
> > > >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> > > >>
> > > >> What exactly are the inconsistencies?  What is the problem in
> > > >> determining an array or object?
> > > >> This is a natural JSON array (or list): []
> > > >> This is a natural JSON object: {}
> > > >>
> > > >> Type at the object level is a common pattern and supported feature of
> > > >> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> > > >> 'type' at the object level. Titan supports GeoJSON currently.  I
> > > >> wonder if it would make sense to promote geometry to gremlin.
> > > >>
> > > >
> > > > I wasn't probably clear enough, in my first email exposing my 
> > > > motivation to improve GraphSON 1.0, one of the things I noticed was 
> > > > that according to the enclosing element (either an Array or a Map), a 
> > > > type will either be described as (respectively) an element of the 
> > > > Array, or a key/value pair in a Map, you can see that in the "embedded 
> > > > types" example of the Tinkerpop docs : 
> > > > http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer
> > > >  .
> > > >
> > > > There you can see that the type "java.util.ArrayList" is a simple 
> > > > element of the enclosing array, but the "java.util.HashMap" type is a 
> > > > field of the enclosing Map as {"@class" : "java.util.HashMap", ...}. 
> > > > This does not seem consistent to me and even though I know that Jackson 
> > > > handles it well, it seems that we'd better provide a consistent 
> > > > enclosing format that we know is fixed whatever the enclosed data is, 
> > > > to make the automatic type detection for other parsers in other 
> > > > libraries/languages easier. Does that make sense ?
> > > >
> > > >> We should probably start documenting a table of supported types. (If
> > > >> there is one, please provide link)
> > > >>
> > > >> I wonder if it even makes sense to type numbers according to their
> > > >> memory model. As objects, Byte, Short, and Integer occupy the same
> > > >> space. Long isn't much more.  So in Java we're not saving much space.
> > > >> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> > > >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> > > >> have this concept.  Does anything in gremlin actually require this?
> > > >> I'm thinking that this is only going to be relevant at the domain
> > > >> model level. This way json native numbers can be used and not need
> > > >> typing.
> > > >>
> > > >> Additionally, I think that all things that will be typed should always
> > > > >> be typed. For the use cases of ingesting a saved graph from a file, it
> > > >> can probably be assumed that the top-level objects are vertices since
> > > >> the graph is vertex-centric and everything else follows naturally.
> > > >> I'm not entirely sure what is required for submitting traversals to
> > > >> gremlin server from GLV.  However, if this is used for the results
> > > >> from gremlin server then the results could start with any one of path,
> > > >> vertex, edge, property, vertex property, etc. So you'll need that type
> > > >> data there.
> > > >>
> > > >> --
> > > >> Robert Dale
> > > >>
> > > >> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez 
> > > >>  wrote:
> > > >> > Hi,
> > > >> >
> > > > >> > I'm not following this PR too closely so what I might be saying 
> > > > >> > is already known/argued against/etc.
> > > > >> >
> > > > >> > 1. I think we should go with Robert Dale's proposal of 
> > > >> > int32, int64, Vertex, uuid, etc. instead of Java class names.
> > > >> > 2. In Java we then have a Map for typecasting 
> > > >> > accordingly.
> > > >> > 3. This would make GraphSON 2.0 perfect for Bytecode 
> > > >> > serialization in TINKERPOP-1278.
> > > >> >

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread gallardo.kev...@gmail.com



On 2016-07-15 14:44 (+0100), Robert Dale  wrote: 
> It looks to me like a self-inflicted problem because the things that
> are typed are already native to json so it's redundant.  And to go a
> step further, I wouldn't consider the types to be 'correct' because
> everything that is a HashMap is really a Vertex, Edge, or Property.
> 
> On Thu, Jul 14, 2016 at 10:03 AM, gallardo.kev...@gmail.com
>  wrote:
> >
> >
> > On 2016-07-13 13:17 (+0100), Robert Dale  wrote:
> >> Marko, I agree that empty object properties should not be represented.
> >> I think if you saw that in an example then it was probably for
> >> demonstration purposes.
> >>
> >> Kevin, can you expand on this comment:
> >>
> >> > the format you suggest would lead to the same inconsistencies as in 
> >> > GraphSON 1.0.
> >> > Since the type is at the same level than the data itself, whether the 
> >> > container is an Array or an Object
> >> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> >>
> >> What exactly are the inconsistencies?  What is the problem in
> >> determining an array or object?
> >> This is a natural JSON array (or list): []
> >> This is a natural JSON object: {}
> >>
> >> Type at the object level is a common pattern and supported feature of
> >> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> >> 'type' at the object level. Titan supports GeoJSON currently.  I
> >> wonder if it would make sense to promote geometry to gremlin.
> >>
> >
> > I wasn't probably clear enough, in my first email exposing my motivation to 
> > improve GraphSON 1.0, one of the things I noticed was that according to the 
> > enclosing element (either an Array or a Map), a type will either be 
> > described as (respectively) an element of the Array, or a key/value pair in 
> > a Map, you can see that in the "embedded types" example of the Tinkerpop 
> > docs : 
> > http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer .
> >
> > There you can see that the type "java.util.ArrayList" is a simple element 
> > of the enclosing array, but the "java.util.HashMap" type is a field of the 
> > enclosing Map as {"@class" : "java.util.HashMap", ...}. This does not seem 
> > consistent to me and even though I know that Jackson handles it well, it 
> > seems that we'd better provide a consistent enclosing format that we know 
> > is fixed whatever the enclosed data is, to make the automatic type 
> > detection for other parsers in other libraries/languages easier. Does that 
> > make sense ?
> >
> >> We should probably start documenting a table of supported types. (If
> >> there is one, please provide link)
> >>
> >> I wonder if it even makes sense to type numbers according to their
> >> memory model. As objects, Byte, Short, and Integer occupy the same
> >> space. Long isn't much more.  So in Java we're not saving much space.
> >> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> >> have this concept.  Does anything in gremlin actually require this?
> >> I'm thinking that this is only going to be relevant at the domain
> >> model level. This way json native numbers can be used and not need
> >> typing.
> >>
> >> Additionally, I think that all things that will be typed should always
> >> be typed. For the use cases of ingesting a saved graph from a file, it
> >> can probably be assumed that the top-level objects are vertices since
> >> the graph is vertex-centric and everything else follows naturally.
> >> I'm not entirely sure what is required for submitting traversals to
> >> gremlin server from GLV.  However, if this is used for the results
> >> from gremlin server then the results could start with any one of path,
> >> vertex, edge, property, vertex property, etc. So you'll need that type
> >> data there.
> >>
> >> --
> >> Robert Dale
> >>
> >> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez  
> >> wrote:
> >> > Hi,
> >> >
> >> > I'm not following this PR too closely so what I might be saying is 
> >> > already known/argued against/etc.
> >> >
> >> > 1. I think we should go with Robert Dale's proposal of int32, 
> >> > int64, Vertex, uuid, etc. instead of Java class names.
> >> > 2. In Java we then have a Map for typecasting 
> >> > accordingly.
> >> > 3. This would make GraphSON 2.0 perfect for Bytecode 
> >> > serialization in TINKERPOP-1278.
> >> > 4. I think that if a Vertex, Edge, etc. doesn't have 
> >> > properties, outV, etc. then don't even have those fields in the 
> >> > representation.
> >> > 5. Most of the serialization back and forth will be ReferenceXXX 
> >> > elements and thus, don't create more Maps/lists for no reason - 
> >> > less chars.
> >> >
> >> > For me, my interests with this work is all about a language agnostic way 
> >> > of sending Gremlin traversal bytecode between different languages. This 
> >> > work is exactly what I

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-15 Thread Robert Dale

It looks to me like a self-inflicted problem because the things that
are typed are already native to json so it's redundant.  And to go a
step further, I wouldn't consider the types to be 'correct' because
everything that is a HashMap is really a Vertex, Edge, or Property.

On Thu, Jul 14, 2016 at 10:03 AM, gallardo.kev...@gmail.com
 wrote:
>
>
> On 2016-07-13 13:17 (+0100), Robert Dale  wrote:
>> Marko, I agree that empty object properties should not be represented.
>> I think if you saw that in an example then it was probably for
>> demonstration purposes.
>>
>> Kevin, can you expand on this comment:
>>
>> > the format you suggest would lead to the same inconsistencies as in 
>> > GraphSON 1.0.
>> > Since the type is at the same level than the data itself, whether the 
>> > container is an Array or an Object
>> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
>>
>> What exactly are the inconsistencies?  What is the problem in
>> determining an array or object?
>> This is a natural JSON array (or list): []
>> This is a natural JSON object: {}
>>
>> Type at the object level is a common pattern and supported feature of
>> Jackson.  Also, GeoJSON would be a natural fit as it also stores
>> 'type' at the object level. Titan supports GeoJSON currently.  I
>> wonder if it would make sense to promote geometry to gremlin.
>>
>
> I wasn't probably clear enough, in my first email exposing my motivation to 
> improve GraphSON 1.0, one of the things I noticed was that according to the 
> enclosing element (either an Array or a Map), a type will either be described 
> as (respectively) an element of the Array, or a key/value pair in a Map, you 
> can see that in the "embedded types" example of the Tinkerpop docs : 
> http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer .
>
> There you can see that the type "java.util.ArrayList" is a simple element of 
> the enclosing array, but the "java.util.HashMap" type is a field of the 
> enclosing Map as {"@class" : "java.util.HashMap", ...}. This does not seem 
> consistent to me and even though I know that Jackson handles it well, it 
> seems that we'd better provide a consistent enclosing format that we know is 
> fixed whatever the enclosed data is, to make the automatic type detection for 
> other parsers in other libraries/languages easier. Does that make sense ?
>
>> We should probably start documenting a table of supported types. (If
>> there is one, please provide link)
>>
>> I wonder if it even makes sense to type numbers according to their
>> memory model. As objects, Byte, Short, and Integer occupy the same
>> space. Long isn't much more.  So in Java we're not saving much space.
>> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
>> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
>> have this concept.  Does anything in gremlin actually require this?
>> I'm thinking that this is only going to be relevant at the domain
>> model level. This way json native numbers can be used and not need
>> typing.
>>
>> Additionally, I think that all things that will be typed should always
>> be typed. For the use cases of ingesting a saved graph from a file, it
>> can probably be assumed that the top-level objects are vertices since
>> the graph is vertex-centric and everything else follows naturally.
>> I'm not entirely sure what is required for submitting traversals to
>> gremlin server from GLV.  However, if this is used for the results
>> from gremlin server then the results could start with any one of path,
>> vertex, edge, property, vertex property, etc. So you'll need that type
>> data there.
>>
>> --
>> Robert Dale
>>
>> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez  
>> wrote:
>> > Hi,
>> >
>> > I’m not following this PR too closely so what I might be saying is 
>> > already known/argued against/etc.
>> >
>> > 1. I think we should go with Robert Dale’s proposal of int32, 
>> > int64, Vertex, uuid, etc. instead of Java class names.
>> > 2. In Java we then have a Map for typecasting 
>> > accordingly.
>> > 3. This would make GraphSON 2.0 perfect for Bytecode serialization 
>> > in TINKERPOP-1278.
>> > 4. I think that if a Vertex, Edge, etc. doesn’t have properties, 
>> > outV, etc. then don’t even have those fields in the representation.
>> > 5. Most of the serialization back and forth will be ReferenceXXX 
>> > elements and thus, don’t create more Maps/lists for no reason. — less 
>> > chars.
>> >
>> > For me, my interests with this work is all about a language agnostic way 
>> > of sending Gremlin traversal bytecode between different languages. This 
>> > work is exactly what I am looking for.
>> >
>> > Thanks,
>> > Marko.
>> >
>> > http://markorodriguez.com
>> >
>> >
>> >
>> >> On Jul 9, 2016, at 9:48 AM, Stephen Mallette  wrote:
>> >>
>> >> With all the work on GLVs and the recent work on GraphSON 2.0, I think 
>> >> it's
>> >> important that w

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-14 Thread gallardo.kev...@gmail.com



On 2016-07-13 13:17 (+0100), Robert Dale  wrote: 
> Marko, I agree that empty object properties should not be represented.
> I think if you saw that in an example then it was probably for
> demonstration purposes.
> 
> Kevin, can you expand on this comment:
> 
> > the format you suggest would lead to the same inconsistencies as in 
> > GraphSON 1.0.
> > Since the type is at the same level than the data itself, whether the 
> > container is an Array or an Object
> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> 
> What exactly are the inconsistencies?  What is the problem in
> determining an array or object?
> This is a natural JSON array (or list): []
> This is a natural JSON object: {}
> 
> Type at the object level is a common pattern and supported feature of
> Jackson.  Also, GeoJSON would be a natural fit as it also stores
> 'type' at the object level. Titan supports GeoJSON currently.  I
> wonder if it would make sense to promote geometry to gremlin.
> 

I probably wasn't clear enough. In my first email, where I explained my 
motivation for improving GraphSON 1.0, one of the things I noted was that the 
way a type is described depends on the enclosing element: inside an Array the 
type is an element of the Array, while inside a Map it is a key/value pair. 
You can see this in the "embedded types" example of the TinkerPop docs: 
http://tinkerpop.apache.org/docs/current/reference/#graphson-reader-writer

There you can see that the type "java.util.ArrayList" is a plain element of 
the enclosing array, but the type "java.util.HashMap" is a field of the 
enclosing Map, as in {"@class" : "java.util.HashMap", ...}. This does not seem 
consistent to me. Even though I know that Jackson handles it well, I think we 
would do better to provide a consistent enclosing format that stays fixed 
whatever the enclosed data is, so that automatic type detection is easier for 
parsers in other libraries and languages. Does that make sense?
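
To make the point concrete, here is a rough Python sketch of what a non-Jackson reader has to do. The two GraphSON 1.0 shapes follow my reading of the "embedded types" description above, and the sample values are made up. The uniform envelope with "@type" and "@value" keys is only a hypothetical illustration of a fixed enclosing format, not a settled proposal.

```python
import json

# The two GraphSON 1.0 "embedded types" shapes, as described above.
# The payload values are invented for illustration.
typed_list = json.loads('["java.util.ArrayList", [1, 2, 3]]')
typed_map = json.loads('{"@class": "java.util.HashMap", "name": "marko"}')


def read_v1(node):
    """Return (type_name, payload); needs one branch per container kind."""
    if isinstance(node, dict) and "@class" in node:
        # Map case: the type is a key/value pair mixed in with the data.
        payload = {k: v for k, v in node.items() if k != "@class"}
        return node["@class"], payload
    if isinstance(node, list) and len(node) == 2 and isinstance(node[0], str):
        # Array case: the type is the first element of the array.
        # This is a guess: a plain list ["a", "b"] matches this branch too.
        return node[0], node[1]
    return None, node


def read_uniform(node):
    """Hypothetical fixed envelope: {"@type": ..., "@value": ...}."""
    if isinstance(node, dict) and set(node) == {"@type", "@value"}:
        return node["@type"], node["@value"]
    return None, node


list_type, list_payload = read_v1(typed_list)
map_type, map_payload = read_v1(typed_map)
uniform_type, uniform_payload = read_uniform({"@type": "list", "@value": [1, 2, 3]})
```

With the 1.0 shapes the reader must branch on the container kind and cannot reliably tell a typed list from an ordinary two-element list of strings. With a fixed envelope, one check covers every type.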

> We should probably start documenting a table of supported types. (If
> there is one, please provide link)
> 
> I wonder if it even makes sense to type numbers according to their
> memory model. As objects, Byte, Short, and Integer occupy the same
> space. Long isn't much more.  So in Java we're not saving much space.
> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> have this concept.  Does anything in gremlin actually require this?
> I'm thinking that this is only going to be relevant at the domain
> model level. This way json native numbers can be used and not need
> typing.
> 
> Additionally, I think that all things that will be typed should always
> be typed. For the use cases of ingesting a saved graph from a file, it
> can probably be assumed that the top-level objects are vertices since
> the graph is vertex-centric and everything else follows naturally.
> I'm not entirely sure what is required for submitting traversals to
> gremlin server from GLV.  However, if this is used for the results
> from gremlin server then the results could start with any one of path,
> vertex, edge, property, vertex property, etc. So you'll need that type
> data there.
> 
> -- 
> Robert Dale
> 
> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez  wrote:
> > Hi,
> >
> > I'm not following this PR too closely so what I might be saying is 
> > already known/argued against/etc.
> >
> > 1. I think we should go with Robert Dale's proposal of int32, 
> > int64, Vertex, uuid, etc. instead of Java class names.
> > 2. In Java we then have a Map for typecasting 
> > accordingly.
> > 3. This would make GraphSON 2.0 perfect for Bytecode serialization 
> > in TINKERPOP-1278.
> > 4. I think that if a Vertex, Edge, etc. doesn't have properties, 
> > outV, etc. then don't even have those fields in the representation.
> > 5. Most of the serialization back and forth will be ReferenceXXX 
> > elements and thus, don't create more Maps/lists for no reason - less 
> > chars.
> >
> > For me, my interests with this work is all about a language agnostic way of 
> > sending Gremlin traversal bytecode between different languages. This work 
> > is exactly what I am looking for.
> >
> > Thanks,
> > Marko.
> >
> > http://markorodriguez.com
> >
> >
> >
> >> On Jul 9, 2016, at 9:48 AM, Stephen Mallette  wrote:
> >>
> >> With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
> >> important that we have a solid, efficient, programming language neutral,
> >> lossless serialization format. Right now that format is GraphSON and it
> >> works for that purpose (ever more  so with 2.0). Given some discussion on
> >> the GraphSON 2.0 PR driven a bit by Robert Dale:
> >>
> >> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> >>
> >> I wonder if we shouldn't consider another IO format that has Gremlin
> >> Server/GLVs in

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Marko Rodriguez

Hi,

In TINKERPOP-1278, I registered GraphSON Serializers and Deserializers for 
Bytecode, P, and Enums.


https://github.com/apache/tinkerpop/blob/0319b3d951251ad47176ade3f19fbfdda250/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/GraphSONTraversalSerializers.java
 


https://github.com/apache/tinkerpop/blob/0319b3d951251ad47176ade3f19fbfdda250/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/GraphSONModule.java#L126-L142
 


Now you can do this:

https://gist.github.com/okram/1e9b407670b51e5f5fb6b85b2b9a6caa 


TADA!,
Marko.

http://markorodriguez.com


SIDENOTE: This serves as a foundation for when we move to GraphSON 2.0. In 
terms of numbers, I think, unfortunately, we have to stick with int32, int64, 
float, double, etc. given graph database providers and their type systems. It's 
not about the Gremlin traversal API, it's more about provider schemas: 
has("someNumber",12L) vs. has("someNumber",12).
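
A small Python sketch of why the width has to travel with the number: plain JSON carries only "12", so the 12L vs. 12 distinction is lost unless it is tagged. The envelope keys "@type" and "@value" and the helper names are assumptions for illustration only, not the GraphSON 2.0 format.

```python
import json

# Value ranges for the two integer widths named in the thread.
RANGES = {"int32": (-2**31, 2**31 - 1), "int64": (-2**63, 2**63 - 1)}


def encode_int(value, type_name):
    """Wrap an integer with an explicit width tag (hypothetical envelope)."""
    low, high = RANGES[type_name]
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {type_name}")
    return json.dumps({"@type": type_name, "@value": value})


def decode_int(text):
    """Return (type_name, value) from the tagged form."""
    node = json.loads(text)
    return node["@type"], node["@value"]


as_long = encode_int(12, "int64")  # stands in for has("someNumber", 12L)
as_int = encode_int(12, "int32")   # stands in for has("someNumber", 12)
```

Both values decode to the integer 12, but the tag lets a provider with a strict schema pick the matching type instead of guessing.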






> On Jul 13, 2016, at 3:29 PM, Robert Dale  wrote:
> 
> If we go by the gremlin APIs:
> 
> From a client.submit(), Result [1] is only obligated to types Vertex,
> Edge, Path, Property, Boolean, Object, and "Numbers"
> (byte,short,int,long - I'll call these convenience for long;
> double,float - I'll call this convenience for double).
> 
> Using the native java DSL, looks like Traversal.next() [2] would
> return Vertex, Edge, Property, Map, Object, (extends) Number.
> Probably Boolean. Maybe a VertexProperty?  Possibly others but hard to
> tell since it's all generics.  Please correct me where I'm wrong.
> 
> In other words, IMHO the GLV really has no obligation to preserve or
> maintain any other specific types. (Don't get me wrong, it's very
> convenient, but not required.)  This is analogous to other types of
> drivers. For instance, JDBC has no idea what java type you actually
> want. It does know how it's stored in the database but otherwise it
> has convenience methods for numbers and other things. While it's
> common to map objects to tables 1:1, it's really up to the caller to
> be aware of and call for the expected type.
> 
> I think it's the responsibility of the graph-database driver to be
> able to convert types appropriately to the underlying system.  And we
> do see this behavior with existing graph implementations. Take for
> example UUID.  I don't specify the type to gremlin script. It's a
> string. The graph-driver knows to convert that to a UUID if it's
> schema is configured as such.  If there wasn't a schema, that's fine,
> it will just be stored as a string. And someone who doesn't set a
> schema, obviously doesn't care how it's stored.
> 
> For automatic and strong type conversion, one would use a Object Graph
> Mapper (like an ORM, e.g. Hibernate) at a layer above the GLV.  This
> thing would introspect objects and see that, hey, it wants a Short
> instead of the default long, or it wants a UUID instead of a String, I
> should convert those things because I'm so handy!
> 
> So getting out of the type conversion game makes your life a little
> easier. Maybe it puts more pressure on graph providers to do
> conversion but also to potentially provide GraphSON codecs for any
> non-gremlin-supported types.
> 
> I don't think I have anything more to say on the subject. To be
> honest, I have no skin in the game. I don't see myself directly
> consuming this. Ultimately you guys need to decide what works for you
> and your use cases.
> 
> 1. 
> http://tinkerpop.apache.org/javadocs/current/full/org/apache/tinkerpop/gremlin/driver/Result.html
> 2. 
> http://tinkerpop.apache.org/javadocs/current/full/org/apache/tinkerpop/gremlin/process/traversal/Traversal.html
> 
> 
> On Wed, Jul 13, 2016 at 3:56 PM, Jason Plurad  wrote:
>> Unipop uses String ids. Sqlg uses Long ids.
>> 
>> Seems fair enough that we can compare ids as numeric by checking the
>> graph.features() for supportsNumericIds(). One complication would be graphs
>> that allow multiple id types.
>> 
>> 
>> On Wed, Jul 13, 2016 at 2:07 PM, Stephen Mallette 
>> wrote:
>> 
>>>> First, is there a wiki that we can keep updated with decisions or at
>>> least
>>> decision points? I know there's an old wiki, but is there/will there be a
>>> new wiki?
>>> 
>>> No - we don't have a wiki. Design decisions tend to get trapped in the
>>> mailing list (or JIRA) which isn't so good. Maybe that's a separate
>>> discussion.
>>> 
>>>> Neo4j via NeoGraph appears to do the right thing for vertex IDs and
>>> properties.
>>>

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Stephen Mallette

I'll answer your second email first, the one about GraphSON, because it's
shorter and I know the answer without too much thought (I'll need to take some
time to think on the other).

So the answer to "is this true?" is yes and no. The "yes" part is related
to the fact that I believe that by default writeGraph() will generate the
"array of vertices" that document is referring to. The reason for this is
that the file generated needs to be arbitrarily splittable for processing
in hadoop/spark, so individual lines of valid JSON are used to accomplish
that. The "no" part is that if you want valid JSON you can get it by
configuring the GraphSONWriter to wrapAdjacencyList(true).  I don't think
that will change for GraphSON 2.0 unless there is an idea for dealing with
the hadoop/spark issue.
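
For anyone reading along in another language, here is a rough Python sketch of the difference. The vertex fields are made up, and the wrapped shape (a single object with a "vertices" array) is my assumption about what a wrapped file could look like, not a statement of what the writer emits.

```python
import json

# One JSON object per line: each line is valid JSON, the file as a whole is not.
line_delimited = '{"id": 1, "label": "person"}\n{"id": 2, "label": "person"}\n'


def read_lines(text):
    """Parse each non-empty line on its own, as a splittable reader would."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def is_valid_json(text):
    """True if the whole text parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def wrap(text):
    """Assumed wrapped form: one JSON object holding all vertices."""
    return json.dumps({"vertices": read_lines(text)})


vertices = read_lines(line_delimited)
wrapped = wrap(line_delimited)
```

The line-delimited form can be cut at any newline and handed to separate workers, which is the hadoop/spark requirement. The wrapped form parses as one document but has to be read as a whole.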

On Wed, Jul 13, 2016 at 5:35 PM, Robert Dale  wrote:

> On a different subject, I read on IBM's site that GraphSON 1.0
> documents "are not valid JSON documents" [1].  Is this true? I looked
> at one example and it did indeed look that way. It was an array of
> vertices but without the array notation and not separated by ","
>
> Was there a reason for this?  Please tell me GraphSON 2.0 will be valid
> JSON!
>
> 1. https://ibm-graph-docs.ng.bluemix.net/api.html#bulk-input-apis
>
> On Wed, Jul 13, 2016 at 3:56 PM, Jason Plurad  wrote:
> > Unipop uses String ids. Sqlg uses Long ids.
> >
> > Seems fair enough that we can compare ids as numeric by checking the
> > graph.features() for supportsNumericIds(). One complication would be
> graphs
> > that allow multiple id types.
> >
> >
> > On Wed, Jul 13, 2016 at 2:07 PM, Stephen Mallette 
> > wrote:
> >
> >> > First, is there a wiki that we can keep updated with decisions or at
> >> least
> >> decision points? I know there's an old wiki, but is there/will there be
> a
> >> new wiki?
> >>
> >> No - we don't have a wiki. Design decisions tend to get trapped in the
> >> mailing list (or JIRA) which isn't so good. Maybe that's a separate
> >> discussion.
> >>
> >> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> >> properties.
> >> It treats all types, primitive or object, from byte to long, double,
> float
> >> as numbers.
> >>
> >> Perhaps we could take a stronger stance on this in the test cases? Does
> >> anyone know what graphs this would impact besides Titan and TinkerGraph
> (I
> >> suspect DSE Graph, but not 100% sure)?
> >>
> >>
> >>
> >> On Wed, Jul 13, 2016 at 1:49 PM, Robert Dale  wrote:
> >>
> >> > First, is there a wiki that we can keep updated with decisions or at
> >> > least decision points? I know there's an old wiki, but is there/will
> >> > there be a new wiki?
> >> >
> >> > Stephen, IMO, that's still bad behavior. That says to me a number is
> >> > not a number.  But, yes, schemaless does allow one to put crap in and
> >> > get crap out. So designers should be aware of these types of pitfalls.
> >> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> >> > properties. It treats all types, primitive or object, from byte to
> >> > long, double, float as numbers.  This is pretty standard behavior in
> >> > SQL, JDBC drivers, and other NoSQL technologies.
> >> >
> >> >
> >> >
> >> > On Wed, Jul 13, 2016 at 11:30 AM, Stephen Mallette <
> spmalle...@gmail.com
> >> >
> >> > wrote:
> >> > > Marko, the namespacing idea seems smart.
> >> > >
> >> > > Robert, I think other graphs have similar behavior to TinkerGraph's
> >> > > default. In Titan, the absence of a schema (default, obviously)
> >> produces
> >> > > this:
> >> > >
> >> > > gremlin> graph =
> >> TitanFactory.open('conf/titan-cassandra-es.properties')
> >> > > ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
> >> > > gremlin> graph.addVertex("n",100D)
> >> > > ==>v[4288]
> >> > > gremlin> graph.traversal().V().has('n',100f)
> >> > > gremlin> graph.traversal().V().has('n',100d)
> >> > > ==>v[4288]
> >> > >
> >> > > This kind of problem has caused trouble for years and years in
> >> TinkerPop
> >> > > and allowing the type to be embedded seemed like a good solution. Of
> >> > > course, you bring up a good point about javascript - to this point
> >> we've
> >> > > relied on JS devs to conform to java/groovy types by forcing
> conversion
> >> > in
> >> > > their gremlin scripts or configuring their graphs to avoid use of
> types
> >> > > that would produce these kinds of ambiguous results.
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale 
> >> wrote:
> >> > >
> >> > >> And just to be clear, I'm not necessarily disagreeing. But I think
> >> > >> it's important to understand where and why it's necessary.
> >> > >>
> >> > >> For example, if I'm writing a gremlin script (string), I don't
> type my
> >> > >> input numbers.  It's rightly converted by the underlying
> architecture.
> >> > >> (I'm guessing groovy which has enhanced number support).  Also, if
> a
> >> > >> GLV is submitting typed numbers, how would that work? For example,
> in
> >> > >> Javascript?

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Robert Dale

On a different subject, I read on IBM's site that GraphSON 1.0
documents "are not valid JSON documents" [1].  Is this true? I looked
at one example and it did indeed look that way. It was an array of
vertices but without the array notation and not separated by ","

Was there a reason for this?  Please tell me GraphSON 2.0 will be valid JSON!

1. https://ibm-graph-docs.ng.bluemix.net/api.html#bulk-input-apis

On Wed, Jul 13, 2016 at 3:56 PM, Jason Plurad  wrote:
> Unipop uses String ids. Sqlg uses Long ids.
>
> Seems fair enough that we can compare ids as numeric by checking the
> graph.features() for supportsNumericIds(). One complication would be graphs
> that allow multiple id types.
>
>
> On Wed, Jul 13, 2016 at 2:07 PM, Stephen Mallette 
> wrote:
>
>> > First, is there a wiki that we can keep updated with decisions or at
>> least
>> decision points? I know there's an old wiki, but is there/will there be a
>> new wiki?
>>
>> No - we don't have a wiki. Design decisions tend to get trapped in the
>> mailing list (or JIRA) which isn't so good. Maybe that's a separate
>> discussion.
>>
>> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
>> properties.
>> It treats all types, primitive or object, from byte to long, double, float
>> as numbers.
>>
>> Perhaps we could take a stronger stance on this in the test cases? Does
>> anyone know what graphs this would impact besides Titan and TinkerGraph (I
>> suspect DSE Graph, but not 100% sure)?
>>
>>
>>
>> On Wed, Jul 13, 2016 at 1:49 PM, Robert Dale  wrote:
>>
>> > First, is there a wiki that we can keep updated with decisions or at
>> > least decision points? I know there's an old wiki, but is there/will
>> > there be a new wiki?
>> >
>> > Stephen, IMO, that's still bad behavior. That says to me a number is
>> > not a number.  But, yes, schemaless does allow one to put crap in and
>> > get crap out. So designers should be aware of these types of pitfalls.
>> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
>> > properties. It treats all types, primitive or object, from byte to
>> > long, double, float as numbers.  This is pretty standard behavior in
>> > SQL, JDBC drivers, and other NoSQL technologies.
>> >
>> >
>> >
>> > On Wed, Jul 13, 2016 at 11:30 AM, Stephen Mallette > >
>> > wrote:
>> > > Marko, the namespacing idea seems smart.
>> > >
>> > > Robert, I think other graphs have similar behavior to TinkerGraph's
>> > > default. In Titan, the absence of a schema (default, obviously)
>> produces
>> > > this:
>> > >
>> > > gremlin> graph =
>> TitanFactory.open('conf/titan-cassandra-es.properties')
>> > > ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
>> > > gremlin> graph.addVertex("n",100D)
>> > > ==>v[4288]
>> > > gremlin> graph.traversal().V().has('n',100f)
>> > > gremlin> graph.traversal().V().has('n',100d)
>> > > ==>v[4288]
>> > >
>> > > This kind of problem has caused trouble for years and years in
>> TinkerPop
>> > > and allowing the type to be embedded seemed like a good solution. Of
>> > > course, you bring up a good point about javascript - to this point
>> we've
>> > > relied on JS devs to conform to java/groovy types by forcing conversion
>> > in
>> > > their gremlin scripts or configuring their graphs to avoid use of types
>> > > that would produce these kinds of ambiguous results.
>> > >
>> > >
>> > >
>> > > On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale 
>> wrote:
>> > >
>> > >> And just to be clear, I'm not necessarily disagreeing. But I think
>> > >> it's important to understand where and why it's necessary.
>> > >>
>> > >> For example, if I'm writing a gremlin script (string), I don't type my
>> > >> input numbers.  It's rightly converted by the underlying architecture.
>> > >> (I'm guessing groovy which has enhanced number support).  Also, if a
>> > >> GLV is submitting typed numbers, how would that work? For example, in
>> > >> Javascript?
>> > >>
>> > >> On Wed, Jul 13, 2016 at 9:16 AM, Robert Dale 
>> wrote:
>> > >> > Hi, Stephen.  I think that's a bad example. You may recall I brought
>> > >> > up that issue in the forum.  However, it's actually attributed to
>> the
>> > >> > default ID manager of ANY (for historical) which I think is a really
>> > >> > bad default (and reason) because it only leads to confusion.  Java
>> is
>> > >> > one of the few, if not only, brain-damaged languages where 5 != 5 !=
>> > >> > 5.  In Java, number objects must be coerced into like form for
>> > >> > comparison. The other ID managers do this coercion.  Saner languages
>> > >> > do this under the covers.
>> > >> >
>> > >> > On Wed, Jul 13, 2016 at 8:56 AM, Stephen Mallette <
>> > spmalle...@gmail.com>
>> > >> wrote:
>> > >> >> Robert, thanks for joining this discussion.
>> > >> >>
>> > >> >>> I wonder if it even makes sense to type numbers according to their
>> > >> >> memory model. As objects, Byte, Short, and Integer occupy the same
>> > >> >> space. Long isn't much more.  So in Java we're not saving much
>> spac

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Robert Dale

If we go by the gremlin APIs:

From a client.submit(), Result [1] is only obligated to types Vertex,
Edge, Path, Property, Boolean, Object, and "Numbers"
(byte,short,int,long - I'll call these convenience for long;
double,float - I'll call this convenience for double).

Using the native java DSL, looks like Traversal.next() [2] would
return Vertex, Edge, Property, Map, Object, (extends) Number.
Probably Boolean. Maybe a VertexProperty?  Possibly others but hard to
tell since it's all generics.  Please correct me where I'm wrong.

In other words, IMHO the GLV really has no obligation to preserve or
maintain any other specific types. (Don't get me wrong, it's very
convenient, but not required.)  This is analogous to other types of
drivers. For instance, JDBC has no idea what java type you actually
want. It does know how it's stored in the database but otherwise it
has convenience methods for numbers and other things. While it's
common to map objects to tables 1:1, it's really up to the caller to
be aware of and call for the expected type.

I think it's the responsibility of the graph-database driver to be
able to convert types appropriately to the underlying system.  And we
do see this behavior with existing graph implementations. Take for
example UUID.  I don't specify the type to gremlin script. It's a
string. The graph-driver knows to convert that to a UUID if its
schema is configured as such.  If there wasn't a schema, that's fine,
it will just be stored as a string. And someone who doesn't set a
schema obviously doesn't care how it's stored.

For automatic and strong type conversion, one would use an Object Graph
Mapper (like an ORM, e.g. Hibernate) at a layer above the GLV.  This
thing would introspect objects and see that, hey, it wants a Short
instead of the default long, or it wants a UUID instead of a String, I
should convert those things because I'm so handy!
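
As a rough illustration of the mapper-level conversion described above, here is a minimal sketch of a coercion step that takes whatever the driver returned and converts it to the type the caller's object model asks for. Everything in it (the class name, the method, the set of supported conversions) is an assumption made for the example and does not come from TinkerPop or any existing mapper.

```java
import java.util.UUID;

// Hypothetical sketch of what an object-graph mapper might do; not a real API.
final class TargetTypeCoercion {

    static <T> T coerce(Object raw, Class<T> target) {
        if (raw == null) {
            return null;
        }
        if (target.isInstance(raw)) {
            return target.cast(raw); // already the requested type
        }
        if (raw instanceof Number) {
            Number n = (Number) raw;
            if (target == Short.class) {
                long v = n.longValue(); // fractional parts, if any, are truncated
                if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
                    throw new IllegalArgumentException("Out of range for Short: " + v);
                }
                return target.cast((short) v);
            }
            if (target == Long.class) {
                return target.cast(n.longValue());
            }
            if (target == Double.class) {
                return target.cast(n.doubleValue());
            }
        }
        if (raw instanceof String && target == UUID.class) {
            return target.cast(UUID.fromString((String) raw));
        }
        throw new IllegalArgumentException(
                "No conversion from " + raw.getClass().getName() + " to " + target.getName());
    }

    public static void main(String[] args) {
        // The driver hands back a long and a string; the domain model wants Short and UUID.
        Short age = coerce(42L, Short.class);
        UUID id = coerce("123e4567-e89b-12d3-a456-426614174000", UUID.class);
        System.out.println(age + " " + id);
    }
}
```

The design choice this is meant to show: the wire format and driver only need "a number" and "a string", and the narrowing to Short or UUID happens in the layer that knows the target class.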

So getting out of the type conversion game makes your life a little
easier. Maybe it puts more pressure on graph providers to do
conversion but also to potentially provide GraphSON codecs for any
non-gremlin-supported types.

I don't think I have anything more to say on the subject. To be
honest, I have no skin in the game. I don't see myself directly
consuming this. Ultimately you guys need to decide what works for you
and your use cases.

1. 
http://tinkerpop.apache.org/javadocs/current/full/org/apache/tinkerpop/gremlin/driver/Result.html
2. 
http://tinkerpop.apache.org/javadocs/current/full/org/apache/tinkerpop/gremlin/process/traversal/Traversal.html

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Jason Plurad

Unipop uses String ids. Sqlg uses Long ids.

Seems fair enough that we can compare ids as numeric by checking the
graph.features() for supportsNumericIds(). One complication would be graphs
that allow multiple id types.
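
A minimal sketch of that idea follows. In a real provider the flag would come from graph.features() (for vertices, I believe via features().vertex().supportsNumericIds()); here it is passed in as a plain boolean so the example has no TinkerPop dependency. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Hypothetical helper, not part of TinkerPop.
final class IdComparison {

    // When the graph reports numeric ids, compare by numeric value rather than
    // by Java class. Assumption: numeric ids are finite (no NaN or Infinity).
    static boolean sameId(Object a, Object b, boolean numericIds) {
        if (numericIds && a instanceof Number && b instanceof Number) {
            return new BigDecimal(a.toString()).compareTo(new BigDecimal(b.toString())) == 0;
        }
        // Otherwise fall back to plain equals(), which is class-sensitive.
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sameId(1, 1L, true));
        System.out.println(sameId(1, 1L, false));
        System.out.println(sameId("1", 1L, true));
    }
}
```

The last call shows the complication mentioned above: with mixed id types, a String "1" and a Long 1 still do not match, so the feature flag alone does not settle what should happen.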


On Wed, Jul 13, 2016 at 2:07 PM, Stephen Mallette 
wrote:

> > First, is there a wiki that we can keep updated with decisions or at
> least
> decision points? I know there's an old wiki, but is there/will there be a
> new wiki?
>
> No - we don't have a wiki. Design decisions tend to get trapped in the
> mailing list (or JIRA) which isn't so good. Maybe that's a separate
> discussion.
>
> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> properties.
> It treats all types, primitive or object, from byte to long, double, float
> as numbers.
>
> Perhaps we could take a stronger stance on this in the test cases? Does
> anyone know what graphs this would impact besides Titan and TinkerGraph (I
> suspect DSE Graph, but not 100% sure)?
>
>
>
> On Wed, Jul 13, 2016 at 1:49 PM, Robert Dale  wrote:
>
> > First, is there a wiki that we can keep updated with decisions or at
> > least decision points? I know there's an old wiki, but is there/will
> > there be a new wiki?
> >
> > Stephen, IMO, that's still bad behavior. That says to me a number is
> > not a number.  But, yes, schemaless does allow one to put crap in and
> > get crap out. So designers should be aware of these types of pitfalls.
> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> > properties. It treats all types, primitive or object, from byte to
> > long, double, float as numbers.  This is pretty standard behavior in
> > SQL, JDBC drivers, and other NoSQL technologies.
> >
> >
> >
> > On Wed, Jul 13, 2016 at 11:30 AM, Stephen Mallette  >
> > wrote:
> > > Marko, the namespacing idea seems smart.
> > >
> > > Robert, I think other graphs have similar behavior to TinkerGraph's
> > > default. In Titan, the absence of a schema (default, obviously)
> produces
> > > this:
> > >
> > > gremlin> graph =
> TitanFactory.open('conf/titan-cassandra-es.properties')
> > > ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
> > > gremlin> graph.addVertex("n",100D)
> > > ==>v[4288]
> > > gremlin> graph.traversal().V().has('n',100f)
> > > gremlin> graph.traversal().V().has('n',100d)
> > > ==>v[4288]
> > >
> > > This kind of problem has caused trouble for years and years in
> TinkerPop
> > > and allowing the type to be embedded seemed like a good solution. Of
> > > course, you bring up a good point about javascript - to this point
> we've
> > > relied on JS devs to conform to java/groovy types by forcing conversion
> > in
> > > their gremlin scripts or configuring their graphs to avoid use of types
> > > that would produce these kinds of ambiguous results.
> > >
> > >
> > >
> > > On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale 
> wrote:
> > >
> > >> And just to be clear, I'm not necessarily disagreeing. But I think
> > >> it's important to understand where and why it's necessary.
> > >>
> > >> For example, if I'm writing a gremlin script (string), I don't type my
> > >> input numbers.  It's rightly converted by the underlying architecture.
> > >> (I'm guessing groovy which has enhanced number support).  Also, if a
> > >> GLV is submitting typed numbers, how would that work? For example, in
> > >> Javascript?
> > >>
> > >> On Wed, Jul 13, 2016 at 9:16 AM, Robert Dale 
> wrote:
> > >> > Hi, Stephen.  I think that's a bad example. You may recall I brought
> > >> > up that issue in the forum.  However, it's actually attributed to
> the
> > >> > default ID manager of ANY (for historical) which I think is a really
> > >> > bad default (and reason) because it only leads to confusion.  Java
> is
> > >> > one of the few, if not only, brain-damaged languages where 5 != 5 !=
> > >> > 5.  In Java, number objects must be coerced into like form for
> > >> > comparison. The other ID managers do this coercion.  Saner languages
> > >> > do this under the covers.
> > >> >
> > >> > On Wed, Jul 13, 2016 at 8:56 AM, Stephen Mallette <
> > spmalle...@gmail.com>
> > >> wrote:
> > >> >> Robert, thanks for joining this discussion.
> > >> >>
> > >> >>> I wonder if it even makes sense to type numbers according to their
> > >> >> memory model. As objects, Byte, Short, and Integer occupy the same
> > >> >> space. Long isn't much more.  So in Java we're not saving much
> space.
> > >> >> Jackson will attempt to parse in order: int, long, BigInt,
> > BigDecimal.
> > >> >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't
> even
> > >> >> have this concept.  Does anything in gremlin actually require this?
> > >> >>
> > >> >> If the intended numeric type isn't preserved, weird things can
> happen
> > >> with
> > >> >> graphs that have a schema (like Titan/DSE). Even TinkerGraph using
> > the
> > >> >> default ID manager will not be happy if you try to do a lookup of
> > Long
> > >> >> identifiers with an Integer:
> > >> >>
> > >> >> gre

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Stephen Mallette

> First, is there a wiki that we can keep updated with decisions or at least
> decision points? I know there's an old wiki, but is there/will there be a
> new wiki?

No - we don't have a wiki. Design decisions tend to get trapped in the
mailing list (or JIRA) which isn't so good. Maybe that's a separate
discussion.

> Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> properties. It treats all types, primitive or object, from byte to long,
> double, float as numbers.

Perhaps we could take a stronger stance on this in the test cases? Does
anyone know what graphs this would impact besides Titan and TinkerGraph (I
suspect DSE Graph, but not 100% sure)?



On Wed, Jul 13, 2016 at 1:49 PM, Robert Dale  wrote:

> First, is there a wiki that we can keep updated with decisions or at
> least decision points? I know there's an old wiki, but is there/will
> there be a new wiki?
>
> Stephen, IMO, that's still bad behavior. That says to me a number is
> not a number.  But, yes, schemaless does allow one to put crap in and
> get crap out. So designers should be aware of these types of pitfalls.
> Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> properties. It treats all types, primitive or object, from byte to
> long, double, float as numbers.  This is pretty standard behavior in
> SQL, JDBC drivers, and other NoSQL technologies.
>
>
>
> On Wed, Jul 13, 2016 at 11:30 AM, Stephen Mallette 
> wrote:
> > Marko, the namespacing idea seems smart.
> >
> > Robert, I think other graphs have similar behavior to TinkerGraph's
> > default. In Titan, the absence of a schema (default, obviously) produces
> > this:
> >
> > gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
> > ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
> > gremlin> graph.addVertex("n",100D)
> > ==>v[4288]
> > gremlin> graph.traversal().V().has('n',100f)
> > gremlin> graph.traversal().V().has('n',100d)
> > ==>v[4288]
> >
> > This kind of problem has caused trouble for years and years in TinkerPop
> > and allowing the type to be embedded seemed like a good solution. Of
> > course, you bring up a good point about javascript - to this point we've
> > relied on JS devs to conform to java/groovy types by forcing conversion
> in
> > their gremlin scripts or configuring their graphs to avoid use of types
> > that would produce these kinds of ambiguous results.
> >
> >
> >
> > On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale  wrote:
> >
> >> And just to be clear, I'm not necessarily disagreeing. But I think
> >> it's important to understand where and why it's necessary.
> >>
> >> For example, if I'm writing a gremlin script (string), I don't type my
> >> input numbers.  It's rightly converted by the underlying architecture.
> >> (I'm guessing groovy which has enhanced number support).  Also, if a
> >> GLV is submitting typed numbers, how would that work? For example, in
> >> Javascript?
> >>
> >> On Wed, Jul 13, 2016 at 9:16 AM, Robert Dale  wrote:
> >> > Hi, Stephen.  I think that's a bad example. You may recall I brought
> >> > up that issue in the forum.  However, it's actually attributed to the
> >> > default ID manager of ANY (for historical) which I think is a really
> >> > bad default (and reason) because it only leads to confusion.  Java is
> >> > one of the few, if not only, brain-damaged languages where 5 != 5 !=
> >> > 5.  In Java, number objects must be coerced into like form for
> >> > comparison. The other ID managers do this coercion.  Saner languages
> >> > do this under the covers.
> >> >
> >> > On Wed, Jul 13, 2016 at 8:56 AM, Stephen Mallette <
> spmalle...@gmail.com>
> >> wrote:
> >> >> Robert, thanks for joining this discussion.
> >> >>
> >> >>> I wonder if it even makes sense to type numbers according to their
> >> >> memory model. As objects, Byte, Short, and Integer occupy the same
> >> >> space. Long isn't much more.  So in Java we're not saving much space.
> >> >> Jackson will attempt to parse in order: int, long, BigInt,
> BigDecimal.
> >> >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> >> >> have this concept.  Does anything in gremlin actually require this?
> >> >>
> >> >> If the intended numeric type isn't preserved, weird things can happen
> >> with
> >> >> graphs that have a schema (like Titan/DSE). Even TinkerGraph using
> the
> >> >> default ID manager will not be happy if you try to do a lookup of
> Long
> >> >> identifiers with an Integer:
> >> >>
> >> >> gremlin> graph = TinkerFactory.createModern()
> >> >> ==>tinkergraph[vertices:6 edges:6]
> >> >> gremlin> graph.vertices(1)
> >> >> ==>v[1]
> >> >> gremlin> graph.vertices(1L)
> >> >> gremlin>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Jul 13, 2016 at 8:17 AM, Robert Dale 
> wrote:
> >> >>
> >> >>> Marko, I agree that empty object properties should not be
> represented.
> >> >>> I think if you saw that in an example then it was probably for
> >> >>> demonstration purposes.
> >> >>>
> >> >>> Kevin,

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Robert Dale

First, is there a wiki that we can keep updated with decisions or at
least decision points? I know there's an old wiki, but is there/will
there be a new wiki?

Stephen, IMO, that's still bad behavior. That says to me a number is
not a number.  But, yes, schemaless does allow one to put crap in and
get crap out. So designers should be aware of these types of pitfalls.
Neo4j via NeoGraph appears to do the right thing for vertex IDs and
properties. It treats all types, primitive or object, from byte to
long, double, float as numbers.  This is pretty standard behavior in
SQL, JDBC drivers, and other NoSQL technologies.
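
To make the "a number is not a number" complaint concrete, here is a small sketch in plain Java. A map keyed by boxed numbers misses when the lookup uses a different box type, and widening integral keys to Long on the way in and out removes the surprise. This is only an illustration with hypothetical names. It is not how any TinkerPop ID manager or Neo4j is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not a TinkerPop class.
final class NumericKeyStore {
    private final Map<Object, String> byId = new HashMap<>();

    // Widen integral ids to Long so that 1, 1L and (short) 1 share one key.
    private static Object normalize(Object id) {
        if (id instanceof Byte || id instanceof Short
                || id instanceof Integer || id instanceof Long) {
            return ((Number) id).longValue();
        }
        return id; // non-integral ids are left untouched
    }

    void put(Object id, String value) {
        byId.put(normalize(id), value);
    }

    String get(Object id) {
        return byId.get(normalize(id));
    }

    public static void main(String[] args) {
        // Plain map: Integer 1 and Long 1L are different keys.
        Map<Object, String> raw = new HashMap<>();
        raw.put(1, "v[1]");
        System.out.println(raw.get(1L));

        // Normalizing store: the same lookup finds the entry.
        NumericKeyStore store = new NumericKeyStore();
        store.put(1, "v[1]");
        System.out.println(store.get(1L));
    }
}
```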



On Wed, Jul 13, 2016 at 11:30 AM, Stephen Mallette  wrote:
> Marko, the namespacing idea seems smart.
>
> Robert, I think other graphs have similar behavior to TinkerGraph's
> default. In Titan, the absence of a schema (default, obviously) produces
> this:
>
> gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
> ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
> gremlin> graph.addVertex("n",100D)
> ==>v[4288]
> gremlin> graph.traversal().V().has('n',100f)
> gremlin> graph.traversal().V().has('n',100d)
> ==>v[4288]
>
> This kind of problem has caused trouble for years and years in TinkerPop
> and allowing the type to be embedded seemed like a good solution. Of
> course, you bring up a good point about javascript - to this point we've
> relied on JS devs to conform to java/groovy types by forcing conversion in
> their gremlin scripts or configuring their graphs to avoid use of types
> that would produce these kinds of ambiguous results.
>
>
>
> On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale  wrote:
>
>> And just to be clear, I'm not necessarily disagreeing. But I think
>> it's important to understand where and why it's necessary.
>>
>> For example, if I'm writing a gremlin script (string), I don't type my
>> input numbers.  It's rightly converted by the underlying architecture.
>> (I'm guessing groovy which has enhanced number support).  Also, if a
>> GLV is submitting typed numbers, how would that work? For example, in
>> Javascript?
>>
>> On Wed, Jul 13, 2016 at 9:16 AM, Robert Dale  wrote:
>> > Hi, Stephen.  I think that's a bad example. You may recall I brought
>> > up that issue in the forum.  However, it's actually attributed to the
>> > default ID manager of ANY (for historical) which I think is a really
>> > bad default (and reason) because it only leads to confusion.  Java is
>> > one of the few, if not only, brain-damaged languages where 5 != 5 !=
>> > 5.  In Java, number objects must be coerced into like form for
>> > comparison. The other ID managers do this coercion.  Saner languages
>> > do this under the covers.
>> >
>> > On Wed, Jul 13, 2016 at 8:56 AM, Stephen Mallette 
>> wrote:
>> >> Robert, thanks for joining this discussion.
>> >>
>> >>> I wonder if it even makes sense to type numbers according to their
>> >> memory model. As objects, Byte, Short, and Integer occupy the same
>> >> space. Long isn't much more.  So in Java we're not saving much space.
>> >> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
>> >> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
>> >> have this concept.  Does anything in gremlin actually require this?
>> >>
>> >> If the intended numeric type isn't preserved, weird things can happen
>> with
>> >> graphs that have a schema (like Titan/DSE). Even TinkerGraph using the
>> >> default ID manager will not be happy if you try to do a lookup of Long
>> >> identifiers with an Integer:
>> >>
>> >> gremlin> graph = TinkerFactory.createModern()
>> >> ==>tinkergraph[vertices:6 edges:6]
>> >> gremlin> graph.vertices(1)
>> >> ==>v[1]
>> >> gremlin> graph.vertices(1L)
>> >> gremlin>
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Jul 13, 2016 at 8:17 AM, Robert Dale  wrote:
>> >>
>> >>> Marko, I agree that empty object properties should not be represented.
>> >>> I think if you saw that in an example then it was probably for
>> >>> demonstration purposes.
>> >>>
>> >>> Kevin, can you expand on this comment:
>> >>>
>> >>> > the format you suggest would lead to the same inconsistencies as in
>> >>> GraphSON 1.0.
>> >>> > Since the type is at the same level than the data itself, whether the
>> >>> container is an Array or an Object
>> >>> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
>> >>>
>> >>> What exactly are the inconsistencies?  What is the problem in
>> >>> determining an array or object?
>> >>> This is a natural JSON array (or list): []
>> >>> This is a natural JSON object: {}
>> >>>
>> >>> Type at the object level is a common pattern and supported feature of
>> >>> Jackson.  Also, GeoJSON would be a natural fit as it also stores
>> >>> 'type' at the object level. Titan supports GeoJSON currently.  I
>> >>> wonder if it would make sense to promote geometry to gremlin.
>> >>>
>> >>> We should probably start documenting a table of supported types. (If
>> >>> there is one, please pr

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Marko Rodriguez

Hi,

Here is a toy class I just made that converts Bytecode to “GraphSON2.0.” I 
believe it covers everything! (save I didn’t blow out the Number section):

https://gist.github.com/okram/908b73b24e8db48f1006124942a900b1 


The following code:

Traversal traversal =
  __.V().has("age", gt(10).and(lt(30))).out("knows")
    .repeat(out().hasLabel("person")).times(2).groupCount().by(label);
GraphSONWriter.build().create().writeObject(System.out,
  GraphSONConverter.convert(traversal));

Outputs:

{"bytecode":[["V"],["has","age",{"predicate":"and","@type":"P","value":[{"predicate":"gt","@type":"P","value":{"@type":"int32","value":10}},{"predicate":"lt","@type":"P","value":{"@type":"int32","value":30}}]}],["out","knows"],["repeat",{"bytecode":[["out"],["has","~label",{"predicate":"eq","@type":"P","value":"person"}]],"@type":"Traversal"}],["times",{"@type":"int32","value":2}],["groupCount"],["by",{"@type":"T","value":"label"}]],"@type":"Traversal"}

Or in pretty print:
{
  "bytecode": [
    [
      "V"
    ],
    [
      "has",
      "age",
      {
        "predicate": "and",
        "@type": "P",
        "value": [
          {
            "predicate": "gt",
            "@type": "P",
            "value": {
              "@type": "int32",
              "value": 10
            }
          },
          {
            "predicate": "lt",
            "@type": "P",
            "value": {
              "@type": "int32",
              "value": 30
            }
          }
        ]
      }
    ],
    [
      "out",
      "knows"
    ],
    [
      "repeat",
      {
        "bytecode": [
          [
            "out"
          ],
          [
            "has",
            "~label",
            {
              "predicate": "eq",
              "@type": "P",
              "value": "person"
            }
          ]
        ],
        "@type": "Traversal"
      }
    ],
    [
      "times",
      {
        "@type": "int32",
        "value": 2
      }
    ],
    [
      "groupCount"
    ],
    [
      "by",
      {
        "@type": "T",
        "value": "label"
      }
    ]
  ],
  "@type": "Traversal"
}
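
On the reading side, output like the above lends itself to a simple lookup: any JSON object carrying an "@type" field is handed to whatever reader is registered for that type id. Here is a minimal sketch, assuming the JSON has already been parsed into java.util.Map and plain values by some parser. The TypeRegistry class and its methods are hypothetical names for the example and are not the GraphSONModule or Jackson API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry, for illustration only.
final class TypeRegistry {
    private final Map<String, Function<Map<String, Object>, Object>> readers = new HashMap<>();

    void register(String typeId, Function<Map<String, Object>, Object> reader) {
        readers.put(typeId, reader);
    }

    // Assumption: 'node' is an already-parsed JSON value (Map, String, Number, ...).
    Object read(Object node) {
        if (!(node instanceof Map)) {
            return node; // plain JSON value, nothing to resolve
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> object = (Map<String, Object>) node;
        Object typeId = object.get("@type");
        if (typeId == null) {
            return object; // untyped JSON object
        }
        Function<Map<String, Object>, Object> reader = readers.get(typeId);
        if (reader == null) {
            throw new IllegalArgumentException("No reader registered for @type " + typeId);
        }
        return reader.apply(object);
    }

    public static void main(String[] args) {
        TypeRegistry registry = new TypeRegistry();
        // "int32" is the type id used in the output above.
        registry.register("int32", o -> ((Number) o.get("value")).intValue());

        Map<String, Object> node = new HashMap<>();
        node.put("@type", "int32");
        node.put("value", 10L); // a parser may hand back any Number subtype
        System.out.println(registry.read(node));
    }
}
```

Registering further type ids ("P", "T", "Traversal", or provider-specific ones) would follow the same pattern.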

Thoughts?,
Marko.

http://markorodriguez.com



> On Jul 13, 2016, at 9:30 AM, Stephen Mallette  wrote:
> 
> Marko, the namespacing idea seems smart.
> 
> Robert, I think other graphs have similar behavior to TinkerGraph's
> default. In Titan, the absence of a schema (default, obviously) produces
> this:
> 
> gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
> ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
> gremlin> graph.addVertex("n",100D)
> ==>v[4288]
> gremlin> graph.traversal().V().has('n',100f)
> gremlin> graph.traversal().V().has('n',100d)
> ==>v[4288]
> 
> This kind of problem has caused trouble for years and years in TinkerPop
> and allowing the type to be embedded seemed like a good solution. Of
> course, you bring up a good point about javascript - to this point we've
> relied on JS devs to conform to java/groovy types by forcing conversion in
> their gremlin scripts or configuring their graphs to avoid use of types
> that would produce these kinds of ambiguous results.
> 
> 
> 
> On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale  wrote:
> 
>> And just to be clear, I'm not necessarily disagreeing. But I think
>> it's important to understand where and why it's necessary.
>> 
>> For example, if I'm writing a gremlin script (string), I don't type my
>> input numbers.  It's rightly converted by the underlying architecture.
>> (I'm guessing groovy which has enhanced number support).  Also, if a
>> GLV is submitting typed numbers, how would that work? For example, in
>> Javascript?
>> 
>> On Wed, Jul 13, 2016 at 9:16 AM, Robert Dale  wrote:
>>> Hi, Stephen.  I think that's a bad example. You may recall I brought
>>> up that issue in the forum.  However, it's actually attributed to the
>>> default ID manager of ANY (for historical) which I think is a really
>>> bad default (and reason) because it only leads to confusion.  Java is
>>> one of the few, if not only, brain-damaged languages where 5 != 5 !=
>>> 5.  In Java, number objects must be coerced into like form for
>>> comparison. The other ID managers do this coercion.  Saner languages
>>> do this under the covers.
>>> 
>>> On Wed, Jul 13, 2016 at 8:56 AM, Stephen Mallette 
>> wrote:
>>>> Robert, thanks for joining this discussion.
>>>> 
>>>>> I wonder if it even makes sense to type numbers according to their
>>>> memory model. As objects, Byte, Short, and Integer occupy the same
>>>> space. Long isn't much more.  So in Java we're not saving much space.
>>>> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
>>>> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
>>>> have this concept.  Does anything in gremlin actually require this?
>>>> 
>>>> If the intended numeric type isn't preserved, weird things can happen with
>>>> graphs that have

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Stephen Mallette

Marko, the namespacing idea seems smart.

Robert, I think other graphs have similar behavior to TinkerGraph's
default. In Titan, the absence of a schema (default, obviously) produces
this:

gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> graph.addVertex("n",100D)
==>v[4288]
gremlin> graph.traversal().V().has('n',100f)
gremlin> graph.traversal().V().has('n',100d)
==>v[4288]

This kind of problem has caused trouble for years and years in TinkerPop
and allowing the type to be embedded seemed like a good solution. Of
course, you bring up a good point about javascript - to this point we've
relied on JS devs to conform to java/groovy types by forcing conversion in
their gremlin scripts or configuring their graphs to avoid use of types
that would produce these kinds of ambiguous results.
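
To show what embedding the type buys in the 100f versus 100d case, here is a minimal sketch of a writer that wraps a number in a typed JSON object. Only "int32" appears elsewhere in this thread. The other type ids ("int64", "float", "double") and the class itself are assumptions made for the example, not a proposed naming.

```java
// Hypothetical writer, for illustration only.
final class TypedNumberWriter {

    // Assumption: values are finite; NaN and Infinity are not valid JSON numbers.
    static String write(Number n) {
        final String typeId;
        if (n instanceof Integer) {
            typeId = "int32";
        } else if (n instanceof Long) {
            typeId = "int64";   // assumed name
        } else if (n instanceof Float) {
            typeId = "float";   // assumed name
        } else if (n instanceof Double) {
            typeId = "double";  // assumed name
        } else {
            throw new IllegalArgumentException("Unsupported number type: " + n.getClass().getName());
        }
        return "{\"@type\":\"" + typeId + "\",\"value\":" + n + "}";
    }

    public static void main(String[] args) {
        // Same JSON number, different declared type.
        System.out.println(write(100f));
        System.out.println(write(100d));
        // Without the type, Java sees two unequal objects.
        System.out.println(Float.valueOf(100f).equals(Double.valueOf(100d)));
    }
}
```

Both calls serialize the value as 100.0, so without the "@type" field the receiving side could not tell which of the two has() lookups was intended.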



Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Robert Dale

And just to be clear, I'm not necessarily disagreeing. But I think
it's important to understand where and why it's necessary.

For example, if I'm writing a Gremlin script (string), I don't type my
input numbers.  They are rightly converted by the underlying architecture
(I'm guessing Groovy, which has enhanced number support).  Also, if a
GLV is submitting typed numbers, how would that work? For example, in
JavaScript?

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Robert Dale

Hi, Stephen.  I think that's a bad example. You may recall I brought
up that issue in the forum.  However, it's actually attributable to the
default ID manager of ANY (kept for historical reasons), which I think is a
really bad default (and reason) because it only leads to confusion.  Java is
one of the few, if not only, brain-damaged languages where 5 != 5 !=
5.  In Java, number objects must be coerced into like form for
comparison. The other ID managers do this coercion.  Saner languages
do this under the covers.
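
As a rough illustration of the coercion described above, here is a minimal Python sketch. It is not TinkerGraph's actual code: `coerce_id` and `CoercingStore` are hypothetical names, and the only point is that every incoming id is normalised to one numeric form before it is stored or looked up, so 1 and "1" land on the same key.

```python
from numbers import Integral


def coerce_id(raw):
    """Normalise an incoming id to a plain int (hypothetical helper)."""
    if isinstance(raw, bool):
        raise TypeError("booleans are not accepted as ids")
    if isinstance(raw, Integral):
        return int(raw)
    if isinstance(raw, str):
        return int(raw)  # raises ValueError for non-numeric strings
    raise TypeError(f"cannot coerce {type(raw).__name__} to an id")


class CoercingStore:
    """Toy vertex store that coerces ids on both write and read."""

    def __init__(self):
        self._vertices = {}

    def add(self, raw_id, label):
        self._vertices[coerce_id(raw_id)] = label

    def get(self, raw_id):
        return self._vertices.get(coerce_id(raw_id))


store = CoercingStore()
store.add(1, "v[1]")
print(store.get("1"))  # found, because "1" is coerced to 1 before lookup
```

A store without the `coerce_id` call on the read path would miss whenever the caller's id type differs from the stored one, which is the confusion discussed here.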

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Marko Rodriguez

Hi,

Sorry for the double-email. Let me add some more notes about Gremlin Bytecode.

Instructions that only have primitives are easy:

has name marko

Instructions that have enums require typing:

has T.label person

Instructions that have traversals require nesting:

repeat [ out knows ]

Instructions that have P predicates require typing/nesting:

has age P.gt(10).and(P.lt(32))

From here, I think we will need:

1. Enum types for Direction, Cardinality, Order, Operator, Pop, Scope, T
    {"@type": "Operator", "value": "sum"}
2. Traversal type for anonymous traversal nesting.
    {"@type": "Traversal", "bytecode": []}
3. P type with nesting.
    {"@type": "P", "predicate": "and", "value": [
        {"@type": "P", "predicate": "gt", "value": 10},
        {"@type": "P", "predicate": "lt", "value": 32}]}

Thus, let's do an example:

g.V().has("age",gt(30)).repeat(out()).times(5).order().by("income",decr)

{
  "@type": "Traversal",
  "bytecode": [
    ["V"],
    ["has", "age", {"@type": "P", "predicate": "gt", "value": {"@type": "int32", "value": 30}}],
    ["repeat", {"@type": "Traversal", "bytecode": [["out"]]}],
    ["times", {"@type": "int32", "value": 5}],
    ["order"],
    ["by", "income", {"@type": "Order", "value": "decr"}]
  ]
}
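
One way such a typed representation could be produced is sketched below in Python. Everything here is an assumption made for illustration: the classes `P`, `EnumValue` and `Traversal` and the function `encode` are hypothetical, and every Python int is tagged as `int32` purely to keep the sketch short. Strings and booleans are left as native JSON values.

```python
import json
from dataclasses import dataclass


@dataclass
class P:
    predicate: str
    value: object


@dataclass
class EnumValue:
    type_name: str  # e.g. "Order"
    name: str       # e.g. "decr"


@dataclass
class Traversal:
    steps: list  # each step is [step_name, arg, arg, ...]


def encode(obj):
    """Turn a step argument into its typed JSON form (illustrative only)."""
    if isinstance(obj, (str, bool)):
        return obj  # native JSON types stay untyped
    if isinstance(obj, int):
        return {"@type": "int32", "value": obj}  # assumption: all ints fit int32
    if isinstance(obj, list):
        return [encode(item) for item in obj]
    if isinstance(obj, P):
        return {"@type": "P", "predicate": obj.predicate, "value": encode(obj.value)}
    if isinstance(obj, EnumValue):
        return {"@type": obj.type_name, "value": obj.name}
    if isinstance(obj, Traversal):
        return {"@type": "Traversal",
                "bytecode": [[step[0]] + [encode(a) for a in step[1:]]
                             for step in obj.steps]}
    raise TypeError(f"no encoding for {type(obj).__name__}")


traversal = Traversal([
    ["V"],
    ["has", "age", P("gt", 30)],
    ["repeat", Traversal([["out"]])],
    ["times", 5],
    ["order"],
    ["by", "income", EnumValue("Order", "decr")],
])
encoded = encode(traversal)
print(json.dumps(encoded, indent=2))
```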

Something like that. Question:

Do we namespace our types? 
"@type": "gremlin:Traversal"
"@type": "gremlin:P"

From a JSON representation like this, we can then "replay the instructions" on a 
Gremlin VM to reconstruct the traversal that was generated remotely in another 
language. In TINKERPOP-1278, Gremlin-Python creates a Bytecode object as the 
user is doing g.V.out… Then, on next(), toList(), etc., that Bytecode object is 
sent to a RemoteConnection (e.g. GremlinServer), the Bytecode is run on the 
remote VM to create the g.V.out… traversal remotely, execute it, and return results 
in GraphSON.
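
A minimal sketch of the replay idea, in Python and under loud assumptions: `RecordingTraversal` is a hypothetical stand-in for a real traversal that only records the calls made on it as text, and `decode` handles just the `Traversal` and `int32` types. A real Gremlin VM would call the actual step methods instead.

```python
class RecordingTraversal:
    """Stand-in for a traversal: each step call is appended as text."""

    def __init__(self, text):
        self.text = text

    def __getattr__(self, name):
        # Any unknown attribute is treated as a step name.
        def step(*args):
            rendered = ", ".join(
                a.text if isinstance(a, RecordingTraversal) else repr(a)
                for a in args)
            return RecordingTraversal(f"{self.text}.{name}({rendered})")
        return step


def decode(arg):
    """Turn a typed JSON argument back into a local value."""
    if isinstance(arg, dict) and "@type" in arg:
        if arg["@type"] == "Traversal":
            # Nested bytecode becomes an anonymous traversal.
            return replay(arg["bytecode"], RecordingTraversal("__"))
        if arg["@type"] == "int32":
            return int(arg["value"])
        raise ValueError(f"unregistered @type: {arg['@type']}")
    return arg


def replay(bytecode, start):
    """Apply each [step_name, args...] instruction in order."""
    traversal = start
    for name, *args in bytecode:
        traversal = getattr(traversal, name)(*[decode(a) for a in args])
    return traversal


bytecode = [
    ["V"],
    ["repeat", {"@type": "Traversal", "bytecode": [["out", "knows"]]}],
    ["times", {"@type": "int32", "value": 2}],
]
result = replay(bytecode, RecordingTraversal("g"))
print(result.text)  # g.V().repeat(__.out('knows')).times(2)
```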

That's all there is to it.

Marko. 

http://markorodriguez.com



Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Stephen Mallette

Robert, thanks for joining this discussion.

> I wonder if it even makes sense to type numbers according to their
> memory model. As objects, Byte, Short, and Integer occupy the same
> space. Long isn't much more.  So in Java we're not saving much space.
> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> have this concept.  Does anything in gremlin actually require this?

If the intended numeric type isn't preserved, weird things can happen with
graphs that have a schema (like Titan/DSE). Even TinkerGraph using the
default ID manager will not be happy if you try to do a lookup of Long
identifiers with an Integer:

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> graph.vertices(1)
==>v[1]
gremlin> graph.vertices(1L)
gremlin>
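
To make the failure mode concrete outside the console, here is a small Python imitation. Python itself does not distinguish integer widths, so `StrictStore` (a hypothetical class) keys each id by a (type name, value) pair to mimic a store in which an Integer and a Long with the same value are different keys. The `int32`/`int64` names follow the type names proposed in this thread; the `@type`/`value` layout is an assumption.

```python
import json


class StrictStore:
    """Toy store where the numeric type is part of the key."""

    def __init__(self):
        self._vertices = {}

    def add(self, type_name, value, label):
        self._vertices[(type_name, value)] = label

    def get(self, type_name, value):
        return self._vertices.get((type_name, value))


store = StrictStore()
store.add("int32", 1, "v[1]")

# A typed payload carries enough information to hit the right key.
typed = json.loads('{"@type": "int32", "value": 1}')
hit = store.get(typed["@type"], typed["value"])

# Guessing the wrong width misses, like graph.vertices(1L) above.
miss = store.get("int64", 1)
print(hit, miss)
```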




Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Marko Rodriguez

Hello,

> I wonder if it even makes sense to type numbers according to their
> memory model. As objects, Byte, Short, and Integer occupy the same
> space. Long isn't much more.  So in Java we're not saving much space.
> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
> have this concept.  Does anything in gremlin actually require this?
> I'm thinking that this is only going to be relevant at the domain
> model level. This way json native numbers can be used and not need
> typing.

I think we should type numbers. Already, with Gremlin-Python, I’m having 
trouble between Python’s “big integer”, Java’s long, Java’s float, Python’s 
double…. knowing the size and schema of a number is important. However, I 
believe that if something is NOT typed, then we assume it is a JSON-type. E.g. 
String, boolean.
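
A sketch of what typing numbers might look like from the Python side, under stated assumptions: `to_wire` is a hypothetical function, `int32` and `int64` are the names proposed earlier in this thread, and `bigint` and `double` are placeholder names invented here. Strings and booleans pass through untouched, following the rule that anything untyped is a plain JSON type.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def to_wire(value):
    """Wrap numbers in a typed object; leave native JSON types alone."""
    if isinstance(value, (bool, str)):
        return value  # untyped means "plain JSON type"
    if isinstance(value, float):
        return {"@type": "double", "value": value}  # placeholder type name
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return {"@type": "int32", "value": value}
        if INT64_MIN <= value <= INT64_MAX:
            return {"@type": "int64", "value": value}
        # Python ints are unbounded, so something wider is needed.
        return {"@type": "bigint", "value": value}  # placeholder type name
    raise TypeError(f"no wire form for {type(value).__name__}")


print(to_wire(30))
print(to_wire(2**31))
print(to_wire("marko"))
```

Choosing the narrowest type by range is only one possible policy; a schema-aware client might instead need to send `int64` for a small value because the property is declared as a long.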

> Additionally, I think that all things that will be typed should always
> be typed. For the use case of ingesting a saved graph from a file, it
> can probably be assumed that the top-level objects are vertices since
> the graph is vertex-centric and everything else follows naturally.
> I'm not entirely sure what is required for submitting traversals to
> gremlin server from GLV.  However, if this is used for the results
> from gremlin server then the results could start with any one of path,
> vertex, edge, property, vertex property, etc. So you'll need that type
> data there.

Gremlin-XXX will be sending Bytecode to Gremlin-YYY. Bytecode looks like this:

V 1
out knows
has name marko
has age gt(10)
values name

Here is a thrown-together JSON representation of

g.V().as("a").repeat(out("created", "knows")).times(2).as("b").dedup().select("a", "b")

[["V"],
 ["as","a"],
 ["repeat",[["out","created","knows"]]],
 ["times",2],
 ["as","b"],
 ["dedup"],
 ["select","a","b"]]
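
For illustration, the list-of-lists form above could be accumulated by a small recorder like the following Python sketch. The `Bytecode` class and its `add` method are hypothetical and are not the Gremlin-Python implementation; the sketch only shows that each step becomes `[name, args...]` and that a nested traversal becomes a nested list of steps.

```python
import json


class Bytecode:
    """Records steps as [name, arg, ...] lists (illustrative only)."""

    def __init__(self):
        self.steps = []

    def add(self, name, *args):
        # A nested Bytecode argument is flattened to its list of steps.
        encoded = [a.steps if isinstance(a, Bytecode) else a for a in args]
        self.steps.append([name, *encoded])
        return self  # allow chaining


bytecode = (Bytecode()
            .add("V")
            .add("as", "a")
            .add("repeat", Bytecode().add("out", "created", "knows"))
            .add("times", 2)
            .add("as", "b")
            .add("dedup")
            .add("select", "a", "b"))

payload = json.dumps(bytecode.steps)
print(payload)
```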

We will need a GraphSON representation of this that is Java free. This will be 
the foundational representation of serialized Gremlin traversals that can be 
passed around between Gremlin VMs and used by any programming language to both 
host Gremlin and compile traversals to this JSON-based Bytecode representation.

Marko.

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-13 Thread Robert Dale

Marko, I agree that empty object properties should not be represented.
I think if you saw that in an example then it was probably for
demonstration purposes.

Kevin, can you expand on this comment:

> the format you suggest would lead to the same inconsistencies as in GraphSON 
> 1.0.
> Since the type is at the same level than the data itself, whether the 
> container is an Array or an Object
> https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653

What exactly are the inconsistencies?  What is the problem in
determining an array or object?
This is a natural JSON array (or list): []
This is a natural JSON object: {}

Type at the object level is a common pattern and supported feature of
Jackson.  Also, GeoJSON would be a natural fit as it also stores
'type' at the object level. Titan supports GeoJSON currently.  I
wonder if it would make sense to promote geometry to gremlin.

We should probably start documenting a table of supported types. (If
there is one, please provide link)

I wonder if it even makes sense to type numbers according to their
memory model. As objects, Byte, Short, and Integer occupy the same
space. Long isn't much more.  So in Java we're not saving much space.
Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
have this concept.  Does anything in gremlin actually require this?
I'm thinking that this is only going to be relevant at the domain
model level. This way json native numbers can be used and not need
typing.

Additionally, I think that all things that will be typed should always
be typed. For the use case of ingesting a saved graph from a file, it
can probably be assumed that the top-level objects are vertices since
the graph is vertex-centric and everything else follows naturally.
I'm not entirely sure what is required for submitting traversals to
gremlin server from GLV.  However, if this is used for the results
from gremlin server then the results could start with any one of path,
vertex, edge, property, vertex property, etc. So you'll need that type
data there.

-- 
Robert Dale

Re: [DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-12 Thread Marko Rodriguez

Hi,

I’m not following this PR too closely, so what I might be saying is already 
known/argued against/etc.

1. I think we should go with Robert Dale’s proposal of int32, int64, 
Vertex, uuid, etc. instead of Java class names.
2. In Java we then have a Map for typecasting accordingly.
3. This would make GraphSON 2.0 perfect for Bytecode serialization in 
TINKERPOP-1278.
4. I think that if a Vertex, Edge, etc. doesn’t have properties, outV, 
etc. then don’t even have those fields in the representation.
5. Most of the serialization back and forth will be ReferenceXXX 
elements and thus, don’t create more Maps/lists for no reason. — less chars.
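
The "map for typecasting" in point 2 might look roughly like this, sketched in Python rather than Java. The registry contents, the `register` and `cast` helpers, and the `@type`/`value` field names are all assumptions for illustration; only the type names int32, int64 and uuid come from the proposal.

```python
import uuid

# Type name -> function that builds the local value (illustrative registry).
TYPE_REGISTRY = {
    "int32": int,
    "int64": int,
    "uuid": uuid.UUID,
}


def register(type_name, caster):
    """Let users plug in casters for their own types."""
    TYPE_REGISTRY[type_name] = caster


def cast(node):
    """Cast a typed node via the registry; pass untyped values through."""
    if isinstance(node, dict) and "@type" in node:
        caster = TYPE_REGISTRY[node["@type"]]  # KeyError if unregistered
        return caster(node["value"])
    return node


vertex_id = cast({"@type": "uuid",
                  "value": "12345678-1234-5678-1234-567812345678"})
age = cast({"@type": "int32", "value": 29})
name = cast("marko")
print(vertex_id, age, name)
```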

For me, my interest in this work is all about a language-agnostic way of 
sending Gremlin traversal bytecode between different languages. This work is 
exactly what I am looking for.

Thanks,
Marko.

http://markorodriguez.com

[DISCUSS] New IO format for GLVs/Gremlin Server

2016-07-09 Thread Stephen Mallette

With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
important that we have a solid, efficient, programming language neutral,
lossless serialization format. Right now that format is GraphSON and it
works for that purpose (even more so with 2.0). Given some discussion on
the GraphSON 2.0 PR driven a bit by Robert Dale:

https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389

I wonder if we shouldn't consider another IO format that has Gremlin
Server/GLVs in mind. At this point I'm not suggesting anything specific -
I'm just putting the idea out for further discussion and brainstorming.
Thoughts?
