[ 
https://issues.apache.org/jira/browse/TINKERPOP-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355048#comment-15355048
 ] 

ASF GitHub Bot commented on TINKERPOP-1274:
-------------------------------------------

GitHub user newkek opened a pull request:

    https://github.com/apache/tinkerpop/pull/351

    TINKERPOP-1274: GraphSON 2.0.

    https://issues.apache.org/jira/browse/TINKERPOP-1274
    
    # Summary of the changes : 
    
    Implementation of a format for value types serialization which is uniform 
and not Java centric. As a reminder the new format is as follows : 
    - A value not typed : `value`
    - A value typed : `[{"@class":"typeName"}, value]`
    
    The content of `value` can be either a simple value, or a more complex 
structure. The type prefix allows to call the right _Deserializer_ to 
deserialize the `value` whatever its content is.
    
    The default's GraphSON's 2.0 format will include types for every `value` 
that is not a JSON native type 
(`String/int/double/null/boolean/Map/Collection`) - called `PARTIAL_TYPES`. 
This allows to significantly reduce the size of the JSON payload where those 
aren't needed by the Jackson library.
    
    GraphSON serialization _without_ types is not affected. To enable it, need 
to use `NO_TYPES`.
    
    ## Quick walkthrough for the review
    
    There are new components involved in the ser/de process that are extended 
from the Jackson library :
    - _TypeIdResolver_ : performs the conversion of a object's `Class` -> 
`typeID`, and from a `typeID` -> `Class`.
    
    - _TypeResolverBuilder_ : creates a _TypeIdResolver_ and instantiates a 
_TypeDeserializer_ or _TypeSerializer_ with the _TypeIdResolver_ as a param.
    
    - _TypeSerializer_ : writes the prefix and suffix for a typeID. The 
_TypeSerializer_ is provided to the _Serializers_ and handles the type 
serialization that respects the format put in place.
    
    - _TypeDeserializer_ : is called before the deserializers are called. Its 
role is to detect a type, and call the right _Deserializer_ to deserialize the 
value.
    
    Most of the serializers already existing for Graph had to be modified, 
because most of them were *manually hardcoding* types without calling the 
_TypeSerializer_ and hence, those were not respecting the format. Now all 
Serializers respect the format because they call the _TypeSerializer_ given in 
parameter. I followed the frame put in place by @spmallette to implement those 
new serializers (prefixed with `V2d0`) without breaking existing clients code.
    
    In _TypeDeserializer_, `baseType` represents the class given in parameter 
by the user for the ser/de. If the user calls 
`mapper.readValueAsString(jsonString, UUID.class)` the `baseType` in 
_TypeDeserializer_ will be a _JavaType_ (which is the Jackson's custom class 
for Java classes) that represents a _UUID_ class. The *wildcard* in our 
deserialization mechanism is `Object.class`. 
    
    The deserialization path is as follow : 
    - _TypeDeserializer_ is called only for non simple JSON values (non 
`String/int/double/null/boolean`).
    - When called, if a type is detected, the _TypeDeserializer_ will read the 
typeID, convert it to a JavaType (thanks to the _TypeIdResolver_), check that 
the `baseType` is the same than what was read in the payload (only if 
`baseType` is not the *wildcard*), and call the `deserialize()` method of the 
_JsonDeserializer_ registered for that type.
    - If a type is not detected, detect Maps or Arrays and call appropriate 
Deserializer.
    
    ## Some results 
    
    GraphSON 2.0 shows a significant reduction of the payload's size for typed 
serialization. And the consequence in performance is that since there's less to 
process, the ser/de is faster. Results show a reduction of at least 50% in the 
payload's size and evolving linearly (the bigger the payload the bigger the 
difference) : 
    
    ```
    ➜  ls -lh 
tinkergraph-gremlin/target/test-case-data/TinkerGraphTest/tinkerpop-io/
    -rw-r--r--  1 kevingallardo  staff   890K 28 Jun 21:50 
grateful-dead-V2d0-typed.json
    -rw-r--r--  1 kevingallardo  staff   1.9M 28 Jun 21:50 
grateful-dead-typed.json
    -rw-r--r--  1 kevingallardo  staff   851K 28 Jun 21:50 grateful-dead.json
    -rw-r--r--  1 kevingallardo  staff   1.5K 28 Jun 21:50 
tinkerpop-classic-V2d0-typed.json
    -rw-r--r--  1 kevingallardo  staff   3.6K 28 Jun 21:50 
tinkerpop-classic-typed.json
    -rw-r--r--  1 kevingallardo  staff   1.3K 28 Jun 21:50 
tinkerpop-classic.json
    ```
    
    ## Tests
    
    Tests cover the same functionalities covered by GraphSON 1.0 typed, plus 
additional features brought by GraphSON 2.0. 
    
    ## Tradeoffs
    
    - Some tricks were implemented in order to provide the types without 
packages names. Since Java does not provide a way to search a class by its 
simple name, the `TypeIdResolver` has to have an index that it can refer to, 
and that has been correctly filled. The _TinkerpopJackosnModule_ class will 
handle providing those new indexes to the _TypeIdResolver_ when custom 
_Deserializers_ are added, but _without_ doing that we have no way to properly 
convert a type. We then require that a user instead of extending Jackson's 
_SimpleModule_ when writing a module, extends _TinkerpopJacksonModule_ which is 
an extension the Jackson _SimpleModule_. We can discuss whether this is a 
showstopper or if it is acceptable. Some solutions to prevent that, could be 
using a external library that allows searching a Class by its simple name, or 
switch to writing typeIDs with the full Class's canonical name. I don't have a 
strong opinion but it seems like the current solution is simple and good enough.
    
    - We loose some polymorphism potential in regards to POJOs without 
deserializers. Since the approach here is simpler and faster, it doesn't 
inspects JavaTypes to find potential subclasses. On the filp side it allows 
fine tuning of the Deserializers chosen to deserialize a type. I also thought 
about making the _TypeIdResolver_ exposed to users through the GraphSONMapper 
for full customization of the types ser/de. Could be done easily later.
    
    - There is 1 situation that can lead to unexpected results : if somebody 
serializes a `value` that has the exact same format as the type format. I.e. a 
List in which the _first_ element is a Map in which the _first_ entry's key is 
GraphSONTokens.CLASS. We may want to highlight that somewhere.
    
    ## Extra facts
    
    - The class _JsonParserConcat_ allows some great perf improvements in the 
deser path and has been implemented and inspired by the _JsonParserSequence_ 
class from Jackson, however the Jackson's class has an unexpected behaviour, 
which is corrected in _JsonParserConcat_. _JsonParserSequence_ may be corrected 
in future Jackson versions but in the meantime _JsonParserConcat_ will do fine. 
(http://jackson-users.ning.com/forum/topics/jsonparsersequence-behaviour-seems-misleading)
    
    - `FULL_TYPES` has not been implemented. FULL_TYPES means writing types for 
JSON natively supported values. However all the mechanism is put in place to 
implement it easily if we deem it necessary. I don't see the necessity for this 
now.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/newkek/incubator-tinkerpop TINKERPOP-1274

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/351.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #351
    
----
commit f5b64fafb071fffc60fdb5113709c78f5dde334f
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2016-05-18T12:41:26Z

    Frame up for GraphSON 2.0

commit 0b805d600b9b11b0b36bb34ebfe4c880615e7764
Author: Kevin Gallardo <kevin.galla...@datastax.com>
Date:   2016-06-28T18:38:32Z

    TINKERPOP-1274: GraphSON 2.0.

----


> GraphSON Version 2.0
> --------------------
>
>                 Key: TINKERPOP-1274
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1274
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.1.2-incubating
>            Reporter: stephen mallette
>            Priority: Minor
>             Fix For: 3.2.1
>
>
> Develop a revised version of GraphSON that provides better support for 
> non-JVM languages that consume it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to