[ https://issues.apache.org/jira/browse/TINKERPOP-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355048#comment-15355048 ]
ASF GitHub Bot commented on TINKERPOP-1274: ------------------------------------------- GitHub user newkek opened a pull request: https://github.com/apache/tinkerpop/pull/351 TINKERPOP-1274: GraphSON 2.0. https://issues.apache.org/jira/browse/TINKERPOP-1274 # Summary of the changes : Implementation of a format for value types serialization which is uniform and not Java centric. As a reminder the new format is as follows : - A value not typed : `value` - A value typed : `[{"@class":"typeName"}, value]` The content of `value` can be either a simple value, or a more complex structure. The type prefix allows to call the right _Deserializer_ to deserialize the `value` whatever its content is. The default's GraphSON's 2.0 format will include types for every `value` that is not a JSON native type (`String/int/double/null/boolean/Map/Collection`) - called `PARTIAL_TYPES`. This allows to significantly reduce the size of the JSON payload where those aren't needed by the Jackson library. GraphSON serialization _without_ types is not affected. To enable it, need to use `NO_TYPES`. ## Quick walkthrough for the review There are new components involved in the ser/de process that are extended from the Jackson library : - _TypeIdResolver_ : performs the conversion of a object's `Class` -> `typeID`, and from a `typeID` -> `Class`. - _TypeResolverBuilder_ : creates a _TypeIdResolver_ and instantiates a _TypeDeserializer_ or _TypeSerializer_ with the _TypeIdResolver_ as a param. - _TypeSerializer_ : writes the prefix and suffix for a typeID. The _TypeSerializer_ is provided to the _Serializers_ and handles the type serialization that respects the format put in place. - _TypeDeserializer_ : is called before the deserializers are called. Its role is to detect a type, and call the right _Deserializer_ to deserialize the value. Most of the serializers already existing for Graph had to be modified, because most of them were *manually hardcoding* types without calling the _TypeSerializer_ and hence, those were not respecting the format. Now all Serializers respect the format because they call the _TypeSerializer_ given in parameter. I followed the frame put in place by @spmallette to implement those new serializers (prefixed with `V2d0`) without breaking existing clients code. In _TypeDeserializer_, `baseType` represents the class given in parameter by the user for the ser/de. If the user calls `mapper.readValueAsString(jsonString, UUID.class)` the `baseType` in _TypeDeserializer_ will be a _JavaType_ (which is the Jackson's custom class for Java classes) that represents a _UUID_ class. The *wildcard* in our deserialization mechanism is `Object.class`. The deserialization path is as follow : - _TypeDeserializer_ is called only for non simple JSON values (non `String/int/double/null/boolean`). - When called, if a type is detected, the _TypeDeserializer_ will read the typeID, convert it to a JavaType (thanks to the _TypeIdResolver_), check that the `baseType` is the same than what was read in the payload (only if `baseType` is not the *wildcard*), and call the `deserialize()` method of the _JsonDeserializer_ registered for that type. - If a type is not detected, detect Maps or Arrays and call appropriate Deserializer. ## Some results GraphSON 2.0 shows a significant reduction of the payload's size for typed serialization. And the consequence in performance is that since there's less to process, the ser/de is faster. Results show a reduction of at least 50% in the payload's size and evolving linearly (the bigger the payload the bigger the difference) : ``` ➜ ls -lh tinkergraph-gremlin/target/test-case-data/TinkerGraphTest/tinkerpop-io/ -rw-r--r-- 1 kevingallardo staff 890K 28 Jun 21:50 grateful-dead-V2d0-typed.json -rw-r--r-- 1 kevingallardo staff 1.9M 28 Jun 21:50 grateful-dead-typed.json -rw-r--r-- 1 kevingallardo staff 851K 28 Jun 21:50 grateful-dead.json -rw-r--r-- 1 kevingallardo staff 1.5K 28 Jun 21:50 tinkerpop-classic-V2d0-typed.json -rw-r--r-- 1 kevingallardo staff 3.6K 28 Jun 21:50 tinkerpop-classic-typed.json -rw-r--r-- 1 kevingallardo staff 1.3K 28 Jun 21:50 tinkerpop-classic.json ``` ## Tests Tests cover the same functionalities covered by GraphSON 1.0 typed, plus additional features brought by GraphSON 2.0. ## Tradeoffs - Some tricks were implemented in order to provide the types without packages names. Since Java does not provide a way to search a class by its simple name, the `TypeIdResolver` has to have an index that it can refer to, and that has been correctly filled. The _TinkerpopJackosnModule_ class will handle providing those new indexes to the _TypeIdResolver_ when custom _Deserializers_ are added, but _without_ doing that we have no way to properly convert a type. We then require that a user instead of extending Jackson's _SimpleModule_ when writing a module, extends _TinkerpopJacksonModule_ which is an extension the Jackson _SimpleModule_. We can discuss whether this is a showstopper or if it is acceptable. Some solutions to prevent that, could be using a external library that allows searching a Class by its simple name, or switch to writing typeIDs with the full Class's canonical name. I don't have a strong opinion but it seems like the current solution is simple and good enough. - We loose some polymorphism potential in regards to POJOs without deserializers. Since the approach here is simpler and faster, it doesn't inspects JavaTypes to find potential subclasses. On the filp side it allows fine tuning of the Deserializers chosen to deserialize a type. I also thought about making the _TypeIdResolver_ exposed to users through the GraphSONMapper for full customization of the types ser/de. Could be done easily later. - There is 1 situation that can lead to unexpected results : if somebody serializes a `value` that has the exact same format as the type format. I.e. a List in which the _first_ element is a Map in which the _first_ entry's key is GraphSONTokens.CLASS. We may want to highlight that somewhere. ## Extra facts - The class _JsonParserConcat_ allows some great perf improvements in the deser path and has been implemented and inspired by the _JsonParserSequence_ class from Jackson, however the Jackson's class has an unexpected behaviour, which is corrected in _JsonParserConcat_. _JsonParserSequence_ may be corrected in future Jackson versions but in the meantime _JsonParserConcat_ will do fine. (http://jackson-users.ning.com/forum/topics/jsonparsersequence-behaviour-seems-misleading) - `FULL_TYPES` has not been implemented. FULL_TYPES means writing types for JSON natively supported values. However all the mechanism is put in place to implement it easily if we deem it necessary. I don't see the necessity for this now. You can merge this pull request into a Git repository by running: $ git pull https://github.com/newkek/incubator-tinkerpop TINKERPOP-1274 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/351.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #351 ---- commit f5b64fafb071fffc60fdb5113709c78f5dde334f Author: Stephen Mallette <sp...@genoprime.com> Date: 2016-05-18T12:41:26Z Frame up for GraphSON 2.0 commit 0b805d600b9b11b0b36bb34ebfe4c880615e7764 Author: Kevin Gallardo <kevin.galla...@datastax.com> Date: 2016-06-28T18:38:32Z TINKERPOP-1274: GraphSON 2.0. ---- > GraphSON Version 2.0 > -------------------- > > Key: TINKERPOP-1274 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1274 > Project: TinkerPop > Issue Type: Improvement > Components: io > Affects Versions: 3.1.2-incubating > Reporter: stephen mallette > Priority: Minor > Fix For: 3.2.1 > > > Develop a revised version of GraphSON that provides better support for > non-JVM languages that consume it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)