GitHub user newkek opened a pull request:
https://github.com/apache/tinkerpop/pull/351
TINKERPOP-1274: GraphSON 2.0.
https://issues.apache.org/jira/browse/TINKERPOP-1274
# Summary of the changes :
Implementation of a format for value types serialization which is uniform
and not Java centric. As a reminder the new format is as follows :
- A value not typed : `value`
- A value typed : `[{"@class":"typeName"}, value]`
The content of `value` can be either a simple value, or a more complex
structure. The type prefix allows to call the right _Deserializer_ to
deserialize the `value` whatever its content is.
The default's GraphSON's 2.0 format will include types for every `value`
that is not a JSON native type
(`String/int/double/null/boolean/Map/Collection`) - called `PARTIAL_TYPES`.
This allows to significantly reduce the size of the JSON payload where those
aren't needed by the Jackson library.
GraphSON serialization _without_ types is not affected. To enable it, need
to use `NO_TYPES`.
## Quick walkthrough for the review
There are new components involved in the ser/de process that are extended
from the Jackson library :
- _TypeIdResolver_ : performs the conversion of a object's `Class` ->
`typeID`, and from a `typeID` -> `Class`.
- _TypeResolverBuilder_ : creates a _TypeIdResolver_ and instantiates a
_TypeDeserializer_ or _TypeSerializer_ with the _TypeIdResolver_ as a param.
- _TypeSerializer_ : writes the prefix and suffix for a typeID. The
_TypeSerializer_ is provided to the _Serializers_ and handles the type
serialization that respects the format put in place.
- _TypeDeserializer_ : is called before the deserializers are called. Its
role is to detect a type, and call the right _Deserializer_ to deserialize the
value.
Most of the serializers already existing for Graph had to be modified,
because most of them were *manually hardcoding* types without calling the
_TypeSerializer_ and hence, those were not respecting the format. Now all
Serializers respect the format because they call the _TypeSerializer_ given in
parameter. I followed the frame put in place by @spmallette to implement those
new serializers (prefixed with `V2d0`) without breaking existing clients code.
In _TypeDeserializer_, `baseType` represents the class given in parameter
by the user for the ser/de. If the user calls
`mapper.readValueAsString(jsonString, UUID.class)` the `baseType` in
_TypeDeserializer_ will be a _JavaType_ (which is the Jackson's custom class
for Java classes) that represents a _UUID_ class. The *wildcard* in our
deserialization mechanism is `Object.class`.
The deserialization path is as follow :
- _TypeDeserializer_ is called only for non simple JSON values (non
`String/int/double/null/boolean`).
- When called, if a type is detected, the _TypeDeserializer_ will read the
typeID, convert it to a JavaType (thanks to the _TypeIdResolver_), check that
the `baseType` is the same than what was read in the payload (only if
`baseType` is not the *wildcard*), and call the `deserialize()` method of the
_JsonDeserializer_ registered for that type.
- If a type is not detected, detect Maps or Arrays and call appropriate
Deserializer.
## Some results
GraphSON 2.0 shows a significant reduction of the payload's size for typed
serialization. And the consequence in performance is that since there's less to
process, the ser/de is faster. Results show a reduction of at least 50% in the
payload's size and evolving linearly (the bigger the payload the bigger the
difference) :
```
â ls -lh
tinkergraph-gremlin/target/test-case-data/TinkerGraphTest/tinkerpop-io/
-rw-r--r-- 1 kevingallardo staff 890K 28 Jun 21:50
grateful-dead-V2d0-typed.json
-rw-r--r-- 1 kevingallardo staff 1.9M 28 Jun 21:50
grateful-dead-typed.json
-rw-r--r-- 1 kevingallardo staff 851K 28 Jun 21:50 grateful-dead.json
-rw-r--r-- 1 kevingallardo staff 1.5K 28 Jun 21:50
tinkerpop-classic-V2d0-typed.json
-rw-r--r-- 1 kevingallardo staff 3.6K 28 Jun 21:50
tinkerpop-classic-typed.json
-rw-r--r-- 1 kevingallardo staff 1.3K 28 Jun 21:50
tinkerpop-classic.json
```
## Tests
Tests cover the same functionalities covered by GraphSON 1.0 typed, plus
additional features brought by GraphSON 2.0.
## Tradeoffs
- Some tricks were implemented in order to provide the types without
packages names. Since Java does not provide a way to search a class by its
simple name, the `TypeIdResolver` has to have an index that it can refer to,
and that has been correctly filled. The _TinkerpopJackosnModule_ class will
handle providing those new indexes to the _TypeIdResolver_ when custom
_Deserializers_ are added, but _without_ doing that we have no way to properly
convert a type. We then require that a user instead of extending Jackson's
_SimpleModule_ when writing a module, extends _TinkerpopJacksonModule_ which is
an extension the Jackson _SimpleModule_. We can discuss whether this is a
showstopper or if it is acceptable. Some solutions to prevent that, could be
using a external library that allows searching a Class by its simple name, or
switch to writing typeIDs with the full Class's canonical name. I don't have a
strong opinion but it seems like the current solution is simple and good enough.
- We loose some polymorphism potential in regards to POJOs without
deserializers. Since the approach here is simpler and faster, it doesn't
inspects JavaTypes to find potential subclasses. On the filp side it allows
fine tuning of the Deserializers chosen to deserialize a type. I also thought
about making the _TypeIdResolver_ exposed to users through the GraphSONMapper
for full customization of the types ser/de. Could be done easily later.
- There is 1 situation that can lead to unexpected results : if somebody
serializes a `value` that has the exact same format as the type format. I.e. a
List in which the _first_ element is a Map in which the _first_ entry's key is
GraphSONTokens.CLASS. We may want to highlight that somewhere.
## Extra facts
- The class _JsonParserConcat_ allows some great perf improvements in the
deser path and has been implemented and inspired by the _JsonParserSequence_
class from Jackson, however the Jackson's class has an unexpected behaviour,
which is corrected in _JsonParserConcat_. _JsonParserSequence_ may be corrected
in future Jackson versions but in the meantime _JsonParserConcat_ will do fine.
(http://jackson-users.ning.com/forum/topics/jsonparsersequence-behaviour-seems-misleading)
- `FULL_TYPES` has not been implemented. FULL_TYPES means writing types for
JSON natively supported values. However all the mechanism is put in place to
implement it easily if we deem it necessary. I don't see the necessity for this
now.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/newkek/incubator-tinkerpop TINKERPOP-1274
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tinkerpop/pull/351.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #351
----
commit f5b64fafb071fffc60fdb5113709c78f5dde334f
Author: Stephen Mallette <[email protected]>
Date: 2016-05-18T12:41:26Z
Frame up for GraphSON 2.0
commit 0b805d600b9b11b0b36bb34ebfe4c880615e7764
Author: Kevin Gallardo <[email protected]>
Date: 2016-06-28T18:38:32Z
TINKERPOP-1274: GraphSON 2.0.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---