GitHub user dalaro commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325
These questions involved a lot of back-and-forth that would have been slow
to resolve in GitHub comments, so Marko and I discussed the failures and the
code on a hangout earlier today.
Summary:
* Marko created
https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321 on the TP
repo. That branch is my PR plus additional tweaks Marko and I worked out on
top, so it supersedes this PR branch, which **should not be merged because
Marko's branch subsumes/extends it**.
* I have a rough idea of what changed from our discussion on the hangout,
but I will go through his branch's new commits and review them.
* The new serialization code is lower-priority than the old code. It has
to be turned on through explicit user action, as in this snippet from the
test provider code:
```
// Select the unshaded Kryo shim service
System.setProperty(SHIM_CLASS_SYSTEM_PROPERTY,
    UnshadedKryoShimService.class.getCanonicalName());
// Point Spark at its own KryoSerializer and at TinkerPop's GryoRegistrator
config.put("spark.serializer", KryoSerializer.class.getCanonicalName());
config.put("spark.kryo.registrator", GryoRegistrator.class.getCanonicalName());
```
* Marko set up SparkHadoopGraphProvider to randomly choose either the old
serialization code or the new code in
https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-afffc6796745845d46e6f60ea3a992f9R91.
IMHO that choice should be overridable to make it deterministic, but it's not
a huge deal since it's limited to test code.
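Something along these lines would work; to be clear, the property name
`tinkerpop.test.forceUnshadedShim` and the class below are placeholders I'm
making up here, not anything that exists in Marko's branch:
```
import java.util.Random;

// Sketch only: let a system property pin the shim choice so a test run can be
// made deterministic while debugging, instead of flipping a coin every time.
public class ShimChoiceSketch {
    private static final Random RANDOM = new Random();

    static boolean useUnshadedShim() {
        final String forced = System.getProperty("tinkerpop.test.forceUnshadedShim");
        if (forced != null)
            return Boolean.parseBoolean(forced); // deterministic override
        return RANDOM.nextBoolean();             // current behavior: random per run
    }
}
```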
* We replaced `GryoClassResolver` with a series of class registrations in
https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-2e881d0a6ab7ed1ca7d1f593bceadfd2R207.
The idea here is to explicitly configure which types need a detaching
serializer rather than relying on `GryoClassResolver` to automagically do it
regardless of explicit class registrations (there's a rough sketch of the
registration pattern after this list). Three factors influence this choice:
* Spark's KryoSerializer does not support a user-provided ClassResolver,
and working around that limitation entails either getting a change through
Spark upstream or copying and pasting a bunch of KryoSerializer's
private/hardcoded class registrations, some of which could change in future
Spark versions. The latter is ugly and would make the code brittle across
Spark upgrades.
* `GryoClassResolver` is convenient but somewhat mysterious to third
parties, who are likelier to have seen a custom class registration mechanism
(like TP's `IoRegistry` or Spark's `spark.kryo.registrator`) than a custom
resolver. It also removes vendor control in that its detachment behavior is
not configurable, so modifying it requires subclassing or reimplementation. For
instance, if a vendor had a lightweight Vertex implementation with a custom
serializer and wanted to bypass TP's resolver-driven detachment, then I don't
think it would be possible without modifying TP's resolver.
* `GryoClassResolver` is written against a bunch of shaded types and
relies on `readVarInt`/`writeVarInt`, which were added in Kryo 2.22, one
version too late for compatibility with Spark's Kryo 2.21. This is the least
important concern of the three, more an implementation detail.
* Marko is running the full integration test suite now. He ran the Spark
ITs just before pushing and reported success on those specifically.
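For anyone skimming, here is roughly the shape of the explicit-registration
approach described above. The vendor vertex type and its serializer are made
up for illustration; the real registrations live in `GryoRegistrator` and
cover TP's element types, not this toy class:
```
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// Illustrative only: explicit per-type registration puts the vendor in control
// of how (and whether) an element is detached, with no custom ClassResolver.
public class ExplicitRegistrationSketch {

    // Stand-in for a vendor's lightweight vertex implementation.
    public static class MyVertex {
        public long id;
        public transient Object heavyBackendHandle; // must not cross the wire
    }

    // A "detaching" serializer: only the id is written and read back.
    public static class MyVertexSerializer extends Serializer<MyVertex> {
        @Override
        public void write(final Kryo kryo, final Output output, final MyVertex v) {
            output.writeLong(v.id);
        }

        @Override
        public MyVertex read(final Kryo kryo, final Input input, final Class<MyVertex> type) {
            final MyVertex v = new MyVertex();
            v.id = input.readLong();
            return v;
        }
    }

    public static void registerWith(final Kryo kryo) {
        // Explicit registration: this type always uses this serializer,
        // independent of any resolver-driven detachment logic.
        kryo.register(MyVertex.class, new MyVertexSerializer());
    }
}
```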