GitHub user dalaro commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325
These questions involved a lot of back-and-forth that would have been slow
to resolve in GitHub comments, so Marko and I discussed the failures and the
code on a hangout earlier today.
Summary:
* Marko created
https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321 on the TP
repo. That branch is my PR plus additional tweaks Marko and I worked out on
top, so it supersedes this PR branch, which **should not be merged because
Marko's branch subsumes/extends it**.
* I have a rough idea of what changed from our discussion on the hangout,
but I will go through his branch's new commits and review them.
* The new serialization code is lower-priority than the old code. It has
to be turned on through explicit user action, as in this snippet from the
test provider code:
```
// Select the unshaded Kryo shim service
System.setProperty(SHIM_CLASS_SYSTEM_PROPERTY,
    UnshadedKryoShimService.class.getCanonicalName());
// Point Spark at its own KryoSerializer and at TinkerPop's GryoRegistrator
config.put("spark.serializer", KryoSerializer.class.getCanonicalName());
config.put("spark.kryo.registrator", GryoRegistrator.class.getCanonicalName());
```
* Marko set up SparkHadoopGraphProvider to randomly choose either the old
serialization code or the new code in
https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-afffc6796745845d46e6f60ea3a992f9R91.
IMHO that choice should be overridable to make it deterministic, but it's not
a huge deal since it's limited to test code.
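Something along these lines would work; to be clear, the property name
`tinkerpop.test.forceUnshadedShim` and the class below are placeholders I'm
making up here, not anything that exists in Marko's branch:
```
import java.util.Random;

// Sketch only: let a system property pin the shim choice so a test run can be
// made deterministic while debugging, instead of flipping a coin every time.
public class ShimChoiceSketch {
    private static final Random RANDOM = new Random();

    static boolean useUnshadedShim() {
        final String forced = System.getProperty("tinkerpop.test.forceUnshadedShim");
        if (forced != null)
            return Boolean.parseBoolean(forced); // deterministic override
        return RANDOM.nextBoolean();             // current behavior: random per run
    }
}
```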
* We replaced `GryoClassResolver` with a series of class registrations in
https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-2e881d0a6ab7ed1ca7d1f593bceadfd2R207.
The idea here is to explicitly configure which types need a detaching
serializer rather than relying on `GryoClassResolver` to automagically do it
regardless of explicit class registrations (there's a rough sketch of the
registration pattern after this list). Three factors influence this choice:
* Spark's KryoSerializer does not support a user-provided ClassResolver,
and working around that limitation entails either getting a change through
Spark upstream or copying and pasting a bunch of KryoSerializer's
private/hardcoded class registrations, some of which could change in future
Spark versions. The latter is ugly and would make the code brittle across
Spark upgrades.
* `GryoClassResolver` is convenient but somewhat mysterious to third
parties, who are likelier to have seen a custom class registration mechanism
(like TP's `IoRegistry` or Spark's `spark.kryo.registrator`) than a custom
resolver. It also removes vendor control in that its detachment behavior is
not configurable, so modifying it requires subclassing or reimplementation. For
instance, if a vendor had a lightweight Vertex implementation with a custom
serializer and wanted to bypass TP's resolver-driven detachment, then I don't
think it would be possible without modifying TP's resolver.
* `GryoClassResolver` is written against a bunch of shaded types and
relies on `readVarInt`/`writeVarInt`, which were added in Kryo 2.22, one
version too late for compatibility with Spark's Kryo 2.21. This is the least
important concern of the three, more an implementation detail.
* Marko is running the full integration test suite now. He ran the Spark
ITs just before pushing and reported success on those specifically.
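For anyone skimming, here is roughly the shape of the explicit-registration
approach described above. The vendor vertex type and its serializer are made
up for illustration; the real registrations live in `GryoRegistrator` and
cover TP's element types, not this toy class:
```
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// Illustrative only: explicit per-type registration puts the vendor in control
// of how (and whether) an element is detached, with no custom ClassResolver.
public class ExplicitRegistrationSketch {

    // Stand-in for a vendor's lightweight vertex implementation.
    public static class MyVertex {
        public long id;
        public transient Object heavyBackendHandle; // must not cross the wire
    }

    // A "detaching" serializer: only the id is written and read back.
    public static class MyVertexSerializer extends Serializer<MyVertex> {
        @Override
        public void write(final Kryo kryo, final Output output, final MyVertex v) {
            output.writeLong(v.id);
        }

        @Override
        public MyVertex read(final Kryo kryo, final Input input, final Class<MyVertex> type) {
            final MyVertex v = new MyVertex();
            v.id = input.readLong();
            return v;
        }
    }

    public static void registerWith(final Kryo kryo) {
        // Explicit registration: this type always uses this serializer,
        // independent of any resolver-driven detachment logic.
        kryo.register(MyVertex.class, new MyVertexSerializer());
    }
}
```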