Github user dalaro commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325
I just pushed some changes that I hacked together this weekend. The key
additions are:
* `TinkerPopKryoRegistrator`, which I extracted from my app, and which acts
as a `spark.kryo.registrator` impl that knows about TinkerPop types
* `IoRegistryAwareKryoSerializer`, which is a Spark `Serializer` that looks
for `GryoPool.CONFIG_IO_REGISTRY` and applies it if present
* `KryoShimLoaderService.applyConfiguration(cfg)`, which replaces direct
calls to `HadoopPools.initialize(cfg)` and adds equivalent functionality for
initializing the unshaded Kryo serializer pool
The user would theoretically just set
```
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.IoRegistryAwareKryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.TinkerPopKryoRegistrator
# Optional, only needed for custom types
gremlin.io.registry=whatever.user.IoRegistryImpl
```
In practice, when I have a custom gremlin.io.registry, I have always had to
take the additional step (long before this PR) of forcibly initializing
`HadoopPools` before touching SparkGraphComputer in my app, or else some part
of Spark -- I think the closure serializer -- would attempt to use HadoopPools
via ObjectWritable/VertexWritable before initialization and produce garbage on
my custom classes. **This problem predates my PR**. I'm not trying to solve it
here, in part because I still don't know if it's a pathology specific to my app
or because TinkerPop is missing a crucial `HadoopPools.initialize` (now,
equivalently, `KryoShimLoaderService.applyConfiguration`) call somewhere, and
in part because HadoopPools is such a hideous architectural wart that the
ultimate solution probably involves destroying it.
In the past, I've worked around this by defining a custom spark.serializer
that delegates newKryo() to a GryoSerializer/IoRegistryAwareKryoSerializer, but
which has a constructor that invokes
`HadoopPools.initialize`/`KryoShimLoaderService.applyConfiguration` (relying on
that method's idempotence).
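A minimal sketch of that workaround pattern, using only the JDK so it runs on its own. Everything in it is a hypothetical stand-in: `PoolInitializer` plays the role of `HadoopPools`/`KryoShimLoaderService`, `DelegateSerializer` plays the role of the serializer being delegated to, and the "first call wins" behaviour is just one assumed way the initialization could be idempotent. It is not the actual Spark or TinkerPop API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class EagerInitSerializerSketch {

    /** Hypothetical stand-in for HadoopPools / KryoShimLoaderService. */
    static final class PoolInitializer {
        private static final AtomicInteger effectiveInits = new AtomicInteger();
        private static Map<String, String> activeConfig;

        /** Assumed idempotent: the first call wins, later calls are no-ops. */
        static synchronized void applyConfiguration(Map<String, String> cfg) {
            if (activeConfig != null) {
                return; // already initialized, nothing to do
            }
            activeConfig = new HashMap<>(cfg);
            effectiveInits.incrementAndGet();
        }

        static synchronized String ioRegistry() {
            if (activeConfig == null) {
                return "none";
            }
            return activeConfig.getOrDefault("gremlin.io.registry", "none");
        }

        static int effectiveInitCount() {
            return effectiveInits.get();
        }
    }

    /** Hypothetical stand-in for the serializer that really builds Kryo instances. */
    static final class DelegateSerializer {
        String newKryo() {
            return "kryo[registry=" + PoolInitializer.ioRegistry() + "]";
        }
    }

    /** The wrapper: initialize the pools in the constructor, then delegate. */
    static final class EagerInitSerializer {
        private final DelegateSerializer delegate = new DelegateSerializer();

        EagerInitSerializer(Map<String, String> cfg) {
            // Runs before anything can ask this serializer for a Kryo instance.
            PoolInitializer.applyConfiguration(cfg);
        }

        String newKryo() {
            return delegate.newKryo();
        }
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("gremlin.io.registry", "whatever.user.IoRegistryImpl");

        // Constructing the serializer twice only initializes the pools once.
        EagerInitSerializer first = new EagerInitSerializer(cfg);
        EagerInitSerializer second = new EagerInitSerializer(cfg);

        System.out.println(first.newKryo());
        System.out.println(second.newKryo());
        System.out.println(PoolInitializer.effectiveInitCount());
    }
}
```

The point of putting the call in the constructor is ordering: whatever instantiates the serializer triggers initialization as a side effect, so no code path can reach the pools before the configuration has been applied.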
Again, this initialization step may just be specific to my app and unnecessary
for the average TinkerPop user. It's possible that the config I pasted above
will work for others.
FWIW, this passes, so the overrides bug should be fixed along with all this
refactoring stuff:
```
mvn clean install -DskipTests=true && mvn verify -pl gremlin-server \
    -DskipIntegrationTests=false -Dtest.single=GremlinResultSetIntegrateTest
```