[
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316514#comment-15316514
]
ASF GitHub Bot commented on TINKERPOP-1321:
-------------------------------------------
Github user okram commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325
Testing this morning, I get two test failures.
```
Tests in error:
GroovyProgramTest$Traversals>ProgramTest.g_V_outXcreatedX_aggregateXxX_byXlangX_groupCount_programXTestProgramX_asXaX_selectXa_xX:117
» IllegalState
ProgramTest$Traversals>ProgramTest.g_V_outXcreatedX_aggregateXxX_byXlangX_groupCount_programXTestProgramX_asXaX_selectXa_xX:117
» IllegalState
```
Both these tests make use of a user defined `VertexProgram` that stores
vertices as `HALTED_TRAVERSERS`. The problem is that the vertices are not being
detached.
```
Caused by: java.io.NotSerializableException:
org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex
Serialization stack:
- object not serializable (class:
org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex,
value: v[3])
- writeObject data (class: java.util.HashMap)
- object (class java.util.HashMap, {v[3]=3, v[5]=1})
- field (class:
org.apache.tinkerpop.gremlin.process.traversal.traverser.util.AbstractTraverser,
name: t, type: class java.lang.Object)
- object (class
org.apache.tinkerpop.gremlin.process.traversal.traverser.B_LP_O_S_SE_SL_Traverser,
{v[3]=3, v[5]=1})
```
In the past, this would be automatically detached, but now it doesn't seem
to be. I'll fiddle a bit to sew whats going on, but just heads up in case its
obvious to use @dalaro ...
> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> ---------------------------------------------------------------------
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
> Issue Type: Improvement
> Components: io
> Affects Versions: 3.2.0-incubating
> Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting
> spark.serializer=GryoSerializer. This makes GryoSerializer responsible for
> serializing not just TinkerPop types but also scala runtime types and Spark
> internals. GryoSerializer doesn't extend either of the two serializers
> provided by Spark. It effectively assumes responsibility for reimplementing
> them.
> This is problematic. It is not totally trivial to replicate the
> functionality of Spark's standard serializers. It is also not easy to
> empirically test all meaningful cases. For instance, there is a conditional
> within Spark that selects between two concrete Map implementations depending
> on whether the current RDD partition count exceeds 2k
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
> The implementation used below this threshold serializes fine on
> GryoSerializer. The implementation used above the threshold does not. Above
> the partition threshold, I've started getting
> {{org.apache.spark.SparkException: Job aborted due to stage failure:
> Exception while getting task result:
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed
> to find one of the right cookies.}} Google leads to
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64. However, just
> switching to Spark's {{KryoSerializer}} without changing anything somehow
> fixes the problem in my environment, implying that Spark has done something
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.
> These still serialize, but KryoSerializer uses its default behavior
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's
> StarGraphSerializer. TinkerPop's reliance on its own internal shaded Kryo
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just
> by tweaking the configuration. Besides spark.serializer, there is also a
> setting spark.closure.serializer for which the only supported value is
> JavaSerializer. Key TP classes that make it into the object reference graphs
> of Spark closures implement Serializable by resorting to TinkerPop's shaded
> Kryo via HadoopPools (looking at Object/VertexWritable). This leads to
> surprises with custom property data types. It doesn't matter if those types
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is
> configured to accept those types. If those types are reachable from
> Object/VertexWritable, then they must be registered with TinkerPop's internal
> shaded Kryo, or else it will choke on them (unless it was explicitly
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if
> they choose not to use GryoSerializer: introduce lightweight interfaces that
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded
> package doing the work. TinkerPop's serialization logic would be written
> against interfaces that replicate a minimal subset of Kryo, and then TP's
> shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without
> having to touch the source, recompile any TinkerPop code, or munge bytecode
> at runtime.
> This would not resolve all of the potential problems/complexity around
> TinkerPop serialization, but it would make it possible for users to apply the
> spark.serializer of their choice, switching off GryoSerializer if they so
> choose. Users could also continue to use GyroSerializer as they have until
> now.
> I've already run this through a few iterations locally and have an
> abstraction that allows me to run with spark.serializer=KryoSerializer,
> register GryoMapper/GryoSerializer's standard set of types, reuse TinkerPop's
> StarGraph serializer, and bypass GryoSerializer in HadoopPools to boot. I
> just rebased it on current master. I'll submit a PR for discussion/review.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)