[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317589#comment-15317589 ]

ASF GitHub Bot commented on TINKERPOP-1321:
---
Github user dalaro commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325

    Just a reminder: we are voting on branch https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321. That subsumes the PR and adds a combination of fixes from Marko and me.

> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> ---------------------------------------------------------------------
>
>                 Key: TINKERPOP-1321
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.2.0-incubating
>            Reporter: Dan LaRocque
>             Fix For: 3.2.1
>
> When running jobs on Spark, TinkerPop currently recommends setting spark.serializer=GryoSerializer. This makes GryoSerializer responsible for serializing not just TinkerPop types but also Scala runtime types and Spark internals. GryoSerializer doesn't extend either of the two serializers provided by Spark. It effectively assumes responsibility for reimplementing them.
>
> This is problematic. It is not totally trivial to replicate the functionality of Spark's standard serializers. It is also not easy to empirically test all meaningful cases. For instance, there is a conditional within Spark that selects between two concrete Map implementations depending on whether the current RDD partition count exceeds 2k (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53). The implementation used below this threshold serializes fine on GryoSerializer. The implementation used above the threshold does not. Above the partition threshold, I've started getting {{org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed to find one of the right cookies.}} Google leads to https://github.com/RoaringBitmap/RoaringBitmap/issues/64. However, just switching to Spark's {{KryoSerializer}} without changing anything somehow fixes the problem in my environment, implying that Spark has done something to address this problem that may not be fully replicated in TinkerPop.
>
> However, "just switching to Spark's KryoSerializer" is not a great approach. For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph serializer, and Spark traversals can produce a lot of little ego-StarGraphs. These still serialize, but KryoSerializer uses its default behavior (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's StarGraphSerializer. TinkerPop's reliance on its own internal shaded Kryo means that its serializers cannot be registered with Spark's unshaded Kryo.
>
> More concerning, it's impossible to completely switch to KryoSerializer just by tweaking the configuration. Besides spark.serializer, there is also a setting spark.closure.serializer for which the only supported value is JavaSerializer. Key TP classes that make it into the object reference graphs of Spark closures implement Serializable by resorting to TinkerPop's shaded Kryo via HadoopPools (looking at Object/VertexWritable). This leads to surprises with custom property data types. It doesn't matter if those types implement Serializable, and it doesn't matter if Spark's KryoSerializer is configured to accept those types. If those types are reachable from Object/VertexWritable, then they must be registered with TinkerPop's internal shaded Kryo, or else it will choke on them (unless it was explicitly configured to allow unregistered classes).
>
> I suggest the following change to give users more flexibility in their choice of spark.serializer, and to allow them to reuse TinkerPop's serializers if they choose not to use GryoSerializer: introduce lightweight interfaces that decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded package doing the work. TinkerPop's serialization logic would be written against interfaces that replicate a minimal subset of Kryo, and then TP's shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without having to touch the source, recompile any TinkerPop code, or munge bytecode at runtime.
>
> This would not resolve all of the potential problems/complexity around TinkerPop serialization, but it would make it possible for users to apply the spark.serializer of their choice, switching off GryoSerializer if they so choose. Users could also continue to use GryoSerializer as they have until now.
>
> I've already run this through a few iterations locally and have an abstraction that allows me to run with spark.s
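The lightweight-interface proposal above can be sketched as follows. This is an illustrative sketch, not TinkerPop's actual API: the interface and class names (`OutputShim`, `InputShim`, `SerializerShim`, the stream-backed implementations) are hypothetical, and a real adapter would delegate to a shaded or unshaded Kryo `Output`/`Input` rather than plain `java.io` streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Minimal write/read surfaces standing in for the small subset of Kryo that
// the serialization logic actually touches.
interface OutputShim {
    void writeInt(int i);
    void writeString(String s);
}

interface InputShim {
    int readInt();
    String readString();
}

// A serializer written only against the shims compiles without referencing
// either the shaded or the unshaded Kryo package.
interface SerializerShim<T> {
    void write(OutputShim out, T object);
    T read(InputShim in);
}

// Toy backing implementations over java.io streams; a real adapter would
// delegate to a shaded or unshaded Kryo Output/Input instead.
class StreamOutputShim implements OutputShim {
    private final DataOutputStream out;
    StreamOutputShim(OutputStream os) { this.out = new DataOutputStream(os); }
    public void writeInt(int i) {
        try { out.writeInt(i); } catch (IOException e) { throw new UncheckedIOException(e); }
    }
    public void writeString(String s) {
        try { out.writeUTF(s); } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}

class StreamInputShim implements InputShim {
    private final DataInputStream in;
    StreamInputShim(InputStream is) { this.in = new DataInputStream(is); }
    public int readInt() {
        try { return in.readInt(); } catch (IOException e) { throw new UncheckedIOException(e); }
    }
    public String readString() {
        try { return in.readUTF(); } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}

public class ShimSketch {
    // A tiny serializer, analogous in spirit to writing StarGraphSerializer
    // once against the shim and running it on top of either Kryo.
    static final SerializerShim<String[]> PAIR = new SerializerShim<String[]>() {
        public void write(OutputShim out, String[] p) {
            out.writeInt(p.length);
            for (String s : p) out.writeString(s);
        }
        public String[] read(InputShim in) {
            String[] p = new String[in.readInt()];
            for (int i = 0; i < p.length; i++) p[i] = in.readString();
            return p;
        }
    };

    public static void main(String[] args) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PAIR.write(new StreamOutputShim(bytes), new String[] { "1", "person" });
        String[] back = PAIR.read(new StreamInputShim(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(back[0] + ":" + back[1]); // prints "1:person"
    }
}
```

Because the serializer is written only against the shim interfaces, swapping the backing Kryo requires swapping only the two adapter classes, with no recompilation of the serialization logic itself.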
[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...
[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317578#comment-15317578 ]

ASF GitHub Bot commented on TINKERPOP-1321:
---
Github user okram commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325

    VOTE +1.
[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317312#comment-15317312 ]

ASF GitHub Bot commented on TINKERPOP-1321:
---
Github user spmallette commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325

    Yeah - i know - it's there, but it's proliferating and becoming more and more OK, which i think is "bad". i'll bring it up as a separate issue.
[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317304#comment-15317304 ]

ASF GitHub Bot commented on TINKERPOP-1321:
---
Github user okram commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325

    We use `Random` in numerous places to test various "versions" of a particular code block. It is possible to replace all possible choices with new `GraphProviders`, but then you are talking about the test suite taking on the order of 20+ hours to complete. If you have a better idea for testing these facets, then please provide a PR. Note that this PR's use of `Random` is not something "novel", as we already use `Random` in `SparkHadoopGraphProvider` to select `InputFormats`, `OutputFormats`, and traversal sources (deprecated vs. non-deprecated representations).
[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317291#comment-15317291 ]

ASF GitHub Bot commented on TINKERPOP-1321:
---
Github user spmallette commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325

    > Marko setup SparkHadoopGraphProvider to randomly choose either the old serialization code or the new stuff in e700363#diff-afffc6796745845d46e6f60ea3a992f9R91. IMHO that should be overrideable to make it deterministic, but it's not a huge deal since it's limited to test code.

    +1 - i'd prefer to not have "random" in our test cases.
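The deterministic override discussed in this thread could look roughly like the sketch below. The system property name and method are hypothetical, not actual TinkerPop code; the point is that a test run can force one serialization path while unconfigured runs keep the randomized coverage.

```java
import java.util.Random;

// Sketch of making a provider's random serializer choice reproducible: a
// (hypothetical) system property forces one branch deterministically, and
// runs that don't set it keep the original randomized behavior.
public class SerializerChoice {

    static boolean useNewSerializationStack(Random random) {
        String forced = System.getProperty("tinkerpop.test.serializer");
        if ("new".equals(forced)) return true;   // force KryoSerializer + registrator path
        if ("old".equals(forced)) return false;  // force GryoSerializer path
        return random.nextBoolean();             // default: random, as in the current provider
    }

    public static void main(String[] args) {
        System.setProperty("tinkerpop.test.serializer", "new");
        System.out.println(useNewSerializationStack(new Random())); // prints "true"
    }
}
```

With an override like this, a failure seen under one serialization stack can be replayed on demand instead of waiting for the coin flip to land the same way.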
[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317279#comment-15317279 ]

ASF GitHub Bot commented on TINKERPOP-1321:
---
Github user dalaro commented on the issue:
https://github.com/apache/incubator-tinkerpop/pull/325

    These questions involved a lot of back-and-forth that would have been slow to resolve in GitHub comments, so Marko and I discussed the failures and the code on a hangout earlier today. Summary:

    * Marko created https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321 on the TP repo. This branch is my PR plus additional tweaks Marko and I worked out on top. His branch supersedes this PR branch now. This PR branch **should not be merged because Marko's branch subsumes/extends it**.
    * I have a rough idea of what changed as we discussed code on the hangout, but I will go through his branch's new commits and review them.
    * The new serialization code is lower priority than the old code. It has to be turned on through explicit user action, as in this test provider code:

      ```
      System.setProperty(SHIM_CLASS_SYSTEM_PROPERTY, UnshadedKryoShimService.class.getCanonicalName());
      config.put("spark.serializer", KryoSerializer.class.getCanonicalName());
      config.put("spark.kryo.registrator", GryoRegistrator.class.getCanonicalName());
      ```

    * Marko set up SparkHadoopGraphProvider to randomly choose either the old serialization code or the new code in https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-afffc6796745845d46e6f60ea3a992f9R91. IMHO that should be overrideable to make it deterministic, but it's not a huge deal since it's limited to test code.
    * We replaced `GryoClassResolver` with a series of class registrations in https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-2e881d0a6ab7ed1ca7d1f593bceadfd2R207. The idea here is to explicitly configure which types need a detaching serializer rather than relying on `GryoClassResolver` to do it automagically, regardless of explicit class registrations. Three factors influence this choice:
      * Spark's KryoSerializer does not support a user-provided ClassResolver, and working around that limitation entails either getting a change through Spark upstream or copying and pasting a bunch of KryoSerializer's private/hardcoded class registrations, some of which could change in future Spark versions. The latter is ugly and would make the code brittle across Spark upgrades.
      * `GryoClassResolver` is convenient but somewhat mysterious to third parties, who are likelier to have seen a custom class registration mechanism (like TP's `IoRegistry` or Spark's `spark.kryo.registrator`) than a custom resolver. It also removes vendor control in that its detachment behavior is not configurable, so modifying it requires subclassing or reimplementation. For instance, if a vendor had a lightweight Vertex implementation with a custom serializer and wanted to bypass TP's resolver-driven detachment, I don't think that would be possible without modifying TP's resolver.
      * `GryoClassResolver` is written against a bunch of shaded types and relies on `readVarInt`/`writeVarInt`, which were added in Kryo 2.22, one version too late for compatibility with Spark's Kryo 2.21. This is the least important concern of the three, more an implementation detail.
    * Marko is running the full integration test suite now. He ran the Spark ITs just before pushing and reported success on those specifically.
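The explicit-registration idea can be illustrated in the abstract. The class and method names below are hypothetical stand-ins, not TinkerPop's real registrator API: the point is only that detachment is opted into per registered class, instead of a resolver imposing it on every recognized type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustration of explicit per-class registration of "detaching" behavior,
// in the spirit of a registrator, versus a resolver that detaches every
// recognized type automatically. All names here are hypothetical.
public class DetachRegistry {

    private final Map<Class<?>, Function<Object, Object>> detachers = new HashMap<>();

    // Only explicitly registered classes get a detacher; a vendor's
    // lightweight element type simply stays unregistered to bypass detachment.
    <T> void registerDetacher(Class<T> type, Function<Object, Object> detacher) {
        detachers.put(type, detacher);
    }

    Object prepare(Object element) {
        Function<Object, Object> d = detachers.get(element.getClass());
        return d == null ? element : d.apply(element);
    }

    public static void main(String[] args) {
        DetachRegistry registry = new DetachRegistry();
        // "Detach" a mutable type into an immutable form before serialization.
        registry.registerDetacher(StringBuilder.class, sb -> sb.toString());
        System.out.println(registry.prepare(new StringBuilder("v[1]"))); // prints "v[1]" as a String
        System.out.println(registry.prepare(42));                        // unregistered: passes through
    }
}
```

Under this scheme the registration table is the single, visible place that decides serialization behavior, which is the property the comment above argues for over a resolver baked into the serializer.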
[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...
Github user dalaro commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/325

These questions involved a lot of back-and-forth that would have been slow to resolve in GitHub comments, so Marko and I discussed the failures and the code on a hangout earlier today. Summary:

* Marko created https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321 on the TP repo. This branch is my PR plus additional tweaks Marko and I worked out on top. His branch supersedes this PR branch now. This PR branch **should not be merged because Marko's branch subsumes/extends it**.
* I have a rough idea of what changed from our discussion on the hangout, but I will go through his branch's new commits and review them.
* The new serialization code is lower priority than the old code. It has to be turned on through explicit user action, as in this test provider code:

```java
System.setProperty(SHIM_CLASS_SYSTEM_PROPERTY, UnshadedKryoShimService.class.getCanonicalName());
config.put("spark.serializer", KryoSerializer.class.getCanonicalName());
config.put("spark.kryo.registrator", GryoRegistrator.class.getCanonicalName());
```

* Marko set up SparkHadoopGraphProvider to randomly choose either the old serialization code or the new code in https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-afffc6796745845d46e6f60ea3a992f9R91. IMHO that choice should be overridable to make it deterministic, but it's not a huge deal since it's limited to test code.
* We replaced `GryoClassResolver` with a series of class registrations in https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-2e881d0a6ab7ed1ca7d1f593bceadfd2R207. The idea here is to explicitly configure which types need a detaching serializer rather than relying on `GryoClassResolver` to do it automagically, regardless of explicit class registrations.
Three factors influenced this choice:

* Spark's KryoSerializer does not support a user-provided ClassResolver, and working around that limitation entails either getting a change through Spark upstream or copying and pasting a bunch of KryoSerializer's private/hardcoded class registrations, some of which could change in future Spark versions. The latter is ugly and would make the code brittle across Spark upgrades.
* `GryoClassResolver` is convenient but somewhat mysterious to third parties, who are likelier to have seen a custom class registration mechanism (like TP's `IoRegistry` or Spark's `spark.kryo.registrator`) than a custom resolver. It also removes vendor control in that its detachment behavior is not configurable, so modifying it requires subclassing or reimplementation. For instance, if a vendor had a lightweight Vertex implementation with a custom serializer and wanted to bypass TP's resolver-driven detachment, I don't think that would be possible without modifying TP's resolver.
* `GryoClassResolver` is written against a bunch of shaded types and relies on `readVarInt`/`writeVarInt`, which were added in Kryo 2.22, one version too late for compatibility with Spark's Kryo 2.21. This is the least important concern of the three, more an implementation detail.

One more status note:

* Marko is running the full integration test suite now. He ran the Spark ITs just before pushing and reported success on those specifically.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
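[Editor's sketch] The registration-over-resolver idea above can be illustrated with a toy example. All names here (`SerializerRegistry`, the `detached:` prefix) are invented for this sketch and are not TinkerPop's actual API; the point is only that explicit registration makes the serializer choice visible and fails fast for unregistered types:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of explicit serializer registration (invented names, not
// TinkerPop's real API). Each type needing special handling -- e.g. a
// detaching serializer -- is registered up front, instead of a custom
// ClassResolver silently deciding at (de)serialization time.
class SerializerRegistry {
    private final Map<Class<?>, Function<Object, String>> serializers = new HashMap<>();

    void register(Class<?> type, Function<Object, String> serializer) {
        serializers.put(type, serializer);
    }

    String serialize(Object o) {
        Function<Object, String> s = serializers.get(o.getClass());
        if (s == null)
            throw new IllegalArgumentException("unregistered class: " + o.getClass().getName());
        return s.apply(o);
    }

    public static void main(String[] args) {
        SerializerRegistry registry = new SerializerRegistry();
        // Simulate a "detaching" serializer for one explicitly registered type.
        registry.register(String.class, o -> "detached:" + o);
        System.out.println(registry.serialize("v[1]"));   // prints detached:v[1]
    }
}
```

Unregistered classes fail loudly here, which is closer in spirit to Kryo's `registrationRequired` mode than to a resolver quietly substituting a detaching serializer.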
[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316836#comment-15316836 ] ASF GitHub Bot commented on TINKERPOP-1321: --- Github user okram commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/325 So, `mvn clean install` is happy w/o the `TraversalInterruptionComputerTest` (commented out). Running full integration tests now...
If those types are reachable from > Object/VertexWritable, then they must be registered with TinkerPop's internal > shaded Kryo, or else it will choke on them (unless it was explicitly > configured to allow unregistered classes). > I suggest the following change to give users more flexibility in their choice > of spark.serializer, and to allow them to reuse TinkerPop's serializers if > they choose not to use GryoSerializer: introduce lightweight interfaces that > decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded > package doing the work. TinkerPop's serialization logic would be written > against interfaces that replicate a minimal subset of Kryo, and then TP's > shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without > having to touch the source, recompile any TinkerPop code, or munge bytecode > at runtime. > This would not resolve all of the potential problems/complexity around > TinkerPop serialization, but it would make it possible for users to apply the > spark.serializer of their choice, switching off GryoSerializer if they so > choose. Users could also continue to use GryoSerializer as they have until
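[Editor's sketch] The decoupling proposed above might look roughly like the following. The interface and class names are invented for illustration (the real shim API lives in the PR); the point is that serialization logic compiles only against a minimal interface, so either a shaded or an unshaded Kryo backend can be plugged in underneath without recompiling that logic:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the proposed shim idea (invented names, not the PR's actual
// interfaces): serialization logic targets a minimal interface, and the
// backend is swapped without touching the logic.
interface KryoShim {
    byte[] serialize(Object o);
}

// Stand-ins for backends that would delegate to TinkerPop's shaded Kryo
// or Spark's unshaded Kryo; here each is simulated with a tagged encoding.
class ShadedBackend implements KryoShim {
    public byte[] serialize(Object o) {
        return ("shaded:" + o).getBytes(StandardCharsets.UTF_8);
    }
}

class UnshadedBackend implements KryoShim {
    public byte[] serialize(Object o) {
        return ("unshaded:" + o).getBytes(StandardCharsets.UTF_8);
    }
}

class ShimDemo {
    // This logic never names a concrete Kryo package, only the shim.
    static byte[] write(KryoShim shim, Object o) {
        return shim.serialize(o);
    }

    public static void main(String[] args) {
        System.out.println(new String(write(new ShadedBackend(), "v[1]"), StandardCharsets.UTF_8));
        System.out.println(new String(write(new UnshadedBackend(), "v[1]"), StandardCharsets.UTF_8));
    }
}
```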
[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316810#comment-15316810 ] ASF GitHub Bot commented on TINKERPOP-1321: --- Github user okram commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/325 Okay. So 1 and 2 above are done; however, I had to `OPT_OUT` of `TraversalInterruptionComputerTest` because, for some reason, the `GryoPoolKryoShimService` is not respecting interruption exceptions. On the plus side, serialization is completely handled by the shim service, which ensures consistency between computer and `ComputerResult` data.
[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...
Github user okram commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/325

I think some things need to be fixed up.

1. `TinkerPopKryoRegistrator` should be called `GryoRegistrator`.
2. I think we need a `GryoPoolKryoShimService` in core that non-Spark/Hadoop projects can leverage.
   * We need to get `VertexProgramHelper.serializer/deserializer()` to use `KryoShimService`.

I'm working on 2 right now and things are moving forward.
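[Editor's sketch] A pluggable service like the proposed `KryoShimService` needs some runtime discovery mechanism. As a hedged sketch (not the branch's actual loader code; the property name and classes below are invented for this example), the JDK's `ServiceLoader` plus a system-property override could work like this:

```java
import java.util.ServiceLoader;

// Hedged sketch of runtime selection for a pluggable shim service (the
// property name and classes are invented): a system property wins, then
// ServiceLoader discovery, then a built-in default.
class ShimServiceLookup {
    interface ShimService {
        String name();
    }

    static class DefaultShimService implements ShimService {
        public String name() { return "default"; }
    }

    static ShimService load() {
        // Explicit override, analogous to a SHIM_CLASS_SYSTEM_PROPERTY.
        String cls = System.getProperty("example.shim.service"); // hypothetical property
        if (cls != null) {
            try {
                return (ShimService) Class.forName(cls).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + cls, e);
            }
        }
        // META-INF/services discovery; nothing is registered in this demo,
        // so the loop body never runs and the default wins.
        for (ShimService s : ServiceLoader.load(ShimService.class)) {
            return s;
        }
        return new DefaultShimService();
    }

    public static void main(String[] args) {
        System.out.println(load().name());   // prints default in this demo
    }
}
```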
Re: [DISCUSS] ASF Board Draft Report - June 2016
This looks good to me. Thanks, Ted On Sat, Jun 4, 2016, 11:54 AM Stephen Mallette wrote: > As TinkerPop has graduated from incubator, we no longer need to produce > incubator reports. Instead, TinkerPop needs to now provide reports to the > ASF Board. Ultimately, as PMC chair it is my responsibility to submit the > report when required, but I liked our method of discussing the incubator > report prior to submission so I'd like to continue that. The content is > largely the same - you can read more about what's required here: > > http://www.apache.org/foundation/board/reporting > > and see examples of other projects' reports here: > > > https://svn.apache.org/repos/private/foundation/board/board_agenda_2016_05_18.txt > > Anyway, here's a draft of the June report that I've prepared - please feel > free to offer any feedback: > > > ## Description: > Apache TinkerPop is a graph computing framework for both graph databases > (OLTP) and graph analytic systems (OLAP). > > This is TinkerPop's first month outside of incubation. We are still > awaiting resource transfer changes related to graduation to take place as > there are issues preventing the infrastructure team from completing that > process. > > Project development has been building up to two new releases > which are expected to be released in July. A key area of focus in this > last month has been related to opening TinkerPop to non-JVM programming > languages through the concept of Gremlin Language Variants[1]. In taking > this direction, TinkerPop becomes more accessible to non-JVM developer > communities. > > ## Issues: > There are no issues requiring board attention at this time. > > ## Releases: > - 3.1.2 (April 8, 2016) > - 3.2.0 (April 8, 2016) > > ## PMC/Committer: > > - Last PMC addition was Dylan Millikin - May 2016 > - Last committer addition was Michael Pollmeier - April 2016 > > [1] > > http://tinkerpop.apache.org/docs/3.2.1-SNAPSHOT/tutorials/gremlin-language-variants/ >
[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo
[ https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316514#comment-15316514 ] ASF GitHub Bot commented on TINKERPOP-1321: --- Github user okram commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/325 Testing this morning, I get two test failures.

```
Tests in error:
GroovyProgramTest$Traversals>ProgramTest.g_V_outXcreatedX_aggregateXxX_byXlangX_groupCount_programXTestProgramX_asXaX_selectXa_xX:117 » IllegalState
ProgramTest$Traversals>ProgramTest.g_V_outXcreatedX_aggregateXxX_byXlangX_groupCount_programXTestProgramX_asXaX_selectXa_xX:117 » IllegalState
```

Both these tests make use of a user-defined `VertexProgram` that stores vertices as `HALTED_TRAVERSERS`. The problem is that the vertices are not being detached.

```
Caused by: java.io.NotSerializableException: org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex
Serialization stack:
- object not serializable (class: org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex, value: v[3])
- writeObject data (class: java.util.HashMap)
- object (class java.util.HashMap, {v[3]=3, v[5]=1})
- field (class: org.apache.tinkerpop.gremlin.process.traversal.traverser.util.AbstractTraverser, name: t, type: class java.lang.Object)
- object (class org.apache.tinkerpop.gremlin.process.traversal.traverser.B_LP_O_S_SE_SL_Traverser, {v[3]=3, v[5]=1})
```

In the past, this would be automatically detached, but now it doesn't seem to be. I'll fiddle a bit to see what's going on, but just a heads up in case it's obvious to you, @dalaro...
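[Editor's sketch] The `NotSerializableException` above is a detachment problem, and the pattern can be shown with toy classes (invented for this sketch; TinkerPop's real machinery is its `DetachedFactory`/`DetachedVertex` classes): a live vertex references non-serializable graph state, while a detached copy carries only plain data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Toy illustration of detachment (invented classes, not TinkerPop's):
// a live vertex drags in non-serializable graph state, so Java
// serialization fails on it; a detached copy keeps only plain data.
class DetachDemo {
    static class Graph { /* imagine transaction/connection state here */ }

    static class LiveVertex {                  // not Serializable: holds the graph
        final Graph graph = new Graph();
        final long id;
        LiveVertex(long id) { this.id = id; }
    }

    static class DetachedVertex implements Serializable {
        final long id;                         // only plain data survives detachment
        DetachedVertex(LiveVertex v) { this.id = v.id; }
    }

    static boolean serializable(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);                // throws NotSerializableException for LiveVertex
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LiveVertex live = new LiveVertex(3);
        System.out.println(serializable(live));                     // prints false
        System.out.println(serializable(new DetachedVertex(live))); // prints true
    }
}
```

In the failing tests, the fix presumably amounts to ensuring the `HALTED_TRAVERSERS` hold detached vertices rather than live `ComputerVertex` instances.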
Re: [DISCUSS] Thinking on Release
Yes, end of June-ish is best for me as I have a few things on my plate the first half of this month.

Thanks,
Marko.

http://markorodriguez.com

> On Jun 6, 2016, at 4:46 AM, Stephen Mallette wrote:
>
> We didn't really discuss a date for release on this thread. I was thinking
> that we could start looking at the week of July 4th as the target week for
> VOTE and nail down a date as we get closer.
Re: [DISCUSS] Thinking on Release
We didn't really discuss a date for release on this thread. I was thinking
that we could start looking at the week of July 4th as the target week for
VOTE and nail down a date as we get closer.

On Wed, May 25, 2016 at 2:42 PM, Jason Plurad wrote:

> I'd think from a TinkerPop branding perspective, it probably helps to have
> the name in there. It's Apache TinkerPop, not Apache Gremlin.
>
> I just took a quick look on a mirror, and some other Apache projects
> (Spark, Kafka, HBase, NiFi, Pig, Zookeeper) don't even include apache in
> their distributables, so maybe we can just do:
>
> tinkerpop-gremlin-console-x.y.z.zip
> tinkerpop-gremlin-server-x.y.z.zip
>
> On Wed, May 25, 2016 at 11:03 AM Stephen Mallette wrote:
>
> > jason i think that was a suggestion to conform more to standard apache
> > releases from someone in incubator. if it was mandatory we would have
> > burned for that too many times to count at this point. i'm good to change
> > it if everyone else is. what do we want them to be?
> >
> > apache-tinkerpop-console-x.y.z.zip
> > apache-tinkerpop-server-x.y.z.zip
> >
> > or the full business:
> >
> > apache-tinkerpop-gremlin-console-x.y.z.zip
> > apache-tinkerpop-gremlin-server-x.y.z.zip
> >
> > i guess we lost "-incubating" now so the latter doesn't look so bad to me
> > anymore.
> >
> > On Wed, May 25, 2016 at 10:56 AM, Marko Rodriguez wrote:
> >
> > > Hi,
> > >
> > > Yes, an imminent release is good. There are 2 severe bug fixes in master/
> > > (3.2.1) that I would like to get out there. 3.2.0 had lots of internal
> > > changes to OLAP and I paid the price by incurring bugs. :|
> > >
> > > > Somebody had mentioned that our distributables are supposed to be named
> > > > apache-tinkerpop*.zip instead of apache-gremlin*.zip. Maybe that's
> > > > something that should be done along with this release.
> > >
> > > There is really no such thing as "tinkerpop" besides the source code which
> > > is distributed as apache-tinkerpop-*.zip. The two other distributions are
> > > gremlin-console and gremlin-server and I think we should keep those naming
> > > conventions as they are so they reflect what is being distributed. Thus, I
> > > think the naming of our artifacts is correct.
> > >
> > > Thanks,
> > > Marko.
> > >
> > > http://markorodriguez.com
> > >
> > > On May 25, 2016, at 8:38 AM, Jason Plurad wrote:
> > >
> > > > Somebody had mentioned that our distributables are supposed to be named
> > > > apache-tinkerpop*.zip instead of apache-gremlin*.zip. Maybe that's
> > > > something that should be done along with this release.
> > > >
> > > > -- Jason
> > > >
> > > > On Wed, May 25, 2016 at 9:55 AM, Stephen Mallette <spmalle...@gmail.com> wrote:
> > > >
> > > >> cool, Ted. it would be good to have another hand there.
> > > >>
> > > >> On Wed, May 25, 2016 at 9:13 AM, Ted Wilmes wrote:
> > > >>
> > > >>> I think a release sounds good. I'd be interested in witnessing the
> > > >>> post-PMC vote release steps so that I might be able to help out on an
> > > >>> upcoming release.
> > > >>>
> > > >>> --Ted
> > > >>>
> > > >>> On Wed, May 25, 2016 at 5:35 AM, Marvin Froeder wrote:
> > > >>>
> > > >>>> You are right, for some reason I thought it was on the artifactId as
> > > >>>> well
> > > >>>>
> > > >>>> On Wed, May 25, 2016 at 10:22 PM, Stephen Mallette <spmalle...@gmail.com> wrote:
> > > >>>>
> > > >>>>> I don't think we need to relocate anything. The "-incubating" is just in
> > > >>>>> the version name, so we will just remove it for future releases.
> > > >>>>>
> > > >>>>> On Wed, May 25, 2016 at 4:55 AM, Jean-Baptiste Musso <jbmu...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> I think this is a good idea. This could make these releases look more
> > > >>>>>> "stable": I've often felt that the -incubating suffix somehow made
> > > >>>>>> releases look "alpha-ish" / "beta-ish", even though they were not.
> > > >>>>>>
> > > >>>>>> Naming aside, bug fixes never hurt.
> > > >>>>>>
> > > >>>>>> Jean-Baptiste
> > > >>>>>>
> > > >>>>>> On Wed, May 25, 2016 at 12:49 AM, Stephen Mallette <spmalle...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>> We've seen a lot of good fixes/optimizations to 3.1.3 and 3.2.1 and I
> > > >>>>>>> wonder if we shouldn't exercise our new found TLP powers to do a release
> > > >>>>>>> and get rid of the "-incubating" at the end of our "current"
> > > >>>>>>> distributions and artifacts. thoughts?
Re: When does incubator-suffix get dropped?
Gruno has said that there were problems in infrastructure with the incubation "upgrade" process - like stuff was broken. That was most of the hangup as I understood it.

On Sat, Jun 4, 2016 at 2:40 PM, Hadrian Zbarcea wrote:

> As soon as the graduation is announced.
>
> Homepage, mailing lists done (old ones will forward for a while). Next
> release should drop the 'incubating' part.
>
> Hadrian
>
> On 06/04/2016 02:33 PM, Marko Rodriguez wrote:
>
>> Hi,
>>
>> I’ve been hearing big talky talky that Incubator suffixes are “just
>> handled” — Git, homepage, mailing lists. When does talky talky become doey
>> doey?
>>
>> Thanks,
>> Marko.
>>
>> http://markorodriguez.com
[GitHub] incubator-tinkerpop issue #315: Remove dashes from labels and property keys
Github user spmallette commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/315 @analytically I see that you made some updates. I think you might need to rebase against tp31 though to resolve conflicts. Other than that, is this ready for review at this point? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-tinkerpop pull request #314: TINKERPOP-1301 Provide Javadoc for Sc...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-tinkerpop/pull/314
[GitHub] incubator-tinkerpop issue #314: TINKERPOP-1301 Provide Javadoc for ScriptInp...
Github user spmallette commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/314 Replaced by #327 - going to close this one from our end.
[GitHub] incubator-tinkerpop pull request #326: TINKERPOP-1322 Added ConfigurationCus...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-tinkerpop/pull/326
[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...
Github user dalaro commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/325 I saw the `shouldHandleMonthDay` test failure on CI and pushed a fix. I carelessly broke the read leg of that serializer while doing some tedious, boilerplate-y method signature changes to port a batch of serializers to the shim to support IoRegistry unification. The one-liner commit that I just pushed fixed the problem locally.
[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...
Github user dalaro commented on the issue: https://github.com/apache/incubator-tinkerpop/pull/325

I just pushed some changes that I hacked together this weekend. The key additions are:

* `TinkerPopKryoRegistrator`, which I extracted from my app, and which acts as a `spark.kryo.registrator` impl that knows about TinkerPop types
* `IoRegistryAwareKryoSerializer`, which is a Spark `Serializer` that looks for `GryoPool.CONFIG_IO_REGISTRY` and applies it if present
* `KryoShimLoaderService.applyConfiguration(cfg)`, which replaces direct calls to `HadoopPools.initialize(cfg)` and adds equivalent functionality for initializing the unshaded Kryo serializer pool

The user would theoretically just set

```
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.IoRegistryAwareKryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.TinkerPopKryoRegistrator
# Optional, only needed for custom types
gremlin.io.registry=whatever.user.IoRegistryImpl
```

In practice, when I have a custom gremlin.io.registry, I have always had to take the additional step (long before this PR) of forcibly initializing `HadoopPools` before touching SparkGraphComputer in my app, or else some part of Spark -- I think the closure serializer -- would attempt to use HadoopPools via ObjectWritable/VertexWritable before initialization and produce garbage on my custom classes. **This problem predates my PR**. I'm not trying to solve it here, in part because I still don't know if it's a pathology specific to my app or because TinkerPop is missing a crucial `HadoopPools.initialize` (now, equivalently, `KryoShimLoaderService.applyConfiguration`) call somewhere, and in part because HadoopPools is such a hideous architectural wart that the ultimate solution probably involves destroying it.

In the past, I've worked around this by defining a custom spark.serializer that delegates newKryo() to a GryoSerializer/IoRegistryAwareSerializer, but which has a constructor that invokes `HadoopPools.initialize`/`KryoShimLoaderService.applyConfiguration` (relying on that method's idempotence). Again, this initialization step may just be specific to my app and unnecessary for the average TinkerPop user. It's possible that the config I pasted above will work for others.

FWIW, this passes, so the overrides bug should be fixed along with all this refactoring stuff:

```
mvn clean install -DskipTests=true && mvn verify -pl gremlin-server -DskipIntegrationTests=false -Dtest.single=GremlinResultSetIntegrateTest
```
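For readers following along, the "shim" idea this PR introduces can be sketched in plain Java. The sketch below uses hypothetical names and trivial signatures (the real TinkerPop interfaces live in the kryoshim package and are richer): serialization logic is written once against a tiny interface, and a thin adapter binds that interface to either the shaded or the unshaded Kryo at runtime, which is what lets the same serializers be registered with Spark's Kryo.

```java
// Hypothetical, self-contained illustration of the shim pattern -- not the
// actual TinkerPop API. Adapters stand in for shaded/unshaded Kryo outputs.

interface OutputShim {                       // abstracts a Kryo Output
    void writeLong(long x);
}

interface SerializerShim<T> {                // serializer written once, against the shim
    void write(OutputShim out, T object);
}

// One adapter per Kryo flavor; here each just records what was written.
class ShadedOutputAdapter implements OutputShim {
    final StringBuilder sink = new StringBuilder();
    @Override public void writeLong(long x) { sink.append("shaded:").append(x); }
}

class UnshadedOutputAdapter implements OutputShim {
    final StringBuilder sink = new StringBuilder();
    @Override public void writeLong(long x) { sink.append("unshaded:").append(x); }
}

public class ShimSketch {
    public static void main(String[] args) {
        // A single serializer implementation, reused with both adapters.
        SerializerShim<Long> longSerializer = (out, v) -> out.writeLong(v);

        ShadedOutputAdapter shaded = new ShadedOutputAdapter();
        UnshadedOutputAdapter unshaded = new UnshadedOutputAdapter();
        longSerializer.write(shaded, 42L);
        longSerializer.write(unshaded, 42L);
        System.out.println(shaded.sink + " | " + unshaded.sink);
        // prints "shaded:42 | unshaded:42"
    }
}
```

The point of the design is that `SerializerShim` implementations (StarGraphSerializer and friends, in TinkerPop's case) carry no compile-time dependency on either Kryo, so a registrator like `TinkerPopKryoRegistrator` can wrap them for Spark's unshaded Kryo while GryoSerializer keeps using them with the shaded one.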