[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317589#comment-15317589
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user dalaro commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
Just a reminder: we are voting on branch 
https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321.  That 
subsumes the PR and adds a combination of fixes from Marko and me.


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those types 
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is 
> configured to accept those types.  If  those types are reachable from 
> Object/VertexWritable, then they must be registered with TinkerPop's internal 
> shaded Kryo, or else it will choke on them (unless it was explicitly 
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice 
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if 
> they choose not to use GryoSerializer: introduce lightweight interfaces that 
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded 
> package doing the work.  TinkerPop's serialization logic would be written 
> against interfaces that replicate a minimal subset of Kryo, and then TP's 
> shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without 
> having to touch the source, recompile any TinkerPop code, or munge bytecode 
> at runtime.
> This would not resolve all of the potential problems/complexity around 
> TinkerPop serialization, but it would make it possible for users to apply the 
> spark.serializer of their choice, switching off GryoSerializer if they so 
> choose.  Users could also continue to use 

[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread dalaro
Github user dalaro commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
Just a reminder: we are voting on branch 
https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321.  That 
subsumes the PR and adds a combination of fixes from Marko and me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317578#comment-15317578
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
VOTE +1.


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those types 
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is 
> configured to accept those types.  If  those types are reachable from 
> Object/VertexWritable, then they must be registered with TinkerPop's internal 
> shaded Kryo, or else it will choke on them (unless it was explicitly 
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice 
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if 
> they choose not to use GryoSerializer: introduce lightweight interfaces that 
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded 
> package doing the work.  TinkerPop's serialization logic would be written 
> against interfaces that replicate a minimal subset of Kryo, and then TP's 
> shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without 
> having to touch the source, recompile any TinkerPop code, or munge bytecode 
> at runtime.
> This would not resolve all of the potential problems/complexity around 
> TinkerPop serialization, but it would make it possible for users to apply the 
> spark.serializer of their choice, switching off GryoSerializer if they so 
> choose.  Users could also continue to use GyroSerializer as they have until 
> now.
> I've already run this through a few iterations locally and have an 
> abstraction that allows me to run with 

[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317312#comment-15317312
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user spmallette commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
Yeah - i know - it's there, but it's proliferating and becoming more and 
more OK which i think is "bad".  i'll bring it up as a separate issue.


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those types 
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is 
> configured to accept those types.  If  those types are reachable from 
> Object/VertexWritable, then they must be registered with TinkerPop's internal 
> shaded Kryo, or else it will choke on them (unless it was explicitly 
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice 
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if 
> they choose not to use GryoSerializer: introduce lightweight interfaces that 
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded 
> package doing the work.  TinkerPop's serialization logic would be written 
> against interfaces that replicate a minimal subset of Kryo, and then TP's 
> shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without 
> having to touch the source, recompile any TinkerPop code, or munge bytecode 
> at runtime.
> This would not resolve all of the potential problems/complexity around 
> TinkerPop serialization, but it would make it possible for users to apply the 
> spark.serializer of their choice, switching off GryoSerializer if they so 
> choose.  Users could also continue to use GyroSerializer as they have until 

[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread spmallette
Github user spmallette commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
Yeah - i know - it's there, but it's proliferating and becoming more and 
more OK which i think is "bad".  i'll bring it up as a separate issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317304#comment-15317304
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
We use `Random` in numerous places to test various "versions" of a 
particular code block. It is possible to replace all possible choices with new 
`GraphProviders` but you are talking about the test suite taking on the order 
of 20+ hours to complete then. If you have a better idea for testing these 
facets, then please provide a PR. Note that this PR's use of `Random` is not 
something "novel" as we use `Random` already in `SparkHadoopGraphProvider` to 
select `InputFormats`, `OutputFormats`, and traversal sources (deprecated vs. 
non-deprecated representations).


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those types 
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is 
> configured to accept those types.  If  those types are reachable from 
> Object/VertexWritable, then they must be registered with TinkerPop's internal 
> shaded Kryo, or else it will choke on them (unless it was explicitly 
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice 
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if 
> they choose not to use GryoSerializer: introduce lightweight interfaces that 
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded 
> package doing the work.  TinkerPop's serialization logic would be written 
> against interfaces that replicate a minimal subset of Kryo, and then TP's 
> shaded Kryo or Spark's unshaded Kryo could be plugged in 

[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread okram
Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
We use `Random` in numerous places to test various "versions" of a 
particular code block. It is possible to replace all possible choices with new 
`GraphProviders` but you are talking about the test suite taking on the order 
of 20+ hours to complete then. If you have a better idea for testing these 
facets, then please provide a PR. Note that this PR's use of `Random` is not 
something "novel" as we use `Random` already in `SparkHadoopGraphProvider` to 
select `InputFormats`, `OutputFormats`, and traversal sources (deprecated vs. 
non-deprecated representations).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317291#comment-15317291
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user spmallette commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
> Marko setup SparkHadoopGraphProvider to randomly choose either the old 
serialization code or the new stuff in 
e700363#diff-afffc6796745845d46e6f60ea3a992f9R91. IMHO that should be 
overrideable to make it deterministic, but it's not a huge deal since it's 
limited to test code.

+1 - i'd prefer to not have "random" in our test cases. 


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those types 
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is 
> configured to accept those types.  If  those types are reachable from 
> Object/VertexWritable, then they must be registered with TinkerPop's internal 
> shaded Kryo, or else it will choke on them (unless it was explicitly 
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice 
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if 
> they choose not to use GryoSerializer: introduce lightweight interfaces that 
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded 
> package doing the work.  TinkerPop's serialization logic would be written 
> against interfaces that replicate a minimal subset of Kryo, and then TP's 
> shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without 
> having to touch the source, recompile any TinkerPop code, or munge bytecode 
> at runtime.
> This would not resolve all of the potential problems/complexity around 
> TinkerPop serialization, but it 

[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread spmallette
Github user spmallette commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
> Marko setup SparkHadoopGraphProvider to randomly choose either the old 
serialization code or the new stuff in 
e700363#diff-afffc6796745845d46e6f60ea3a992f9R91. IMHO that should be 
overrideable to make it deterministic, but it's not a huge deal since it's 
limited to test code.

+1 - i'd prefer to not have "random" in our test cases. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317279#comment-15317279
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user dalaro commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
These questions involved a lot of back-and-forth that would have been slow 
to resolve on github comments, so and I discussed the failures and the code on 
a hangout earlier today.

Summary:

* Marko created 
https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321 on the TP 
repo.  This branch is my PR plus additional tweaks Marko and I worked out on 
top.  His branch supersedes this PR branch now.  This PR branch **should not be 
merged because Marko's branch subsumes/extends it**.
* I have a rough idea of what changed as we discussed code on the hangout, 
but I will go through his branch's new commits and review.
* The new serialization code is lower-priority than the old code.  It has 
to be turned on through explicit user action, like this is the test provider 
code:
```
System.setProperty(SHIM_CLASS_SYSTEM_PROPERTY, 
UnshadedKryoShimService.class.getCanonicalName());
config.put("spark.serializer", 
KryoSerializer.class.getCanonicalName());
config.put("spark.kryo.registrator", 
GryoRegistrator.class.getCanonicalName());
```
* Marko setup SparkHadoopGraphProvider to randomly choose either the old 
serialization code or the new stuff in 
https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-afffc6796745845d46e6f60ea3a992f9R91.
  IMHO that should be overrideable to make it deterministic, but it's not a 
huge deal since it's limited to test code.
* We replaced `GryoClassResolver` with a series of class registrations in 
https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-2e881d0a6ab7ed1ca7d1f593bceadfd2R207.
  The idea here is to explicitly configure which types need a detaching 
serializer rather than relying on `GryoClassResolver` to automagically do it 
regardless of explicit class registrations.  Three factors influence this 
choice:
   * Spark's KryoSerializer does not support a user-provided ClassResolver, 
and working around that limitation entails either getting a change through 
Spark upstream or copying and pasting a bunch of KryoSerializer's 
private/hardcoded class registrations, some of which could change in future 
Spark versions.  The latter is ugly and would make the code brittle across 
Spark upgrades.
   * `GryoClassResolver` is convenient but somewhat mysterious to third 
parties, who are likelier to have seen a custom class registration mechanism 
(like TP's `IoRegistry` or Spark's `spark.kryo.registrator`) than a custom 
resolver.  It also removes vendor control in that its detachment behavior is 
not configurable, so modifying it requires subclassing or reimplementation. For 
instance, if a vendor had a lightweight Vertex implementation with a custom 
serializer and wanted to bypass TP's resolver-driven detachment, then I don't 
think it would be possible without modifying TP's resolver.
   * `GryoClassResolver` is written against a bunch of shaded types and 
relies on `readVarInt`/`writeVarInt`, which were added in Kryo 2.22, one 
version too late for compatibility with Spark's Kryo 2.21.  This is the least 
important concern of the three, more an implementation detail.
* Marko is running the full integration test suite now.  He ran the spark 
ITs just before pushing and reported success on those specifically.


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> 

[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread dalaro
Github user dalaro commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
These questions involved a lot of back-and-forth that would have been slow 
to resolve on github comments, so and I discussed the failures and the code on 
a hangout earlier today.

Summary:

* Marko created 
https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321 on the TP 
repo.  This branch is my PR plus additional tweaks Marko and I worked out on 
top.  His branch supersedes this PR branch now.  This PR branch **should not be 
merged because Marko's branch subsumes/extends it**.
* I have a rough idea of what changed as we discussed code on the hangout, 
but I will go through his branch's new commits and review.
* The new serialization code is lower-priority than the old code.  It has 
to be turned on through explicit user action, like this is the test provider 
code:
```
System.setProperty(SHIM_CLASS_SYSTEM_PROPERTY, 
UnshadedKryoShimService.class.getCanonicalName());
config.put("spark.serializer", 
KryoSerializer.class.getCanonicalName());
config.put("spark.kryo.registrator", 
GryoRegistrator.class.getCanonicalName());
```
* Marko setup SparkHadoopGraphProvider to randomly choose either the old 
serialization code or the new stuff in 
https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-afffc6796745845d46e6f60ea3a992f9R91.
  IMHO that should be overrideable to make it deterministic, but it's not a 
huge deal since it's limited to test code.
* We replaced `GryoClassResolver` with a series of class registrations in 
https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-2e881d0a6ab7ed1ca7d1f593bceadfd2R207.
  The idea here is to explicitly configure which types need a detaching 
serializer rather than relying on `GryoClassResolver` to automagically do it 
regardless of explicit class registrations.  Three factors influence this 
choice:
   * Spark's KryoSerializer does not support a user-provided ClassResolver, 
and working around that limitation entails either getting a change through 
Spark upstream or copying and pasting a bunch of KryoSerializer's 
private/hardcoded class registrations, some of which could change in future 
Spark versions.  The latter is ugly and would make the code brittle across 
Spark upgrades.
   * `GryoClassResolver` is convenient but somewhat mysterious to third 
parties, who are likelier to have seen a custom class registration mechanism 
(like TP's `IoRegistry` or Spark's `spark.kryo.registrator`) than a custom 
resolver.  It also removes vendor control in that its detachment behavior is 
not configurable, so modifying it requires subclassing or reimplementation. For 
instance, if a vendor had a lightweight Vertex implementation with a custom 
serializer and wanted to bypass TP's resolver-driven detachment, then I don't 
think it would be possible without modifying TP's resolver.
   * `GryoClassResolver` is written against a bunch of shaded types and 
relies on `readVarInt`/`writeVarInt`, which were added in Kryo 2.22, one 
version too late for compatibility with Spark's Kryo 2.21.  This is the least 
important concern of the three, more an implementation detail.
* Marko is running the full integration test suite now.  He ran the spark 
ITs just before pushing and reported success on those specifically.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316836#comment-15316836
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
So, `mvn clean install` is happy w/o the 
`TravesralInterruptionComputerTest` (commented out). Running full integration 
tests now...


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those types 
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is 
> configured to accept those types.  If  those types are reachable from 
> Object/VertexWritable, then they must be registered with TinkerPop's internal 
> shaded Kryo, or else it will choke on them (unless it was explicitly 
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice 
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if 
> they choose not to use GryoSerializer: introduce lightweight interfaces that 
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded 
> package doing the work.  TinkerPop's serialization logic would be written 
> against interfaces that replicate a minimal subset of Kryo, and then TP's 
> shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without 
> having to touch the source, recompile any TinkerPop code, or munge bytecode 
> at runtime.
> This would not resolve all of the potential problems/complexity around 
> TinkerPop serialization, but it would make it possible for users to apply the 
> spark.serializer of their choice, switching off GryoSerializer if they so 
> choose.  Users could also continue to use GyroSerializer as they have until 
> now.
> I've 

[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread okram
Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
So, `mvn clean install` is happy w/o the 
`TravesralInterruptionComputerTest` (commented out). Running full integration 
tests now...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316810#comment-15316810
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
Okay. So 1 and 2 above are done, however, I had to `OPT_OUT` of 
`TraversalInterruptionComputerTest` as for some reason, the 
`GryoPoolKryoShimService` is not respecting interruption exceptions. On the 
plus side, serialization is completely handled by the shim service which 
ensures consistency between computer and `ComputerResult` data.


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those types 
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is 
> configured to accept those types.  If  those types are reachable from 
> Object/VertexWritable, then they must be registered with TinkerPop's internal 
> shaded Kryo, or else it will choke on them (unless it was explicitly 
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice 
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if 
> they choose not to use GryoSerializer: introduce lightweight interfaces that 
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded 
> package doing the work.  TinkerPop's serialization logic would be written 
> against interfaces that replicate a minimal subset of Kryo, and then TP's 
> shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without 
> having to touch the source, recompile any TinkerPop code, or munge bytecode 
> at runtime.
> This would not resolve all of the potential problems/complexity around 
> TinkerPop serialization, but it would make 

[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread okram
Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
Okay. So 1 and 2 above are done, however, I had to `OPT_OUT` of 
`TraversalInterruptionComputerTest` as for some reason, the 
`GryoPoolKryoShimService` is not respecting interruption exceptions. On the 
plus side, serialization is completely handled by the shim service which 
ensures consistency between computer and `ComputerResult` data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316622#comment-15316622
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
I think some things need to be fixed up.

1. `TinkerPopKryoRegistrator` should be called `GryoRegistrator`.
2. I think we need a `GryoPoolKryoShimService` in core that 
non-Spark/Hadoop projects can leverage.
** We need to get `VertexProgramHelper.serializer/deserializer()` to use 
`KryoShimService`.

I'm working on 2 right now and things are moving forward.


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those types 
> implement Serializable, and it doesn't matter if Spark's KryoSerializer is 
> configured to accept those types.  If  those types are reachable from 
> Object/VertexWritable, then they must be registered with TinkerPop's internal 
> shaded Kryo, or else it will choke on them (unless it was explicitly 
> configured to allow unregistered classes).
> I suggest the following change to give users more flexibility in their choice 
> of spark.serializer, and to allow them to reuse TinkerPop's serializers if 
> they choose not to use GryoSerializer: introduce lightweight interfaces that 
> decouple TinkerPop's serialization logic from the exact Kryo shaded/unshaded 
> package doing the work.  TinkerPop's serialization logic would be written 
> against interfaces that replicate a minimal subset of Kryo, and then TP's 
> shaded Kryo or Spark's unshaded Kryo could be plugged in underneath without 
> having to touch the source, recompile any TinkerPop code, or munge bytecode 
> at runtime.
> This would not resolve all of the potential problems/complexity around 
> 

Re: [DISCUSS] ASF Board Draft Report - June 2016

2016-06-06 Thread Ted Wilmes
This looks good to me.

Thanks,
Ted

On Sat, Jun 4, 2016, 11:54 AM Stephen Mallette  wrote:

> As TinkerPop has graduated from incubator, we no longer need to produce
> incubator reports. Instead, TinkerPop needs to now provide reports to the
> ASF Board. Ultimately, as PMC chair it is my responsibility to submit the
> report when required, but I liked our method of discussing the incubator
> report prior to submission so I'd like to continue that. The content is
> largely the same - you can read more about what's required here:
>
> http://www.apache.org/foundation/board/reporting
>
> and see examples of other project's reports here:
>
>
> https://svn.apache.org/repos/private/foundation/board/board_agenda_2016_05_18.txt
>
> Anyway, here's a draft of the June report that I've prepared - please feel
> free to offer any feedback:
>
>
> ## Description:
>  Apache TinkerPop is a graph computing framework for both graph databases
>  (OLTP) and graph analytic systems (OLAP).
>
>  This is TinkerPop's first month outside of incubation. We are still
>  awaiting resource transfer changes related to graduation to take place as
>  there are issues preventing the infrastructure team from completing that
>  process.
>
>  Project development has been building up to two new releases
>  which are expected to be released in July. A key area of focus in this
>  last month has been related to opening TinkerPop to non-JVM programming
>  languages through the concept of Gremlin Language Variants[1]. In taking
>  this direction, TinkerPop becomes more accessible to non-JVM developer
>  communities.
>
> ## Issues:
>   There are no issues requiring board attention at this time.
>
> ## Releases:
>  - 3.1.2 (April 8, 2016)
>  - 3.2.0 (April 8, 2016)
>
> ## PMC/Committer:
>
>  - Last PMC addition was Dylan Millikin - May 2016
>  - Last committer addition was Michael Pollmeier - April 2016
>
> [1]
>
> http://tinkerpop.apache.org/docs/3.2.1-SNAPSHOT/tutorials/gremlin-language-variants/
>


[jira] [Commented] (TINKERPOP-1321) Loosen coupling between TinkerPop serialization logic and shaded Kryo

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316514#comment-15316514
 ] 

ASF GitHub Bot commented on TINKERPOP-1321:
---

Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
Testing this morning, I get two test failures.

```
Tests in error:
  
GroovyProgramTest$Traversals>ProgramTest.g_V_outXcreatedX_aggregateXxX_byXlangX_groupCount_programXTestProgramX_asXaX_selectXa_xX:117
 » IllegalState
  
ProgramTest$Traversals>ProgramTest.g_V_outXcreatedX_aggregateXxX_byXlangX_groupCount_programXTestProgramX_asXaX_selectXa_xX:117
 » IllegalState
```

Both these tests make use of a user defined `VertexProgram` that stores 
vertices as `HALTED_TRAVERSERS`. The problem is that the vertices are not being 
detached.

```
Caused by: java.io.NotSerializableException: 
org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex
Serialization stack:
- object not serializable (class: 
org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex,
 value: v[3])
- writeObject data (class: java.util.HashMap)
- object (class java.util.HashMap, {v[3]=3, v[5]=1})
- field (class: 
org.apache.tinkerpop.gremlin.process.traversal.traverser.util.AbstractTraverser,
 name: t, type: class java.lang.Object)
- object (class 
org.apache.tinkerpop.gremlin.process.traversal.traverser.B_LP_O_S_SE_SL_Traverser,
 {v[3]=3, v[5]=1})
```

In the past, this would be automatically detached, but now it doesn't seem 
to be. I'll fiddle a bit to sew whats going on, but just heads up in case its 
obvious to use @dalaro ...


> Loosen coupling between TinkerPop serialization logic and shaded Kryo
> -
>
> Key: TINKERPOP-1321
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1321
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 3.2.0-incubating
>Reporter: Dan LaRocque
> Fix For: 3.2.1
>
>
> When running jobs on Spark, TinkerPop currently recommends setting 
> spark.serializer=GryoSerializer.  This makes GryoSerializer responsible for 
> serializing not just TinkerPop types but also scala runtime types and Spark 
> internals.  GryoSerializer doesn't extend either of the two serializers 
> provided by Spark.  It effectively assumes responsibility for reimplementing 
> them.
> This is problematic.  It is not totally trivial to replicate the 
> functionality of Spark's standard serializers.  It is also not easy to 
> empirically test all meaningful cases.  For instance, there is a conditional 
> within Spark that selects between two concrete Map implementations depending 
> on whether the current RDD partition count exceeds 2k 
> (https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala#L47-L53).
>   The implementation used below this threshold serializes fine on 
> GryoSerializer.  The implementation used above the threshold does not.  Above 
> the partition threshold, I've started getting 
> {{org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: 
> org.apache.tinkerpop.shaded.kryo.KryoException: java.io.IOException: I failed 
> to find one of the right cookies.}}  Google leads to 
> https://github.com/RoaringBitmap/RoaringBitmap/issues/64.  However, just 
> switching to Spark's {{KryoSerializer}} without changing anything somehow 
> fixes the problem in my environment, implying that Spark has done something 
> to address this problem that may not be fully replicated in TinkerPop.
> However, "just switching to Spark's KryoSerializer" is not a great approach.  
> For one thing, we lose the benefit of TinkerPop's space-efficient StarGraph 
> serializer, and Spark traversals can produce a lot of little ego-StarGraphs.  
> These still serialize, but KryoSerializer uses its default behavior 
> (FieldSerializer), which is not as clever about StarGraphs as TinkerPop's 
> StarGraphSerializer.  TinkerPop's reliance on its own internal shaded Kryo 
> means that its serializers cannot be registered with Spark's unshaded Kryo.
> More concerning, it's impossible to completely switch to KryoSerializer just 
> by tweaking the configuration.  Besides spark.serializer, there is also a 
> setting spark.closure.serializer for which the only supported value is 
> JavaSerializer.  Key TP classes that make it into the object reference graphs 
> of Spark closures implement Serializable by resorting to TinkerPop's shaded 
> Kryo via HadoopPools (looking at Object/VertexWritable).  This leads to 
> surprises with custom property data types.  It doesn't matter if those 

[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread okram
Github user okram commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
Testing this morning, I get two test failures.

```
Tests in error:
  
GroovyProgramTest$Traversals>ProgramTest.g_V_outXcreatedX_aggregateXxX_byXlangX_groupCount_programXTestProgramX_asXaX_selectXa_xX:117
 » IllegalState
  
ProgramTest$Traversals>ProgramTest.g_V_outXcreatedX_aggregateXxX_byXlangX_groupCount_programXTestProgramX_asXaX_selectXa_xX:117
 » IllegalState
```

Both these tests make use of a user defined `VertexProgram` that stores 
vertices as `HALTED_TRAVERSERS`. The problem is that the vertices are not being 
detached.

```
Caused by: java.io.NotSerializableException: 
org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex
Serialization stack:
- object not serializable (class: 
org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex,
 value: v[3])
- writeObject data (class: java.util.HashMap)
- object (class java.util.HashMap, {v[3]=3, v[5]=1})
- field (class: 
org.apache.tinkerpop.gremlin.process.traversal.traverser.util.AbstractTraverser,
 name: t, type: class java.lang.Object)
- object (class 
org.apache.tinkerpop.gremlin.process.traversal.traverser.B_LP_O_S_SE_SL_Traverser,
 {v[3]=3, v[5]=1})
```

In the past, this would be automatically detached, but now it doesn't seem 
to be. I'll fiddle a bit to sew whats going on, but just heads up in case its 
obvious to use @dalaro ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] Thinking on Release

2016-06-06 Thread Marko Rodriguez
Yes, end of June-ish is best for me as I have few things on my plate the first 
half of this month.

Thanks,
Marko.

http://markorodriguez.com


> On Jun 6, 2016, at 4:46 AM, Stephen Mallette  wrote:
> 
> We didn't really discuss a date for release on this thread. I was thinking
> that we could start looking at the week of July 4th as the target week for
> VOTE and nail down a date as we get closer.
> 
> On Wed, May 25, 2016 at 2:42 PM, Jason Plurad  wrote:
> 
>> I'd think from a TinkerPop branding perspective, it probably helps to have
>> the name in there. It's Apache TinkerPop, not Apache Gremlin.
>> 
>> I just took a quick look on a mirror, and some other Apache projects
>> (Spark, Kafka, HBase, NiFi, Pig, Zookeeper) don't even include apache in
>> their distributables, so maybe we can just do:
>> 
>> tinkerpop-gremlin-console-x.y.z.zip
>> tinkerpop-gremlin-server-x.y.z.zip
>> 
>> 
>> On Wed, May 25, 2016 at 11:03 AM Stephen Mallette 
>> wrote:
>> 
>>> jason i think that was a suggestion to conform more to standard apache
>>> releases from someone in incubator. if it was mandatory we would have
>>> burned for that too many times to count at this point. i'm good to change
>>> it if everyone else is. what do we want them to be?
>>> 
>>> apache-tinkerpop-console-x.y.z.zip
>>> apache-tinkerpop-server-x.y.z.zip
>>> 
>>> or the full business:
>>> 
>>> apache-tinkerpop-gremlin-console-x.y.z.zip
>>> apache-tinkerpop-gremlin-server-x.y.z.zip
>>> 
>>> i guess we lost "-incubating" now so the latter doesn't look so bad to me
>>> anymore.
>>> 
>>> 
>>> On Wed, May 25, 2016 at 10:56 AM, Marko Rodriguez 
>>> wrote:
>>> 
 Hi,
 
 Yes, an imminent release is good. There are 2 severe bug fixes in
>> master/
 (3.2.1) that I would like to get out there. 3.2.0 had lots of internal
 changes to OLAP and I paid the price by incurring bugs. :|
 
> Somebody had mentioned that our distributables are supposed to be
>> named
> apache-tinkerpop*.zip instead of apache-gremlin*.zip. Maybe that's
> something that should be done along with this release.
 
 There is really no such thing as "tinkerpop" besides the source code
>>> which
 is distributed as apache-tinkerpop-*.zip. The two other distributions
>> are
 gremlin-console and gremlin-server and I think we should keep those
>>> naming
 conventions as they are so they reflect what is being distributed.
>> Thus,
>>> I
 think the naming of our artifacts is correct.
 
 Thanks,
 Marko.
 
 http://markorodriguez.com
 
 On May 25, 2016, at 8:38 AM, Jason Plurad  wrote:
 
> Somebody had mentioned that our distributables are supposed to be
>> named
> apache-tinkerpop*.zip instead of apache-gremlin*.zip. Maybe that's
> something that should be done along with this release.
> 
> -- Jason
> 
> On Wed, May 25, 2016 at 9:55 AM, Stephen Mallette <
>>> spmalle...@gmail.com>
> wrote:
> 
>> cool, Ted. it would be good to have another hand there.
>> 
>> On Wed, May 25, 2016 at 9:13 AM, Ted Wilmes 
>>> wrote:
>> 
>>> I think a release sounds good.  I'd be interested in witnessing the
>>> the
>>> post-PMC vote release steps so that I might be able to help out on
>> an
>>> upcoming release.
>>> 
>>> --Ted
>>> 
>>> On Wed, May 25, 2016 at 5:35 AM, Marvin Froeder >> 
>> wrote:
>>> 
 Your are right, for some reason I though it was on the artifactId
>> as
>> well
 
 On Wed, May 25, 2016 at 10:22 PM, Stephen Mallette <
>> spmalle...@gmail.com
 
 wrote:
 
> I don't think we need to relocate anything. The "-incubating" is
>>> just
>>> in
> the version name, so we will just remove it for future releases.
> 
> On Wed, May 25, 2016 at 4:55 AM, Jean-Baptiste Musso <
>>> jbmu...@gmail.com>
> wrote:
> 
>> I think this is a good idea. This could make these releases look
>> more
>> "stable": I've often felt that the -incubating suffix somehow
>> made
>> releases look "alpha-ish" / "beta-ish", even though they were
>> not.
>> 
>> Naming aside, bug fixes never hurt.
>> 
>> Jean-Baptiste
>> 
>> On Wed, May 25, 2016 at 12:49 AM, Stephen Mallette <
 spmalle...@gmail.com
>> 
>> wrote:
>>> We've seen a lot of good fixes/optimizations to 3.1.3 and 3.2.1
>>> and I
>>> wonder if we shouldn't exercise our new found TLP powers to do
>> a
> release
>>> and get rid of the "-incubating" at the end of our "current"
>> distributions
>>> and artifacts. thoughts?
>> 
> 
 
>>> 
>> 
 
 
>>> 
>> 



Re: [DISCUSS] Thinking on Release

2016-06-06 Thread Stephen Mallette
We didn't really discuss a date for release on this thread. I was thinking
that we could start looking at the week of July 4th as the target week for
VOTE and nail down a date as we get closer.

On Wed, May 25, 2016 at 2:42 PM, Jason Plurad  wrote:

> I'd think from a TinkerPop branding perspective, it probably helps to have
> the name in there. It's Apache TinkerPop, not Apache Gremlin.
>
> I just took a quick look on a mirror, and some other Apache projects
> (Spark, Kafka, HBase, NiFi, Pig, Zookeeper) don't even include apache in
> their distributables, so maybe we can just do:
>
> tinkerpop-gremlin-console-x.y.z.zip
> tinkerpop-gremlin-server-x.y.z.zip
>
>
> On Wed, May 25, 2016 at 11:03 AM Stephen Mallette 
> wrote:
>
> > jason i think that was a suggestion to conform more to standard apache
> > releases from someone in incubator. if it was mandatory we would have
> > burned for that too many times to count at this point. i'm good to change
> > it if everyone else is. what do we want them to be?
> >
> > apache-tinkerpop-console-x.y.z.zip
> > apache-tinkerpop-server-x.y.z.zip
> >
> > or the full business:
> >
> > apache-tinkerpop-gremlin-console-x.y.z.zip
> > apache-tinkerpop-gremlin-server-x.y.z.zip
> >
> > i guess we lost "-incubating" now so the latter doesn't look so bad to me
> > anymore.
> >
> >
> > On Wed, May 25, 2016 at 10:56 AM, Marko Rodriguez 
> > wrote:
> >
> > > Hi,
> > >
> > > Yes, an imminent release is good. There are 2 severe bug fixes in
> master/
> > > (3.2.1) that I would like to get out there. 3.2.0 had lots of internal
> > > changes to OLAP and I paid the price by incurring bugs. :|
> > >
> > > > Somebody had mentioned that our distributables are supposed to be
> named
> > > > apache-tinkerpop*.zip instead of apache-gremlin*.zip. Maybe that's
> > > > something that should be done along with this release.
> > >
> > > There is really no such thing as "tinkerpop" besides the source code
> > which
> > > is distributed as apache-tinkerpop-*.zip. The two other distributions
> are
> > > gremlin-console and gremlin-server and I think we should keep those
> > naming
> > > conventions as they are so they reflect what is being distributed.
> Thus,
> > I
> > > think the naming of our artifacts is correct.
> > >
> > > Thanks,
> > > Marko.
> > >
> > > http://markorodriguez.com
> > >
> > > On May 25, 2016, at 8:38 AM, Jason Plurad  wrote:
> > >
> > > > Somebody had mentioned that our distributables are supposed to be
> named
> > > > apache-tinkerpop*.zip instead of apache-gremlin*.zip. Maybe that's
> > > > something that should be done along with this release.
> > > >
> > > > -- Jason
> > > >
> > > > On Wed, May 25, 2016 at 9:55 AM, Stephen Mallette <
> > spmalle...@gmail.com>
> > > > wrote:
> > > >
> > > >> cool, Ted. it would be good to have another hand there.
> > > >>
> > > >> On Wed, May 25, 2016 at 9:13 AM, Ted Wilmes 
> > wrote:
> > > >>
> > > >>> I think a release sounds good.  I'd be interested in witnessing the
> > the
> > > >>> post-PMC vote release steps so that I might be able to help out on
> an
> > > >>> upcoming release.
> > > >>>
> > > >>> --Ted
> > > >>>
> > > >>> On Wed, May 25, 2016 at 5:35 AM, Marvin Froeder  >
> > > >> wrote:
> > > >>>
> > >  Your are right, for some reason I though it was on the artifactId
> as
> > > >> well
> > > 
> > >  On Wed, May 25, 2016 at 10:22 PM, Stephen Mallette <
> > > >> spmalle...@gmail.com
> > > 
> > >  wrote:
> > > 
> > > > I don't think we need to relocate anything. The "-incubating" is
> > just
> > > >>> in
> > > > the version name, so we will just remove it for future releases.
> > > >
> > > > On Wed, May 25, 2016 at 4:55 AM, Jean-Baptiste Musso <
> > > >>> jbmu...@gmail.com>
> > > > wrote:
> > > >
> > > >> I think this is a good idea. This could make these releases look
> > > >> more
> > > >> "stable": I've often felt that the -incubating suffix somehow
> made
> > > >> releases look "alpha-ish" / "beta-ish", even though they were
> not.
> > > >>
> > > >> Naming aside, bug fixes never hurt.
> > > >>
> > > >> Jean-Baptiste
> > > >>
> > > >> On Wed, May 25, 2016 at 12:49 AM, Stephen Mallette <
> > >  spmalle...@gmail.com
> > > >>
> > > >> wrote:
> > > >>> We've seen a lot of good fixes/optimizations to 3.1.3 and 3.2.1
> > > >>> and I
> > > >>> wonder if we shouldn't exercise our new found TLP powers to do
> a
> > > > release
> > > >>> and get rid of the "-incubating" at the end of our "current"
> > > >> distributions
> > > >>> and artifacts. thoughts?
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > >
> > >
> >
>


Re: When does incubator-suffix get dropped?

2016-06-06 Thread Stephen Mallette
Gruno has said that there were problems in infrastructure with the
incubation "upgrade" process - like stuff was broken. That was most of the
hangup as I understood it.

On Sat, Jun 4, 2016 at 2:40 PM, Hadrian Zbarcea  wrote:

> As soon as the graduation is announced.
>
> Homepage, mailing lists done (old ones will forward for a while). Next
> release should drop the 'incubating' part.
>
> Hadrian
>
>
> On 06/04/2016 02:33 PM, Marko Rodriguez wrote:
>
>> Hi,
>>
>> I’ve been hearing big talky talky that Incubator suffixes are “just
>> handled” — Git, homepage, mailing lists. When does talky talky become doey
>> doey?
>>
>> Thanks,
>> Marko.
>>
>> http://markorodriguez.com
>>
>>
>>
>>


[GitHub] incubator-tinkerpop pull request #314: TINKERPOP-1301 Provide Javadoc for Sc...

2016-06-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-tinkerpop/pull/314


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-tinkerpop issue #314: TINKERPOP-1301 Provide Javadoc for ScriptInp...

2016-06-06 Thread spmallette
Github user spmallette commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/314
  
Replaced by #327 - going to close this one from our end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

2016-06-06 Thread dalaro
Github user dalaro commented on the issue:

https://github.com/apache/incubator-tinkerpop/pull/325
  
I saw the `shouldHandleMonthDay` test failure on CI and pushed a fix.  I 
carelessly broke the read leg of that serializer while doing some tedious, 
boilerplate-y method signature changes to port a batch a serializers to the 
shim to support IoRegistry unification.  The one-liner commit that I just 
pushed fixed the problem locally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---