[ https://issues.apache.org/jira/browse/TINKERPOP-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622698#comment-15622698 ]
Marko A. Rodriguez commented on TINKERPOP-1118:
-----------------------------------------------

I think we can get rid of the {{VertexWritable}}/{{ObjectWritable}} serialization issues if we solve this ticket. cc/ [~dalaro]

Right now, {{VertexWritable}} and {{ObjectWritable}} have their own serialization logic. This is important because these classes are used not only for running jobs but also for reading and writing {{SequenceFiles}}. In Spark, the RDD does not need to use these writables; it can directly reference the objects they wrap. That would give us a cleaner split between {{GryoInput/OutputFormat}} and the internal job serialization (message passing and the like).

> SparkGraphComputer should use StarGraph, not VertexWritable.
> ------------------------------------------------------------
>
>                 Key: TINKERPOP-1118
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1118
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
>             Fix For: 3.3.0
>
>
> {{SparkGraphComputer}} input RDDs are typed as:
> {code}
> JavaPairRDD<Object,VertexWritable>
> {code}
> The {{VertexWritable}} usage is a vestige from Hadoop and Giraph. In Spark,
> we don't need this wrapper, so we can reduce the overhead (one less object
> header) by typing the input RDDs as:
> {code}
> JavaPairRDD<Object,StarGraph>
> {code}
> This would be a breaking change for graph providers that implement their own
> {{InputRDD}} and {{OutputRDD}}; however, the fix is trivial. Instead of
> {{new VertexWritable(vertex)}}, they would simply do {{StarGraph.of(vertex)}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)