[ 
https://issues.apache.org/jira/browse/TINKERPOP-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622698#comment-15622698
 ] 

Marko A. Rodriguez commented on TINKERPOP-1118:
-----------------------------------------------

I think we can get rid of the {{VertexWritable}}/{{ObjectWritable}} 
serialization issues if we solve this ticket. cc/ [~dalaro]

Right now, {{VertexWritable}} and {{ObjectWritable}} have their own 
serialization logic. This matters because these classes are used not only 
when running jobs, but also for reading and writing {{SequenceFiles}}. In 
Spark, the RDD does not need to use these writables and can instead 
directly reference the objects they wrap. That would give us a cleaner 
split between {{GryoInput/OutputFormat}} and the internal job serialization 
(message passing and the like).

> SparkGraphComputer should use StarGraph, not VertexWritable.
> ------------------------------------------------------------
>
>                 Key: TINKERPOP-1118
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1118
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
>             Fix For: 3.3.0
>
>
> {{SparkGraphComputer}} input RDDs are typed as:
> {code}
> JavaPairRDD<Object,VertexWritable>
> {code}
> The {{VertexWritable}} usage is a vestige from Hadoop and Giraph. In Spark, 
> we don't need this wrapper, so we can reduce the overhead (one less object 
> header per vertex) by typing the input RDDs as:
> {code}
> JavaPairRDD<Object,StarGraph>
> {code}
> This would be a breaking change for graph providers that implement their own 
> {{InputRDD}} and {{OutputRDD}}; however, the fix is trivial. Instead of {{new 
> VertexWritable(vertex)}}, they would simply call {{StarGraph.of(vertex)}}.
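
The provider-side change described above can be sketched as follows. This is 
only an illustration under stated assumptions: the nested {{Vertex}}, 
{{StarGraph}} and {{VertexWritable}} classes are simplified stand-ins written 
for this example, not the TinkerPop classes, the {{unwrap()}} accessor is 
hypothetical, and a plain list of key/value entries stands in for a 
{{JavaPairRDD}} so that the sketch runs without Spark or TinkerPop on the 
classpath.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only: the nested classes are NOT the TinkerPop types.
final class StarGraphMigrationSketch {

    /** Stand-in for a provider vertex (id and label only, no edges). */
    static final class Vertex {
        final Object id;
        final String label;
        Vertex(Object id, String label) { this.id = id; this.label = label; }
    }

    /** Stand-in for StarGraph: holds a copy of a single vertex. */
    static final class StarGraph {
        final Vertex center;
        private StarGraph(Vertex center) { this.center = center; }
        static StarGraph of(Vertex v) { return new StarGraph(new Vertex(v.id, v.label)); }
    }

    /** Stand-in for VertexWritable: one extra object around the StarGraph. */
    static final class VertexWritable {
        private final StarGraph star;
        VertexWritable(Vertex v) { this.star = StarGraph.of(v); }
        StarGraph unwrap() { return star; } // hypothetical accessor for this sketch
    }

    // Before: pairs shaped like JavaPairRDD<Object,VertexWritable>.
    static List<Map.Entry<Object, VertexWritable>> toWritablePairs(List<Vertex> vertices) {
        List<Map.Entry<Object, VertexWritable>> out = new ArrayList<>();
        for (Vertex v : vertices) {
            out.add(new SimpleImmutableEntry<>(v.id, new VertexWritable(v)));
        }
        return out;
    }

    // After: pairs shaped like JavaPairRDD<Object,StarGraph>, with no wrapper object.
    static List<Map.Entry<Object, StarGraph>> toStarGraphPairs(List<Vertex> vertices) {
        List<Map.Entry<Object, StarGraph>> out = new ArrayList<>();
        for (Vertex v : vertices) {
            out.add(new SimpleImmutableEntry<>(v.id, StarGraph.of(v)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Vertex> vertices = Arrays.asList(new Vertex(1, "person"), new Vertex(2, "software"));
        System.out.println(toWritablePairs(vertices).size() + " wrapped pairs");
        System.out.println(toStarGraphPairs(vertices).size() + " unwrapped pairs");
    }
}
```

In the sketch, the two methods differ only in the value they put into each 
pair, which is why the migration for a custom {{InputRDD}} is expected to be 
a one-line change per construction site.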



