[jira] [Commented] (TINKERPOP-1108) Produce two RDDs from executeVertexProgram in SparkGraphComputer

Marko A. Rodriguez (JIRA) Fri, 29 Jan 2016 12:35:55 -0800

    [ https://issues.apache.org/jira/browse/TINKERPOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124157#comment-15124157 ]


Marko A. Rodriguez commented on TINKERPOP-1108:
-----------------------------------------------

The scary thing about this is that we have Spark accumulators emitted in
{{viewOutgoingMessageRDD}}, so generating two RDDs from it may be a problem:
the accumulator data could be duplicated. However, we may just want to put the
accumulator data into {{viewRDD}} only and then broadcast the variables on the
{{join()}}! ... needs some thinking.
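To make the duplication worry concrete, here is a toy model in plain Python. It does not use the Spark API; `Accumulator`, `LazyDataset`, and `count_accumulator_updates` are hypothetical stand-ins. It relies on one real Spark behavior: an uncached RDD is recomputed from its lineage every time a downstream RDD is materialized. If {{viewRDD}} and {{outgoingMessageRDD}} are both derived from one uncached vertex-program pass, that pass runs twice, and any accumulator updates made inside it run twice as well.

```python
"""Toy model in plain Python (not the Spark API): why splitting one uncached
upstream pass into two derived datasets can double-count accumulator updates."""


class Accumulator:
    """Stand-in for a Spark accumulator: a counter bumped as a side effect."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


class LazyDataset:
    """Stand-in for an RDD: `compute` reruns on every materialization unless cached."""

    def __init__(self, compute):
        self._compute = compute
        self._cached = False
        self._cache = None

    def cache(self):
        self._cached = True
        return self

    def collect(self):
        if self._cached and self._cache is not None:
            return self._cache
        result = self._compute()
        if self._cached:
            self._cache = result
        return result

    def map(self, f):
        # A derived dataset re-materializes its parent each time it is collected.
        return LazyDataset(lambda: [f(x) for x in self.collect()])


def count_accumulator_updates(graph, use_cache):
    """Run one hypothetical vertex-program pass, derive two datasets, return the accumulator total."""
    acc = Accumulator()

    def execute_vertex_program():
        out = []
        for vertex_id, value in graph:
            acc.add(1)  # one accumulator update per vertex, made inside a transformation
            out.append((vertex_id, value, [(vertex_id + 1, value)]))
        return out

    combined = LazyDataset(execute_vertex_program)  # like viewOutgoingMessageRDD
    if use_cache:
        combined.cache()
    view = combined.map(lambda t: (t[0], t[1]))  # like viewRDD
    messages = combined.map(lambda t: t[2])  # like outgoingMessageRDD
    view.collect()
    messages.collect()
    return acc.value


if __name__ == "__main__":
    graph = [(1, 10), (2, 20), (3, 30)]
    print(count_accumulator_updates(graph, use_cache=False))  # 6: upstream pass ran twice
    print(count_accumulator_updates(graph, use_cache=True))  # 3: upstream pass ran once
```

Caching only mitigates the problem. Spark guarantees exactly-once accumulator updates only inside actions. Updates made in transformations can still be reapplied when a task is retried or an evicted cached partition is recomputed. That is one reason to keep the accumulator work on a single side ({{viewRDD}}), as suggested above.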

> Produce two RDDs from executeVertexProgram in SparkGraphComputer
> ----------------------------------------------------------------
>
>                 Key: TINKERPOP-1108
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1108
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>
> I have done a lot to optimize our implementation of {{SparkGraphComputer}}. I 
> now know the reason for every shuffle, input, spill, etc. that happens during 
> a job. There is one more optimization that MAY or MAY NOT work, but it is 
> worth trying: if it does what I think it will do, we may get a 2x 
> improvement.
> We currently do:
> {code}
> graphRDD -> viewOutgoingMessagesRDD
> {code}
> We should do:
> {code}
> graphRDD -->
>    viewRDD
>    outgoingMessageRDD
> {code}
> The {{viewRDD}} will have the same partitioner as the {{graphRDD}}, so only a 
> local join is required. The {{outgoingMessageRDD}} will not be partitioned, so 
> its join will cause a shuffle. Thus, after this block, we do:
> {code}
> graphRDD.join(viewRDD)
>         .mapValues(...attach the view...)
>         .join(outgoingMessageRDD)
> {code}
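The partitioning argument in the proposal can be checked with another plain-Python toy model. It does not use the Spark API, and the names `partition_of`, `execute_vertex_program`, `local_join`, `shuffle_join`, and `plan`, plus the `rank`/`out` vertex data, are all hypothetical. View records keep their vertex-id key, so joining them with the graph is a partition-by-partition (local) join. Messages are keyed by their destination vertex, so joining them requires moving records between partitions, and the model counts those moves. As a simplifying assumption, the message join groups all messages per destination and attaches them to every vertex, including vertices that received none. This behaves more like a grouped left-outer join than a plain inner join.

```python
"""Toy model in plain Python (not the Spark API) of the proposed plan:
graphRDD.join(viewRDD).mapValues(...attach the view...).join(outgoingMessageRDD)."""

NUM_PARTITIONS = 3


def partition_of(key):
    # Stand-in for a hash partitioner: the partition that owns this vertex id.
    return hash(key) % NUM_PARTITIONS


def partition(pairs):
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for key, value in pairs:
        parts[partition_of(key)].append((key, value))
    return parts


def execute_vertex_program(graph_parts):
    """Hypothetical single pass that emits two per-partition outputs.

    View records keep their vertex-id key, so they stay where the graph partitioner
    put them; message records are keyed by the destination vertex, so they can
    belong to any partition."""
    view_parts, msg_parts = [], []
    for part in graph_parts:
        view_parts.append([(vid, {"rank": adj["rank"]}) for vid, adj in part])
        msg_parts.append([(dst, adj["rank"]) for vid, adj in part for dst in adj["out"]])
    return view_parts, msg_parts


def local_join(left_parts, right_parts):
    """Join partition i only with partition i: valid because both sides share a partitioner."""
    joined = []
    for left, right in zip(left_parts, right_parts):
        right_by_key = dict(right)
        joined.append([(k, (v, right_by_key[k])) for k, v in left if k in right_by_key])
    return joined


def shuffle_join(left_parts, unpartitioned_parts):
    """Move each right-side record to the partition owning its key, then attach
    the grouped messages to every left-side vertex. Returns (joined, records_moved)."""
    moved = 0
    target = [[] for _ in range(NUM_PARTITIONS)]
    for i, part in enumerate(unpartitioned_parts):
        for key, value in part:
            p = partition_of(key)
            if p != i:
                moved += 1  # this record crosses partitions: shuffle traffic
            target[p].append((key, value))
    joined = []
    for left, right in zip(left_parts, target):
        grouped = {}
        for key, value in right:
            grouped.setdefault(key, []).append(value)
        joined.append([(k, (v, grouped.get(k, []))) for k, v in left])
    return joined, moved


def plan(graph):
    graph_parts = partition(graph)
    view_parts, msg_parts = execute_vertex_program(graph_parts)
    with_view = local_join(graph_parts, view_parts)  # graphRDD.join(viewRDD): no records move
    # mapValues(...attach the view...): keys untouched, so partitioning is preserved
    attached = [[(k, {**adj, "view": view}) for k, (adj, view) in part] for part in with_view]
    joined, moved = shuffle_join(attached, msg_parts)  # .join(outgoingMessageRDD): shuffles
    return {k: v for part in joined for k, v in part}, moved


if __name__ == "__main__":
    graph = [
        (0, {"rank": 1.0, "out": [1, 2]}),
        (1, {"rank": 2.0, "out": [2]}),
        (2, {"rank": 3.0, "out": [0]}),
        (3, {"rank": 4.0, "out": [0, 1]}),
    ]
    result, moved = plan(graph)
    print(moved)  # 5: five of the six message records change partition
```

The {{mapValues}} step matters in real Spark too. {{mapValues}} keeps the parent RDD's partitioner, while {{map}} drops it. So after attaching the view, the graph side can stay in place, and only the message side has to move for the final join.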



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
