[ https://issues.apache.org/jira/browse/TINKERPOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Xia closed TINKERPOP-1108.
-------------------------------
    Resolution: Won't Do

Closing given the 
[discussion|https://lists.apache.org/thread/om2m0phg25s83529p9w0gldmcxz7578h]; 
it can be reopened if there is an expectation of active work on this item.

> Produce two RDDs from executeVertexProgram in SparkGraphComputer
> ----------------------------------------------------------------
>
>                 Key: TINKERPOP-1108
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1108
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>            Priority: Major
>
> I have done a lot to optimize our implementation of {{SparkGraphComputer}}. I 
> now know the reason for every shuffle, input, and spill that happens during a 
> job. There is one more optimization that MAY or MAY NOT work, but it is worth 
> trying: if it does what I think it will do, we may get as much as a 2x 
> improvement.
> We currently do:
> {code}
> graphRDD -> viewOutgoingMessagesRDD
> {code}
> We should do:
> {code}
> graphRDD -->
>    viewRDD
>    outgoingMessageRDD
> {code}
> The {{viewRDD}} will have the same partitioner as the {{graphRDD}}, and thus 
> a local join is all that is required. The {{outgoingMessageRDD}} will not be 
> partitioned, so its join will cause a shuffle. Thus, after this block, we do:
> {code}
> graphRDD.join(viewRDD).mapValues(...attach the view...).join(outgoingMessageRDD)
> {code}
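> Here is a minimal sketch of the intended join structure using Spark's Java 
> API. The generic types and the {{superstep}} method are hypothetical 
> stand-ins for TinkerPop's internals; the point is only that a join between 
> two RDDs sharing a partitioner is partition-local, while the message join 
> pays for the shuffle:
> {code}
> import org.apache.spark.api.java.JavaPairRDD;
> import scala.Tuple2;
> 
> // Hypothetical sketch: V = vertex payload, W = per-vertex view,
> // M = aggregated incoming messages. Assumes graphRDD already carries a
> // partitioner (the graph is partitioned once when it is first loaded).
> final class SplitJoinSketch {
> 
>     static <V, W, M> JavaPairRDD<Object, Tuple2<Tuple2<V, W>, M>> superstep(
>             final JavaPairRDD<Object, V> graphRDD,
>             final JavaPairRDD<Object, W> viewRDD,
>             final JavaPairRDD<Object, M> outgoingMessageRDD) {
> 
>         // If viewRDD is derived from graphRDD via mapValues(), it preserves
>         // graphRDD's partitioner, so Spark sees matching partitioners on
>         // both sides and performs this join without shuffling either side.
>         final JavaPairRDD<Object, Tuple2<V, W>> graphWithView =
>                 graphRDD.join(viewRDD);
> 
>         // outgoingMessageRDD is keyed by the *receiving* vertex, so its keys
>         // were produced on arbitrary partitions and it carries no
>         // partitioner; only this join incurs a shuffle.
>         return graphWithView.join(outgoingMessageRDD);
>     }
> }
> {code}
> Note that {{mapValues}} (unlike {{mapToPair}}) preserves the parent RDD's 
> partitioner, which is what makes the first join free.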



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
