[ https://issues.apache.org/jira/browse/TINKERPOP-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643168#comment-17643168 ]
ASF GitHub Bot commented on TINKERPOP-2834:
-------------------------------------------

ministat opened a new pull request, #1885:
URL: https://github.com/apache/tinkerpop/pull/1885

   The current CloneVertexProgram does nothing in its execute method, but SparkGraphComputer still runs it as a general VertexProgram, which requires a shuffle stage. That stage is unnecessary and can be removed, so a shortcut is implemented here.

   When I exported two big graphs, the overall export time improved considerably. See the following table.
   ```
   -----------------------------
              |Graph 1 |Graph 2
   -----------------------------
   Before fix |3.6h    |22min
   -----------------------------
   After fix  |2.4h    |16min
   ```
   Graph 1 has 15 billion vertices and 23 billion edges.
   Graph 2 has 130 million vertices and 650 million edges.


> CloneVertexProgram optimization on SparkGraphComputer
> -----------------------------------------------------
>
>                 Key: TINKERPOP-2834
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2834
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>            Reporter: Redriver
>            Priority: Major
>
> The CloneVertexProgram does nothing in its execute() method, but in
> SparkGraphComputer it still has to be processed with standard GraphComputer
> semantics, which incurs a lot of unnecessary computation. In fact, registering
> a special SparkVertexProgramInterceptor with an empty apply() can improve the
> overall performance a lot.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
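
For readers unfamiliar with the interceptor idea described above, here is a minimal, self-contained sketch of the pattern. It is a toy model only and is not the code from the pull request: `InterceptorSketch`, `Interceptor`, `runGeneral`, `submit` and `shuffleCount` are hypothetical names made up for illustration, and a plain `List<String>` stands in for the graph data. It only shows why a registered pass-through interceptor lets the computer skip the expensive general path when execute() would not change anything.

```java
import java.util.List;
import java.util.Optional;

// Toy model only: none of these types are TinkerPop or Spark classes.
public class InterceptorSketch {

    /** Hypothetical stand-in for an interceptor that replaces the general execution path. */
    interface Interceptor {
        List<String> apply(List<String> input);
    }

    /** Counts how often the expensive step runs, so the effect is observable. */
    static int shuffleCount = 0;

    /** Stand-in for the general path: a shuffle followed by a no-op execute(). */
    static List<String> runGeneral(List<String> input) {
        shuffleCount++;            // models the shuffle stage
        return List.copyOf(input); // execute() does nothing, so the data is unchanged
    }

    /** Uses the interceptor if one is registered, otherwise falls back to the general path. */
    static List<String> submit(List<String> input, Optional<Interceptor> interceptor) {
        return interceptor.map(i -> i.apply(input)).orElseGet(() -> runGeneral(input));
    }

    public static void main(String[] args) {
        List<String> graph = List.of("v1", "v2", "v3");

        // General path: pays for one shuffle.
        List<String> general = submit(graph, Optional.empty());

        // Shortcut: an interceptor whose apply() just hands the input back.
        Interceptor passThrough = in -> in;
        List<String> shortcut = submit(graph, Optional.of(passThrough));

        System.out.println(general.equals(shortcut)); // same result either way
        System.out.println(shuffleCount);             // only the general path shuffled
    }
}
```

In this toy model both paths return the same data, and the shuffle counter increases only on the general path, which mirrors the saving claimed for the export case.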