[jira] [Commented] (TINKERPOP-2834) CloneVertexProgram optimization on SparkGraphComputer

ASF GitHub Bot (Jira) Mon, 05 Dec 2022 00:51:16 -0800


    [ 
https://issues.apache.org/jira/browse/TINKERPOP-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643168#comment-17643168
 ]


ASF GitHub Bot commented on TINKERPOP-2834:
-------------------------------------------

ministat opened a new pull request, #1885:
URL: https://github.com/apache/tinkerpop/pull/1885

   The current CloneVertexProgram does nothing in its execute method, yet 
SparkGraphComputer still runs it as a general VertexProgram, which requires a 
shuffle stage. That stage is unnecessary and can be removed, so this PR 
implements a shortcut. When I exported two big graphs, the overall export time 
dropped by about 33% for Graph 1 and about 27% for Graph 2. See the following 
table. 
   ```
   -----------------------------
              |Graph 1 |Graph 2
   -----------------------------
   Before fix |3.6h    |22min
   -----------------------------
   After fix  |2.4h    |16min
   ```
   Graph 1 has 15 billion vertices and 23 billion edges. Graph 2 has 130 million 
vertices and 650 million edges.
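
   The shortcut described above can be illustrated with a minimal, 
self-contained sketch. It is not the code from this pull request and does not 
use the TinkerPop or Spark classes: `Program`, `Interceptor`, `run`, and 
`PASS_THROUGH` are hypothetical stand-ins, and a plain `List<String>` stands 
in for the vertex RDD. It only shows the control flow, assuming that a 
registered interceptor replaces the generic execute-and-shuffle path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterceptorSketch {

    /** Hypothetical stand-in for a vertex program (not the TinkerPop interface). */
    public interface Program {
        String name();
    }

    /** Hypothetical stand-in for an interceptor that replaces the generic path. */
    public interface Interceptor {
        List<String> apply(Program program, List<String> vertices);
    }

    /** A pass-through interceptor: returns its input untouched, so no extra work is done. */
    public static final Interceptor PASS_THROUGH = (program, vertices) -> vertices;

    /**
     * Runs a program over the vertices and records each stage in {@code log}.
     * A null interceptor means none is registered, so the generic path runs.
     */
    public static List<String> run(Program program, List<String> vertices,
                                   Interceptor interceptor, List<String> log) {
        if (interceptor != null) {
            // Shortcut: the interceptor takes over and the generic stages are skipped.
            log.add("interceptor");
            return interceptor.apply(program, vertices);
        }
        // Generic path: per-vertex execute followed by a (simulated) shuffle stage.
        log.add("execute");
        log.add("shuffle");
        return new ArrayList<>(vertices);
    }

    public static void main(String[] args) {
        List<String> vertices = Arrays.asList("v1", "v2", "v3");
        Program clone = () -> "clone";

        List<String> genericLog = new ArrayList<>();
        run(clone, vertices, null, genericLog);
        System.out.println("generic path stages:  " + genericLog);

        List<String> shortcutLog = new ArrayList<>();
        run(clone, vertices, PASS_THROUGH, shortcutLog);
        System.out.println("shortcut path stages: " + shortcutLog);
    }
}
```

   In both paths the returned vertices are the same; only the recorded stages 
differ, which mirrors the claim that the shuffle adds cost without changing 
the cloned graph.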




> CloneVertexProgram optimization on SparkGraphComputer
> -----------------------------------------------------
>
>                 Key: TINKERPOP-2834
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2834
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>            Reporter: Redriver
>            Priority: Major
>
> The CloneVertexProgram does nothing in its execute() method, but 
> SparkGraphComputer still processes it with the standard GraphComputer 
> semantics, which incurs a lot of unnecessary computation. Registering a 
> special SparkVertexProgramInterceptor with an empty apply() can improve the 
> overall performance considerably.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
