Marko A. Rodriguez created TINKERPOP-1131:
---------------------------------------------
Summary: TraversalVertexProgram traverser management is
inefficient memory-wise.
Key: TINKERPOP-1131
URL: https://issues.apache.org/jira/browse/TINKERPOP-1131
Project: TinkerPop
Issue Type: Improvement
Components: process
Affects Versions: 3.1.1-incubating
Reporter: Marko A. Rodriguez
Assignee: Marko A. Rodriguez
Fix For: 3.2.0-incubating
The traversers incoming to a vertex at an iteration are in a {{TraverserSet}}.
We iterate that set and attach the traversers to their respective local object
(e.g. vertex, edge, property, etc.). This creates a {{toProcess}}
{{TraverserSet}}. At this point, we have 2 sets of the same size! We NEVER clear
the message set. We then process the {{toProcess}} traversers to create an
{{aliveTraversers}} set. Now, 3 sets! If you have millions of edges on an
{{outE()}}, you have 3 sets with millions of entries each (nasty!). We then set
{{toProcess}} to {{aliveTraversers}} and keep doing this until the set is
completely empty. (It empties when the traversers need to go to another vertex
to keep processing, i.e. a message pass.)
So, to preserve memory we need to "drain" each {{TraverserSet}}. That is,
iterate and {{remove()}} as we go, so that we don't create set clones, blow the
heap, and cause (e.g.) {{SparkGraphComputer}} to spill memory to disk.
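
A minimal sketch of the "drain" idea, for illustration only. It is not the TraversalVertexProgram code: plain java.util sets stand in for {{TraverserSet}}, and the names DrainSketch, drain and peakEntries are hypothetical. peakEntries counts how many entries the source and target sets hold together at the worst moment, with and without removing from the source during iteration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical illustration; plain sets stand in for TraverserSet.
final class DrainSketch {

    // Hands every element to the consumer and removes it from the source
    // in the same pass, so the source is empty when the loop finishes.
    static <T> void drain(Set<T> source, Consumer<? super T> consumer) {
        Iterator<T> it = source.iterator();
        while (it.hasNext()) {
            T next = it.next();
            it.remove(); // drop the source's reference before handing off
            consumer.accept(next);
        }
    }

    // Moves n entries from one set to another and returns the largest
    // combined entry count (source + target) seen along the way.
    static int peakEntries(int n, boolean drain) {
        Set<Integer> source = new HashSet<>();
        for (int i = 0; i < n; i++) {
            source.add(i);
        }
        Set<Integer> target = new HashSet<>();
        int peak = source.size();
        Iterator<Integer> it = source.iterator();
        while (it.hasNext()) {
            Integer t = it.next();
            if (drain) {
                it.remove(); // source shrinks as target grows
            }
            target.add(t);
            peak = Math.max(peak, source.size() + target.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        Set<String> incoming = new LinkedHashSet<>(Arrays.asList("t1", "t2", "t3"));
        Set<String> toProcess = new LinkedHashSet<>();
        drain(incoming, t -> toProcess.add(t + "@v1"));
        System.out.println(incoming.isEmpty());            // true
        System.out.println(peakEntries(1_000_000, false)); // 2000000
        System.out.println(peakEntries(1_000_000, true));  // 1000000
    }
}
```

With draining, the combined entry count never exceeds the size of the original set; without it, both sets are fully populated at the end of the pass. Note that java.util.HashSet does not shrink its backing table on removal, so in this sketch the saving is in the entries (here, the traversers) rather than in the table itself.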