GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/301

    TINKERPOP-1120: If there is no view nor messages, don't create empty 
views/messages in SparkExecutor

    https://issues.apache.org/jira/browse/TINKERPOP-1120
    
    The following PR affects TraversalVertexProgram and SparkGraphComputer. 
Here is what changed in both:
    
        -------SparkGraphComputer
        1. If the vertex doesn't pass any messages, don't serialize an empty 
list, serialize null.
        2. If the vertex doesn't have a view, don't serialize an empty list of 
detached vertex properties, serialize null.
        3. If the vertex has neither a view nor messages, don't do anything!
        -------TraversalVertexProgram
        4. Found a memory bug where halted traversers were still distributed 
amongst the vertices even though they were sent to the master traversal.
        5. If a halted traverser TraverserSet is empty, remove the property 
(remove the vertex view!).
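
Items 1-3 can be sketched roughly as follows. This is only an illustration in plain Java, not the actual SparkExecutor code: the class `ViewOutgoingSketch`, its fields, and the use of `String` for view entries and messages are all hypothetical stand-ins.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the per-vertex payload; not a TinkerPop class.
final class ViewOutgoingSketch {
    final List<String> view;     // null when the vertex has no view
    final List<String> outgoing; // null when the vertex sent no messages

    private ViewOutgoingSketch(final List<String> view, final List<String> outgoing) {
        this.view = view;
        this.outgoing = outgoing;
    }

    // Replaces empty lists with null (items 1 and 2) and returns null
    // when there is neither a view nor messages (item 3).
    static ViewOutgoingSketch of(final List<String> view, final List<String> outgoing) {
        final List<String> v = (view == null || view.isEmpty()) ? null : view;
        final List<String> o = (outgoing == null || outgoing.isEmpty()) ? null : outgoing;
        return (v == null && o == null) ? null : new ViewOutgoingSketch(v, o);
    }

    // Readers treat a null field as "empty".
    List<String> viewOrEmpty() {
        return this.view == null ? Collections.<String>emptyList() : this.view;
    }

    List<String> outgoingOrEmpty() {
        return this.outgoing == null ? Collections.<String>emptyList() : this.outgoing;
    }
}
```

The trade-off in a design like this is that every consumer has to handle null, which is why the sketch funnels reads through the `...OrEmpty()` accessors.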
    
    You can read about the performance gains by doing this here:
        https://groups.google.com/d/msg/gremlin-users/NKjEXdRNp-M/S48pDXjdAQAJ
    
    
    CHANGELOG
    
    ```
    * `SparkGraphComputer` no longer shuffles empty views or empty outgoing 
messages in order to save time and space.
    * `TraversalVertexProgram` no longer maintains empty halted traverser 
properties in order to save space.
    ```
    
    UPGRADE
    
    ```
    TraversalVertexProgram
    ----------------------
    
    `TraversalVertexProgram` previously maintained a `HALTED_TRAVERSERS` 
`TraverserSet` for each vertex throughout the life of the OLAP computation. 
However, if the set contains no halted traversers, there is no point in 
keeping the compute property around; removing it saves time and space. 
Users whose `VertexPrograms` are chained off of `TraversalVertexProgram` and 
assume that `HALTED_TRAVERSERS` always exists should no longer make that 
assumption.
    
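    ---java
    // bad code: value() throws if the HALTED_TRAVERSERS property is absent
    TraverserSet haltedTraversers = 
vertex.value(TraversalVertexProgram.HALTED_TRAVERSERS);
    // good code: fall back to an empty TraverserSet when the property is absent
    TraverserSet haltedTraversers = 
vertex.<TraverserSet>property(TraversalVertexProgram.HALTED_TRAVERSERS).orElse(new 
TraverserSet());
    ---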
    ```
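
The upgrade note above covers both sides of the change: the producer drops the property when the set is empty, and the consumer treats a missing property as an empty set. A minimal sketch of that contract in plain Java follows; the `Map`-based property store, the class `HaltedTraversersSketch`, and the key string are illustrative assumptions, not the TinkerPop API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a vertex's compute properties; not the TinkerPop API.
final class HaltedTraversersSketch {
    // Illustrative key name only.
    static final String HALTED_TRAVERSERS = "haltedTraversers";

    // Producer side: keep the property only when the set has content,
    // otherwise remove it entirely.
    static void store(final Map<String, Set<String>> properties, final Set<String> halted) {
        if (halted == null || halted.isEmpty()) {
            properties.remove(HALTED_TRAVERSERS);
        } else {
            properties.put(HALTED_TRAVERSERS, halted);
        }
    }

    // Consumer side: a missing property means "no halted traversers".
    static Set<String> read(final Map<String, Set<String>> properties) {
        final Set<String> halted = properties.get(HALTED_TRAVERSERS);
        return halted == null ? new HashSet<String>() : halted;
    }
}
```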


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1120

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #301
    
----
commit 4a7888681152cb005d632416371b7f66da6f119c
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-05-02T21:06:53Z

    Empty lists are not created if no messages or views are created. Instead 
the payload is null. This helps to reduce memory footprint both in RAM and 
during shuffle/disk/network.

commit cd5524d73928c0e9b8a2260fad1b1e29c3f53ef5
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-05-02T22:16:15Z

    a bunch of knick-knack optimizations generally in TraversalVertexProgram and 
specifically in SparkGraphComputer. If there are no HALTED_TRAVERSERS, then do 
not propagate an empty set -- property.remove(). In Spark, if there are no 
outgoing messages or new view, do not propagate empty ViewPayloads -- using 
null. Found a memory bug in TraversalVertexProgram where if the 
HALTED_TRAVERSERS are supposed to go back to the master traversal, they were 
still being persisted across the vertices. These tweaks should definitely 
reduce stress on large graphs as the memory footprint is greatly reduced. 
Unfortunately, we still need reduceByKey() even on empty views/messages as it's 
not known that it's empty until after the action.

commit e3a4b7ff9bd730b7056b4ab224ea8e9255263c9b
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-05-02T22:25:30Z

    another null memory tweak. no point sending around empty lists --- using 
null instead.

commit 6f13c0cfc20d8c0cbf1681359792e543bd3676bc
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-05-02T22:36:26Z

    more minor memory tweaks. running integration tests overnight.

commit 79ebaf9f94f0b645ba493551ff219a786003cc85
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-05-03T01:03:11Z

    finally figured out how to do a reduceByKey() with empty tuples. This is 
the super optimization -- if there are no views and no outgoing messages, then 
the reduceByKey is trivial. For TraversalVertexProgram, this means 
that the final step takes no time at all. Running integration tests overnight.

commit 8fd9502160b7940a806247a16406663ff4b27826
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-05-03T13:49:14Z

    some last-minute cleanups and comments before PR. integration tests passed 
overnight. Spark integration tests passed for these changes right now.

----


