GitHub user andralungu opened a pull request:

    https://github.com/apache/flink/pull/1124

    [FLINK-2661] Added Node Splitting Methods

    Social media graphs, citation networks or even protein networks have a 
common property: their degree distribution follows a power-law curve. This 
structure raises challenges to both vertex-centric and GSA/GAS models because 
they uniformly process vertices, regardless of their degree distribution. This 
leads to large execution time stalls: vertices wait for skewed nodes to finish 
computation [synchronous]. 
    
    This PR aims to diminish the impact of high-degree nodes by proposing four 
main functions: `determinieSkewedVertices`, `treeDeAggregate` (splits a node 
into subnodes, recursively, in levels), `propagateValuesToSplitVertices` 
(useful when the algorithm performs more than one superstep), `treeAggregate` 
(brings the graph back to its initial state). 
    
    These functions modify a graph at a high-level, making its degree 
distribution more uniform. The method does not need any special partitioning or 
runtime modification and (for skewed networks and computationally intensive 
algorithms) can speed up the run time by a factor of two.     
    
    I added an example: NodeSplittingJaccardSimilarityMeasure, for which I 
needed to split the overall sequence of operations to two functions to be able 
to test the result. Calling the entire main method would have resulted in the 
"Two few memory segments etc" exception - too many operations called within one 
test, in other words. 
    
    For more info, please consult the additional entry in the documentation. 
    
    If we reach a common point and this PR gets merged, I will also follow 
@fhueske 's suggestion from the mailing list - adding a Split version of each 
of the library methods to allow users to verify whether their run time 
improves. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andralungu/flink splitJaccardFlink

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1124
    
----
commit b02d0917edcd5f3c8846fe01044afd7444a58c08
Author: Andra Lungu <lungu.an...@gmail.com>
Date:   2015-09-12T10:25:20Z

    [FLINK-2661] Added Node Splitting Methods
    
    [FLINK-2661] Minor modifications in the docs
    
    [FLINK-2661] pdf to png

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to