GitHub user andralungu opened a pull request: https://github.com/apache/flink/pull/1124
[FLINK-2661] Added Node Splitting Methods Social media graphs, citation networks or even protein networks have a common property: their degree distribution follows a power-law curve. This structure raises challenges to both vertex-centric and GSA/GAS models because they uniformly process vertices, regardless of their degree distribution. This leads to large execution time stalls: vertices wait for skewed nodes to finish computation [synchronous]. This PR aims to diminish the impact of high-degree nodes by proposing four main functions: `determinieSkewedVertices`, `treeDeAggregate` (splits a node into subnodes, recursively, in levels), `propagateValuesToSplitVertices` (useful when the algorithm performs more than one superstep), `treeAggregate` (brings the graph back to its initial state). These functions modify a graph at a high-level, making its degree distribution more uniform. The method does not need any special partitioning or runtime modification and (for skewed networks and computationally intensive algorithms) can speed up the run time by a factor of two. I added an example: NodeSplittingJaccardSimilarityMeasure, for which I needed to split the overall sequence of operations to two functions to be able to test the result. Calling the entire main method would have resulted in the "Two few memory segments etc" exception - too many operations called within one test, in other words. For more info, please consult the additional entry in the documentation. If we reach a common point and this PR gets merged, I will also follow @fhueske 's suggestion from the mailing list - adding a Split version of each of the library methods to allow users to verify whether their run time improves. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andralungu/flink splitJaccardFlink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1124.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1124 ---- commit b02d0917edcd5f3c8846fe01044afd7444a58c08 Author: Andra Lungu <lungu.an...@gmail.com> Date: 2015-09-12T10:25:20Z [FLINK-2661] Added Node Splitting Methods [FLINK-2661] Minor modifications in the docs [FLINK-2661] pdf to png ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---