[ 
https://issues.apache.org/jira/browse/SPARK-10994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-10994.
-------------------------------
    Resolution: Won't Fix

> Clustering coefficient computation in GraphX
> --------------------------------------------
>
>                 Key: SPARK-10994
>                 URL: https://issues.apache.org/jira/browse/SPARK-10994
>             Project: Spark
>          Issue Type: New Feature
>          Components: GraphX
>            Reporter: Yang Yang
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The clustering coefficient (CC) is a fundamental measure in the analysis of
> social and other networks: it assesses the degree to which nodes tend to
> cluster together [1][2]. Together with density, node degree, path length,
> diameter, connectedness, and node centrality, it is one of the seven most
> important properties used to characterise a network [3].
> We found that GraphX already implements connectedness, node centrality, and
> path length, but has no component for computing the clustering
> coefficient. This gap was our original motivation for implementing an
> algorithm that computes the clustering coefficient of each vertex of a
> given graph.
> The clustering coefficient is useful in many real applications, such as
> user behaviour prediction and structure prediction (e.g., link prediction).
> We have used it in several of our own papers (e.g., [4-5]), and many other
> published papers also rely on this metric [6-8]. We are confident that this
> feature will benefit GraphX and attract a large number of users.
> References
> [1] https://en.wikipedia.org/wiki/Clustering_coefficient
> [2] Watts, Duncan J., and Steven H. Strogatz. "Collective dynamics of
> ‘small-world’ networks." Nature 393.6684 (1998): 440-442. (with 27266
> citations)
> [3] https://en.wikipedia.org/wiki/Network_science
> [4] Jing Zhang, Zhanpeng Fang, Wei Chen, and Jie Tang. Diffusion of
> "Following" Links in Microblogging Networks. IEEE Transactions on Knowledge
> and Data Engineering (TKDE), Volume 27, Issue 8, 2015, Pages 2093-2106.
> [5] Yang Yang, Jie Tang, Jacklyne Keomany, Yanting Zhao, Ying Ding, Juanzi 
> Li, and Liangwei Wang. Mining Competitive Relationships by Learning across 
> Heterogeneous Networks. In Proceedings of the Twenty-First Conference on 
> Information and Knowledge Management (CIKM'12). pp. 1432-1441.
> [6] Clauset, Aaron, Cristopher Moore, and Mark EJ Newman. Hierarchical 
> structure and the prediction of missing links in networks. Nature 453.7191 
> (2008): 98-101. (with 973 citations)
> [7] Adamic, Lada A., and Eytan Adar. Friends and neighbors on the web. Social
> Networks 25.3 (2003): 211-230. (with 1238 citations)
> [8] Lichtenwalter, Ryan N., Jake T. Lussier, and Nitesh V. Chawla. New 
> perspectives and methods in link prediction. In KDD'10.
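
For readers unfamiliar with the metric, the per-vertex (local) clustering coefficient mentioned in the issue description can be sketched as follows. This is a minimal single-machine illustration in plain Python, not the GraphX implementation proposed in the issue. It assumes an undirected simple graph, and the function names `build_adjacency` and `local_clustering` are hypothetical. For a vertex with k neighbours, the coefficient is the number of edges among those neighbours divided by k(k-1)/2; vertices with fewer than two neighbours are assigned 0.0 by convention here.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)}.

    Self-loops are skipped and duplicate edges collapse, because the
    neighbour collections are sets.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj):
    """Return {vertex: local clustering coefficient}.

    Assumption: vertices with fewer than two neighbours get 0.0.
    """
    coeffs = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[v] = 0.0
            continue
        # Count edges that connect two neighbours of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj.get(a, ()))
        coeffs[v] = 2.0 * links / (k * (k - 1))
    return coeffs


if __name__ == "__main__":
    # A triangle (1, 2, 3) with a pendant vertex 4 attached to 3.
    graph = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
    for vertex, cc in sorted(local_clustering(graph).items()):
        print(vertex, round(cc, 3))
```

In the example graph, vertices 1 and 2 have all of their neighbours connected to each other, while vertex 3 has three neighbours with only one edge among them.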



