[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291217#comment-15291217
 ] 

ASF GitHub Bot commented on FLINK-3780:
---------------------------------------

Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1980#discussion_r63893365
  
    --- Diff: docs/apis/batch/libs/gelly.md ---
    @@ -2051,6 +2052,26 @@ The algorithm takes a directed, vertex (and possibly edge) attributed graph as i
     vertex represents a group of vertices and each edge represents a group of edges from the input graph. Furthermore, each
     vertex and edge in the output graph stores the common group value and the number of represented elements.
     
    +### Jaccard Index
    +
    +#### Overview
    +The Jaccard Index measures the similarity between vertex neighborhoods. Scores range from 0.0 (no common neighbors) to
    +1.0 (all neighbors are common).
    +
    +#### Details
    +Counting common neighbors for pairs of vertices is equivalent to counting the two-paths consisting of two edges
    +connecting the two vertices to the common neighbor. The number of distinct neighbors for pairs of vertices is computed
    +by storing the sum of degrees of the vertex pair and subtracting the count of common neighbors, which are double-counted
    +in the sum of degrees.
    +
    +The algorithm first annotates each edge with the endpoint degree. Grouping on the midpoint vertex, each pair of
    +neighbors is emitted with the endpoint degree sum. Grouping on two-paths, the common neighbors are counted.
    +
    +#### Usage
    +The algorithm takes a simple, undirected graph as input and outputs a `DataSet` of tuples containing two vertex IDs,
    +the number of common neighbors, and the number of distinct neighbors. The graph ID type must be `Comparable` and
    --- End diff --
    
    It does, from `Result.getJaccardIndexScore()`.
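
The two-path counting described in the quoted "Details" section can be illustrated with a small in-memory sketch. This is plain Python, not the Gelly implementation or its API: the names `jaccard_scores` and `jaccard_index` are hypothetical, and the graph is assumed to fit in memory as a list of edges with sortable vertex IDs.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_scores(edges):
    """Return {(u, v): (common, distinct)} for every pair sharing a neighbor.

    `edges` is an iterable of (u, v) pairs for a simple, undirected graph.
    Hypothetical helper for illustration; not part of Flink or Gelly.
    """
    # Build adjacency sets; sets also drop duplicate edges.
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # a simple graph has no self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)

    common = defaultdict(int)  # number of two-paths per endpoint pair
    degree_sum = {}            # sum of the two endpoint degrees

    # Group on the midpoint vertex and emit each pair of its neighbors.
    for midpoint, adjacent in neighbors.items():
        for u, v in combinations(sorted(adjacent), 2):
            common[(u, v)] += 1
            degree_sum[(u, v)] = len(neighbors[u]) + len(neighbors[v])

    # Common neighbors are double-counted in the degree sum, so subtract them.
    return {pair: (count, degree_sum[pair] - count)
            for pair, count in common.items()}


def jaccard_index(common, distinct):
    """Score in (0.0, 1.0]; pairs with no common neighbor are never emitted."""
    return common / distinct
```

For the edges `[(0, 1), (0, 2), (1, 2), (1, 3)]`, the pair `(0, 3)` has one common neighbor (vertex 1) and two distinct neighbors, so its score is 0.5. Sorting each neighbor set gives every pair a canonical order, which is one plausible reason for requiring a `Comparable` ID type.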


> Jaccard Similarity
> ------------------
>
>                 Key: FLINK-3780
>                 URL: https://issues.apache.org/jira/browse/FLINK-3780
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>             Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}}, which 
> relies on {{Graph.getTriplets()}} and so only computes similarity scores for 
> neighbors, not for neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.
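
As a rough illustration of the Adamic-Adar variant mentioned above, the sketch below keeps the same grouping on the midpoint vertex but accumulates a weight derived from the midpoint's degree instead of tracking endpoint degrees. The 1/log(degree) weight is the standard Adamic-Adar definition rather than something stated in this issue, and `adamic_adar_scores` is a hypothetical name, not a Gelly API.

```python
import math
from collections import defaultdict
from itertools import combinations


def adamic_adar_scores(edges):
    """Return {(u, v): score} for every pair of vertices sharing a neighbor.

    Hypothetical helper for illustration; assumes a simple, undirected graph
    given as (u, v) pairs with sortable vertex IDs.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    scores = defaultdict(float)
    for midpoint, adjacent in neighbors.items():
        if len(adjacent) < 2:
            continue  # a degree-1 vertex is not the middle of any two-path
        # Every two-path through this midpoint contributes the same weight.
        weight = 1.0 / math.log(len(adjacent))
        for u, v in combinations(sorted(adjacent), 2):
            scores[(u, v)] += weight
    return dict(scores)
```

Because a midpoint must have degree at least 2, the logarithm is always positive and the division is safe.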



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
