[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469562#comment-16469562 ]
Joseph K. Bradley edited comment on SPARK-24217 at 5/10/18 4:37 PM:
--------------------------------------------------------------------

Update: I'll eat my words! I should have read the docs more carefully (where I missed the note that there should be exactly 1 reference from one node to another). This is actually a major problem with our design for PIC, which can't really be a Row -> Row Transformer. Will think more about this and re-post.

was (Author: josephkb):
    But the reason that the IDs are missing from the "id" column is that the input is not symmetric. If it were made symmetric, then there could not be any missing IDs.

> Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24217
>                 URL: https://issues.apache.org/jira/browse/SPARK-24217
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: spark_user
>            Priority: Major
>             Fix For: 2.4.0
>
>
> We should display prediction and id corresponding to all the nodes.
> Currently PIC is not returning the cluster indices of neighbour IDs which are not there in the ID column.
> As per the definition of PIC clustering, given in the code,
> PIC takes an affinity matrix between items (or vertices) as input. An affinity matrix
> is a symmetric matrix whose entries are non-negative similarities between items.
> PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each input row includes:
> * {{idCol}}: vertex ID
> * {{neighborsCol}}: neighbors of vertex in {{idCol}}
> * {{similaritiesCol}}: non-negative weights (similarities) of edges between the vertex
>   in {{idCol}} and each neighbor in {{neighborsCol}}
> * *"PIC returns a cluster assignment for each input vertex."* It appends a new column {{predictionCol}}
>   containing the cluster assignment in {{[0,k)}} for each row (vertex).
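
To make the reported symptom concrete, here is a minimal sketch in plain Python (no Spark involved) of why some vertices can end up without a row to carry a prediction. It assumes the input lists each edge only once, as the comment above says the docs require. The `Row` type, the `ids_without_rows` helper, and the sample data are all hypothetical and are not part of the PIC API.

```python
from typing import List, NamedTuple, Set


class Row(NamedTuple):
    """Hypothetical stand-in for one input row (idCol, neighborsCol, similaritiesCol)."""
    id: int
    neighbors: List[int]
    similarities: List[float]


def ids_without_rows(rows: List[Row]) -> Set[int]:
    """Return vertex IDs that occur only as neighbors, never in the id column."""
    ids_with_rows = {row.id for row in rows}
    referenced = {n for row in rows for n in row.neighbors}
    return referenced - ids_with_rows


# Made-up data: each edge is listed once, so vertices 3 and 4 never get their own row.
rows = [
    Row(0, [1, 2], [0.9, 0.8]),
    Row(1, [2, 3], [0.7, 0.1]),
    Row(2, [4], [0.2]),
]

missing = ids_without_rows(rows)
print(sorted(missing))  # prints [3, 4]
```

A transformer that only appends a prediction column to existing rows has nowhere to put the assignments for vertices 3 and 4 in this sketch, which appears to be the Row -> Row limitation mentioned in the comment.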