[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471128#comment-16471128 ]

shahid commented on SPARK-24217:

Thanks for the clarification, Joseph K. Bradley.

Is it really required to append the result to the input DataFrame? With the existing implementation, I am able to get the desired output with my fix.

For example:

id  neighbor        similarity
1   [2, 3, 4, 5]    [1.0, 1.0, 1.0, 1.0]
6   [7, 8, 9, 10]   [1.0, 1.0, 1.0, 1.0]

Output in *spark.ml* (with my fix):

id  prediction
1   0
2   0
3   0
4   0
5   0
6   1
7   1
8   1
9   1
10  1

> Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
>
>                 Key: SPARK-24217
>                 URL: https://issues.apache.org/jira/browse/SPARK-24217
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: shahid
>            Priority: Major
>             Fix For: 2.4.0
>
> We should display prediction and id corresponding to all the nodes.
> Currently PIC is not returning the cluster indices of neighbour IDs which are not there in the ID column.
> As per the definition of PIC clustering, given in the code,
> PIC takes an affinity matrix between items (or vertices) as input. An affinity matrix
> is a symmetric matrix whose entries are non-negative similarities between items.
> PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each input row includes:
> * {{idCol}}: vertex ID
> * {{neighborsCol}}: neighbors of vertex in {{idCol}}
> * {{similaritiesCol}}: non-negative weights (similarities) of edges between the vertex
> in {{idCol}} and each neighbor in {{neighborsCol}}
> * *"PIC returns a cluster assignment for each input vertex."* It appends a new column {{predictionCol}}
> containing the cluster assignment in {{[0,k)}} for each row (vertex).
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471040#comment-16471040 ]

spark_user commented on SPARK-24217:

Thanks for the clarification. I am closing the PR.
[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470704#comment-16470704 ]

Joseph K. Bradley commented on SPARK-24217:

On the topic of eating my words, please check out my new comment here: [SPARK-15784]. We may need to rework the API.
[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469859#comment-16469859 ]

spark_user commented on SPARK-24217:

The behaviour should be the same for both spark.ml and spark.mllib, right?
[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469858#comment-16469858 ]

spark_user commented on SPARK-24217:

For the same input in spark.ml and spark.mllib, spark.mllib gives a cluster ID for all the vertices.

For example:

id  neighbor        similarity
1   [2, 3, 4, 5]    [1.0, 1.0, 1.0, 1.0]
6   [7, 8, 9, 10]   [1.0, 1.0, 1.0, 1.0]

Output in spark.ml:

id  prediction
1   0
6   1

Output in spark.mllib:

id  prediction
1   0
2   0
3   0
4   0
5   0
6   1
7   1
8   1
9   1
10  1
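To make the gap between the two outputs concrete, here is a minimal plain-Python sketch (no Spark involved) that lists the vertices appearing only in the neighbor column of the example input. The `rows` layout and the helper name `missing_ids` are assumptions made for illustration; they are not part of the Spark API.

```python
# Rows mirror the example input above: (id, neighbors, similarities).
rows = [
    (1, [2, 3, 4, 5], [1.0, 1.0, 1.0, 1.0]),
    (6, [7, 8, 9, 10], [1.0, 1.0, 1.0, 1.0]),
]


def missing_ids(rows):
    """Return vertex IDs that occur as neighbors but never in the id column."""
    ids = {vertex for vertex, _, _ in rows}
    neighbors = {n for _, ns, _ in rows for n in ns}
    return sorted(neighbors - ids)


# These are the vertices for which the spark.ml output above has no row.
missing = missing_ids(rows)
print(missing)
```

For the example input, the eight vertices 2 to 5 and 7 to 10 are exactly the ones that get a prediction in spark.mllib but not in spark.ml.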
[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469562#comment-16469562 ]

Joseph K. Bradley commented on SPARK-24217:

But the reason that the IDs are missing from the "id" column is that the input is not symmetric. If it were made symmetric, then there could not be any missing IDs.
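The point about symmetric input can be sketched in plain Python: mirroring every edge guarantees that each vertex shows up in the id column. This is only an illustration of the idea, not how Spark handles its input; the helper name `symmetrize` and the tuple layout are assumptions, and if an edge is listed in both directions with different weights, the last one seen wins.

```python
from collections import defaultdict


def symmetrize(rows):
    """Mirror every (id, neighbor, similarity) edge so the adjacency list is symmetric."""
    adjacency = defaultdict(dict)
    for vertex, neighbors, sims in rows:
        for neighbor, sim in zip(neighbors, sims):
            adjacency[vertex][neighbor] = sim
            adjacency[neighbor][vertex] = sim  # add the reverse edge
    result = []
    for vertex in sorted(adjacency):
        ns = sorted(adjacency[vertex])
        result.append((vertex, ns, [adjacency[vertex][n] for n in ns]))
    return result


rows = [
    (1, [2, 3, 4, 5], [1.0, 1.0, 1.0, 1.0]),
    (6, [7, 8, 9, 10], [1.0, 1.0, 1.0, 1.0]),
]
symmetric = symmetrize(rows)
for row in symmetric:
    print(row)
```

After symmetrizing the two-row example from this thread, every vertex from 1 to 10 has its own row, so no ID can be missing from the id column.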
[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469245#comment-16469245 ]

spark_user commented on SPARK-24217:

PIC should return the cluster index of each vertex of the graph, as per the definition of PIC, which is also given in the comment in PowerIterationClustering.scala in spark.ml.
[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469243#comment-16469243 ]

spark_user commented on SPARK-24217:

Thanks for the comment, Joseph K. Bradley. Actually, the issue is not about the symmetric similarity matrix. The spark.mllib PIC assigns cluster indices to all the vertices of the similarity graph, but spark.ml doesn't return the cluster IDs of the vertices which are not in the ID column. This is clearly visible in the test cases of both spark.ml and spark.mllib.
[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
[ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469230#comment-16469230 ]

Joseph K. Bradley commented on SPARK-24217:

I don't really think this is a bug. PIC's documentation says pretty clearly that the input data has to represent a symmetric matrix, and this example seems to be failing because the input data is invalid. I do think it could be valuable to throw a better error when the input is not symmetric, though we should make sure that any check we do for this is not too expensive.
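As a rough illustration of what a symmetry check with a clearer error could look like, here is a plain-Python sketch. It is not Spark code: the names `asymmetric_edges` and `validate_symmetric`, the tolerance `tol`, and the error wording are all assumptions. On a distributed dataset an equivalent check would presumably need a join or shuffle over the edges, which is where the cost concern comes from.

```python
def asymmetric_edges(rows, tol=1e-9):
    """Return (src, dst) edges whose reverse edge is missing or has a different weight."""
    weights = {}
    for vertex, neighbors, sims in rows:
        for neighbor, sim in zip(neighbors, sims):
            weights[(vertex, neighbor)] = sim
    bad = []
    for (src, dst), sim in weights.items():
        reverse = weights.get((dst, src))
        if reverse is None or abs(reverse - sim) > tol:
            bad.append((src, dst))
    return sorted(bad)


def validate_symmetric(rows):
    """Raise a descriptive error if the adjacency list is not symmetric."""
    bad = asymmetric_edges(rows)
    if bad:
        raise ValueError(
            f"Input is not symmetric: {len(bad)} edge(s) lack a matching "
            f"reverse edge, e.g. {bad[0]}"
        )


asymmetric_rows = [(1, [2, 3], [1.0, 1.0])]
symmetric_rows = [(1, [2], [1.0]), (2, [1], [1.0])]

print(asymmetric_edges(asymmetric_rows))
print(asymmetric_edges(symmetric_rows))
```

Calling `validate_symmetric(asymmetric_rows)` raises a `ValueError` naming one offending edge, while `validate_symmetric(symmetric_rows)` returns without error.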