[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-10 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471040#comment-16471040
 ] 

spark_user commented on SPARK-24217:


Thanks for the clarification. I am closing the PR.

> Power Iteration Clustering is not displaying cluster indices corresponding to 
> some vertices.
> 
>
> Key: SPARK-24217
> URL: https://issues.apache.org/jira/browse/SPARK-24217
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.0
> Reporter: spark_user
> Priority: Major
> Fix For: 2.4.0
>
>
> We should display the prediction and id for all the nodes.
> Currently PIC does not return the cluster indices of neighbour IDs that are
> not present in the ID column.
> As per the definition of PIC given in the code:
> PIC takes an affinity matrix between items (or vertices) as input. An
> affinity matrix is a symmetric matrix whose entries are non-negative
> similarities between items.
> PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each
> input row includes:
>  * {{idCol}}: vertex ID
>  * {{neighborsCol}}: neighbors of vertex in {{idCol}}
>  * {{similaritiesCol}}: non-negative weights (similarities) of edges between
>    the vertex in {{idCol}} and each neighbor in {{neighborsCol}}
>
> *"PIC returns a cluster assignment for each input vertex."* It appends a
> new column {{predictionCol}} containing the cluster assignment in {{[0,k)}}
> for each row (vertex).
>  
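
The input layout quoted above can be sanity-checked before clustering. The following is a minimal pure-Python sketch, not Spark code: the row tuples and the helper names `check_rows` and `is_symmetric` are hypothetical and only mirror the `idCol` / `neighborsCol` / `similaritiesCol` description.

```python
from typing import Dict, List, Tuple

# One input row: (id, neighbors, similarities), mirroring idCol,
# neighborsCol and similaritiesCol from the quoted description.
# This tuple layout is an assumption made for illustration.
Row = Tuple[int, List[int], List[float]]


def check_rows(rows: List[Row]) -> None:
    """Raise ValueError if a row violates the documented layout."""
    for vid, neighbors, sims in rows:
        if len(neighbors) != len(sims):
            raise ValueError(
                f"vertex {vid}: {len(neighbors)} neighbors "
                f"but {len(sims)} similarities"
            )
        if any(s < 0 for s in sims):
            raise ValueError(f"vertex {vid}: negative similarity")


def is_symmetric(rows: List[Row]) -> bool:
    """True if every listed edge (i, j, w) is also listed as (j, i, w)."""
    weights: Dict[Tuple[int, int], float] = {}
    for vid, neighbors, sims in rows:
        for n, s in zip(neighbors, sims):
            weights[(vid, n)] = s
    return all(weights.get((j, i)) == w for (i, j), w in weights.items())


one_sided = [(1, [2, 3], [1.0, 1.0])]
two_sided = [(1, [2], [0.5]), (2, [1], [0.5])]
check_rows(one_sided)
check_rows(two_sided)
print(is_symmetric(one_sided))  # False
print(is_symmetric(two_sided))  # True
```

Rows that list each edge in one direction only pass the layout check but do not form a symmetric matrix as written, which is why the two checks are kept separate.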



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-10 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469858#comment-16469858
 ] 

spark_user edited comment on SPARK-24217 at 5/10/18 12:22 PM:
--

 
 
Hi Joseph K Bradley,

For the same input in spark.ml and spark.mllib, spark.mllib returns a cluster
id for all the vertices.

For example:

Input in spark.ml

    id    neighbor         similarity
    1     [2, 3, 4, 5]     [1.0, 1.0, 1.0, 1.0]
    6     [7, 8, 9, 10]    [1.0, 1.0, 1.0, 1.0]

Output in *spark.ml*

    id    prediction
    1     0
    6     1

Input in spark.mllib

    id    neighbor    similarity
    1     2           1.0
    1     3           1.0
    1     4           1.0
    1     5           1.0
    6     7           1.0
    6     8           1.0
    6     9           1.0
    6     10          1.0

Output in *spark.mllib*

    id    prediction
    1     0
    2     0
    3     0
    4     0
    5     0
    6     1
    7     1
    8     1
    9     1
    10    1
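
The two input layouts in this comment describe the same graph. As a minimal sketch (plain Python rather than Spark, with a hypothetical helper name `to_triples`), the spark.ml-style rows can be flattened into the (id, neighbor, similarity) triples shown for spark.mllib, and the vertices that never appear in the id column can be listed:

```python
# spark.ml-style rows from the example: (id, neighbors, similarities).
rows = [
    (1, [2, 3, 4, 5], [1.0, 1.0, 1.0, 1.0]),
    (6, [7, 8, 9, 10], [1.0, 1.0, 1.0, 1.0]),
]


def to_triples(rows):
    """Flatten adjacency-list rows into (id, neighbor, similarity) triples."""
    return [
        (vid, neighbor, sim)
        for vid, neighbors, sims in rows
        for neighbor, sim in zip(neighbors, sims)
    ]


triples = to_triples(rows)

# Vertices in the id column versus every vertex in the graph.
id_column = {vid for vid, _, _ in rows}
all_vertices = id_column | {dst for _, dst, _ in triples}
neighbor_only = sorted(all_vertices - id_column)

print(len(triples))    # 8
print(neighbor_only)   # [2, 3, 4, 5, 7, 8, 9, 10]
```

The `neighbor_only` list is exactly the set of ids that have a prediction in the spark.mllib output but not in the spark.ml output.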

 

 


was (Author: shahid):
Hi Joseph K Bradley,

For the same input in spark.ml and spark.mllib, spark.mllib returns a cluster
id for all the vertices.

For example:

    id    neighbor         similarity
    1     [2, 3, 4, 5]     [1.0, 1.0, 1.0, 1.0]
    6     [7, 8, 9, 10]    [1.0, 1.0, 1.0, 1.0]

Output in *spark.ml*

    id    prediction
    1     0
    6     1

Output in *spark.mllib*

    id    prediction
    1     0
    2     0
    3     0
    4     0
    5     0
    6     1
    7     1
    8     1
    9     1
    10    1

 

 




[jira] [Comment Edited] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469858#comment-16469858
 ] 

spark_user edited comment on SPARK-24217 at 5/10/18 3:11 AM:
-

Hi Joseph K Bradley,

For the same input in spark.ml and spark.mllib, spark.mllib returns a cluster
id for all the vertices.

For example:

    id    neighbor         similarity
    1     [2, 3, 4, 5]     [1.0, 1.0, 1.0, 1.0]
    6     [7, 8, 9, 10]    [1.0, 1.0, 1.0, 1.0]

Output in *spark.ml*

    id    prediction
    1     0
    6     1

Output in *spark.mllib*

    id    prediction
    1     0
    2     0
    3     0
    4     0
    5     0
    6     1
    7     1
    8     1
    9     1
    10    1

 

 


was (Author: shahid):
Hi Joseph K Bradley,

For the same input in spark.ml and spark.mllib, spark.mllib returns a cluster
id for all the vertices.

For example:

    id    neighbor         similarity
    1     [2, 3, 4, 5]     [1.0, 1.0, 1.0, 1.0]
    6     [7, 8, 9, 10]    [1.0, 1.0, 1.0, 1.0]

Output in spark.ml

    id    prediction
    1     0
    6     1

Output in spark.mllib

    id    prediction
    1     0
    2     0
    3     0
    4     0
    5     0
    6     1
    7     1
    8     1
    9     1
    10    1

 

 




[jira] [Comment Edited] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469859#comment-16469859
 ] 

spark_user edited comment on SPARK-24217 at 5/10/18 3:10 AM:
-

The behaviour should be the same for both spark.ml and spark.mllib, right? In
fact, spark.ml uses the spark.mllib implementation of PIC.


was (Author: shahid):
The behaviour should be the same for both spark.ml and spark.mllib, right?
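
One way such a difference could arise even with a shared implementation is if the wrapper joins the per-vertex assignments back onto the input rows by id. The sketch below is plain Python and only illustrates that hypothesis; it is not the Spark source, the join is an assumption, and the assignment values are copied from the example outputs reported in this thread.

```python
# Assignments for every vertex, as in the spark.mllib output
# reported in this thread.
assignments = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0,
               6: 1, 7: 1, 8: 1, 9: 1, 10: 1}

# Values of the id column in the spark.ml example input.
input_ids = [1, 6]

# Hypothetical inner join on the id column: only vertices that have
# their own input row keep a prediction.
joined = [(vid, assignments[vid]) for vid in input_ids if vid in assignments]

# Vertices that were clustered but have no row to join onto.
dropped = sorted(set(assignments) - set(input_ids))

print(joined)   # [(1, 0), (6, 1)]
print(dropped)  # [2, 3, 4, 5, 7, 8, 9, 10]
```

Under this assumption the clustering itself is identical in both packages, and the difference is only in which rows are reported.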




[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469859#comment-16469859
 ] 

spark_user commented on SPARK-24217:


The behaviour should be the same for both spark.ml and spark.mllib, right?




[jira] [Comment Edited] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469858#comment-16469858
 ] 

spark_user edited comment on SPARK-24217 at 5/10/18 2:59 AM:
-

Hi Joseph K Bradley,

For the same input in spark.ml and spark.mllib, spark.mllib returns a cluster
id for all the vertices.

For example:

    id    neighbor         similarity
    1     [2, 3, 4, 5]     [1.0, 1.0, 1.0, 1.0]
    6     [7, 8, 9, 10]    [1.0, 1.0, 1.0, 1.0]

Output in spark.ml

    id    prediction
    1     0
    6     1

Output in spark.mllib

    id    prediction
    1     0
    2     0
    3     0
    4     0
    5     0
    6     1
    7     1
    8     1
    9     1
    10    1

 

 


was (Author: shahid):
For the same input in spark.ml and spark.mllib, spark.mllib returns a cluster
id for all the vertices.

For example:

    id    neighbor         similarity
    1     [2, 3, 4, 5]     [1.0, 1.0, 1.0, 1.0]
    6     [7, 8, 9, 10]    [1.0, 1.0, 1.0, 1.0]

Output in spark.ml

    id    prediction
    1     0
    6     1

Output in spark.mllib

    id    prediction
    1     0
    2     0
    3     0
    4     0
    5     0
    6     1
    7     1
    8     1
    9     1
    10    1

 

 




[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469858#comment-16469858
 ] 

spark_user commented on SPARK-24217:


For the same input in spark.ml and spark.mllib, spark.mllib returns a cluster
id for all the vertices.

For example:

    id    neighbor         similarity
    1     [2, 3, 4, 5]     [1.0, 1.0, 1.0, 1.0]
    6     [7, 8, 9, 10]    [1.0, 1.0, 1.0, 1.0]

Output in spark.ml

    id    prediction
    1     0
    6     1

Output in spark.mllib

    id    prediction
    1     0
    2     0
    3     0
    4     0
    5     0
    6     1
    7     1
    8     1
    9     1
    10    1

 

 




[jira] [Updated] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24217:
---
Description: 
We should display the prediction and id for all the nodes. Currently PIC does
not return the cluster indices of neighbour IDs that are not present in the ID
column.

As per the definition of PIC given in the code:

PIC takes an affinity matrix between items (or vertices) as input. An affinity
matrix is a symmetric matrix whose entries are non-negative similarities
between items.
PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each
input row includes:
 * {{idCol}}: vertex ID
 * {{neighborsCol}}: neighbors of vertex in {{idCol}}
 * {{similaritiesCol}}: non-negative weights (similarities) of edges between
   the vertex in {{idCol}} and each neighbor in {{neighborsCol}}

*"PIC returns a cluster assignment for each input vertex."* It appends a new
column {{predictionCol}} containing the cluster assignment in {{[0,k)}} for
each row (vertex).

 

  was:
We should display prediction and id corresponding to all the nodes. 

As per the definition of PIC clustering, given in the code,

PIC takes an affinity matrix between items (or vertices) as input. An affinity 
matrix
 is a symmetric matrix whose entries are non-negative similarities between 
items.
 PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each 
input row includes:
 * {{idCol}}: vertex ID
 * {{neighborsCol}}: neighbors of vertex in {{idCol}}
 * {{similaritiesCol}}: non-negative weights (similarities) of edges between 
the vertex
 in {{idCol}} and each neighbor in {{neighborsCol}}

 * *"PIC returns a cluster assignment for each input vertex."* It appends a new 
column {{predictionCol}}
 containing the cluster assignment in {{[0,k)}} for each row (vertex).

Currently PIC does not return the cluster indices of neighbour IDs that are
not present in the ID column.





[jira] [Comment Edited] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469243#comment-16469243
 ] 

spark_user edited comment on SPARK-24217 at 5/9/18 6:20 PM:


Thanks for the comment, Joseph K. Bradley.

Actually, the issue is not about the symmetric similarity matrix. The
spark.mllib PIC assigns cluster indices to all the vertices of the similarity
graph, but spark.ml does not return the cluster ids of vertices that are not
present in the "id" column.

This is clearly visible in the test cases of both spark.ml and spark.mllib.


was (Author: shahid):
Thanks for the comment, Joseph K. Bradley.

Actually, the issue is not about the symmetric similarity matrix. The
spark.mllib PIC assigns cluster indices to all the vertices of the similarity
graph, but spark.ml does not return the cluster ids of vertices that are not
present in the ID column.

This is clearly visible in the test cases of both spark.ml and spark.mllib.

> Power Iteration Clustering is not displaying cluster indices corresponding to 
> some vertices.
> 
>
> Key: SPARK-24217
> URL: https://issues.apache.org/jira/browse/SPARK-24217
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> We should display prediction and id corresponding to all the nodes. 
> As per the definition of PIC clustering, given in the code,
> PIC takes an affinity matrix between items (or vertices) as input. An 
> affinity matrix
>  is a symmetric matrix whose entries are non-negative similarities between 
> items.
>  PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each 
> input row includes:
>  * {{idCol}}: vertex ID
>  * {{neighborsCol}}: neighbors of vertex in {{idCol}}
>  * {{similaritiesCol}}: non-negative weights (similarities) of edges between 
> the vertex
>  in {{idCol}} and each neighbor in {{neighborsCol}}
>  * *"PIC returns a cluster assignment for each input vertex."* It appends a 
> new column {{predictionCol}}
>  containing the cluster assignment in {{[0,k)}} for each row (vertex).
>  Currently PIC will not return the cluster indices of neighbour IDs which are 
> not there in the ID column.






[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469245#comment-16469245
 ] 

spark_user commented on SPARK-24217:


PIC should return the cluster index of each vertex of the graph, as per the
definition of PIC, which is also given in the comment in
PowerIterationClustering.scala in spark.ml.




[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469243#comment-16469243
 ] 

spark_user commented on SPARK-24217:


Thanks for the comment, Joseph K. Bradley.

Actually, the issue is not about the symmetric similarity matrix. The
spark.mllib PIC assigns cluster indices to all the vertices of the similarity
graph, but spark.ml does not return the cluster ids of vertices that are not
present in the ID column.

This is clearly visible in the test cases of both spark.ml and spark.mllib.




[jira] [Updated] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24217:
---
Description: 
We should display prediction and id corresponding to all the nodes. 

As per the definition of PIC clustering, given in the code,

PIC takes an affinity matrix between items (or vertices) as input. An affinity 
matrix
 is a symmetric matrix whose entries are non-negative similarities between 
items.
 PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each 
input row includes:
 * {{idCol}}: vertex ID
 * {{neighborsCol}}: neighbors of vertex in {{idCol}}
 * {{similaritiesCol}}: non-negative weights (similarities) of edges between 
the vertex
 in {{idCol}} and each neighbor in {{neighborsCol}}

 * *"PIC returns a cluster assignment for each input vertex."* It appends a new 
column {{predictionCol}}
 containing the cluster assignment in {{[0,k)}} for each row (vertex).

Currently PIC does not return the cluster indices of neighbour IDs that are
not present in the ID column.

  was:
We should display prediction and id corresponding to all the nodes.

As per the definition of PIC clustering, given in the code,

PIC takes an affinity matrix between items (or vertices) as input. An affinity 
matrix
is a symmetric matrix whose entries are non-negative similarities between items.
PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each 
input row includes:
 * {{idCol}}: vertex ID
 * {{neighborsCol}}: neighbors of vertex in {{idCol}}
 * {{similaritiesCol}}: non-negative weights (similarities) of edges between 
the vertex
in {{idCol}} and each neighbor in {{neighborsCol}}

 * *"PIC returns a cluster assignment for each input vertex."* It appends a new 
column {{predictionCol}}
containing the cluster assignment in {{[0,k)}} for each row (vertex).

 





[jira] [Updated] (SPARK-24191) Scala example code for Power Iteration Clustering in Spark ML examples

2018-05-09 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24191:
---
Summary: Scala example code for Power Iteration Clustering in Spark ML 
examples  (was: SparkML: Example code for Power Iteration Clustering )

> Scala example code for Power Iteration Clustering in Spark ML examples
> --
>
> Key: SPARK-24191
> URL: https://issues.apache.org/jira/browse/SPARK-24191
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Examples, ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> We need to provide an example code for Power Iteration Clustering in Spark ML 
> examples.
>  






[jira] [Created] (SPARK-24224) Java example code for Power Iteration Clustering in spark.ml

2018-05-09 Thread spark_user (JIRA)
spark_user created SPARK-24224:
--

 Summary: Java example code for Power Iteration Clustering in 
spark.ml
 Key: SPARK-24224
 URL: https://issues.apache.org/jira/browse/SPARK-24224
 Project: Spark
  Issue Type: Documentation
  Components: ML
Affects Versions: 2.4.0
Reporter: spark_user
 Fix For: 2.4.0


Add Java example code for Power Iteration Clustering to the spark.ml examples.






[jira] [Updated] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

2018-05-09 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24217:
---
Summary: Power Iteration Clustering is not displaying cluster indices 
corresponding to some vertices.  (was: Power Iteration Clustering is not 
displaying cluster indices corresponding to some nodes.)

> Power Iteration Clustering is not displaying cluster indices corresponding to 
> some vertices.
> 
>
> Key: SPARK-24217
> URL: https://issues.apache.org/jira/browse/SPARK-24217
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> We should display prediction and id corresponding to all the nodes.
> As per the definition of PIC clustering, given in the code,
> PIC takes an affinity matrix between items (or vertices) as input. An 
> affinity matrix
> is a symmetric matrix whose entries are non-negative similarities between 
> items.
> PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each 
> input row includes:
>  * {{idCol}}: vertex ID
>  * {{neighborsCol}}: neighbors of vertex in {{idCol}}
>  * {{similaritiesCol}}: non-negative weights (similarities) of edges between 
> the vertex
> in {{idCol}} and each neighbor in {{neighborsCol}}
>  * *"PIC returns a cluster assignment for each input vertex."* It appends a 
> new column {{predictionCol}}
> containing the cluster assignment in {{[0,k)}} for each row (vertex).
>  






[jira] [Commented] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some nodes.

2018-05-08 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468338#comment-16468338
 ] 

spark_user commented on SPARK-24217:


I am working on this issue.

> Power Iteration Clustering is not displaying cluster indices corresponding to 
> some nodes.
> -
>
> Key: SPARK-24217
> URL: https://issues.apache.org/jira/browse/SPARK-24217
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> We should display prediction and id corresponding to all the nodes.
> As per the definition of PIC clustering, given in the code,
> PIC takes an affinity matrix between items (or vertices) as input. An 
> affinity matrix
> is a symmetric matrix whose entries are non-negative similarities between 
> items.
> PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each 
> input row includes:
>  * {{idCol}}: vertex ID
>  * {{neighborsCol}}: neighbors of vertex in {{idCol}}
>  * {{similaritiesCol}}: non-negative weights (similarities) of edges between 
> the vertex
> in {{idCol}} and each neighbor in {{neighborsCol}}
>  * *"PIC returns a cluster assignment for each input vertex."* It appends a 
> new column {{predictionCol}}
> containing the cluster assignment in {{[0,k)}} for each row (vertex).
>  






[jira] [Created] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some nodes.

2018-05-08 Thread spark_user (JIRA)
spark_user created SPARK-24217:
--

 Summary: Power Iteration Clustering is not displaying cluster 
indices corresponding to some nodes.
 Key: SPARK-24217
 URL: https://issues.apache.org/jira/browse/SPARK-24217
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 2.4.0
Reporter: spark_user
 Fix For: 2.4.0


We should display prediction and id corresponding to all the nodes.

As per the definition of PIC clustering, given in the code,

PIC takes an affinity matrix between items (or vertices) as input. An affinity 
matrix
is a symmetric matrix whose entries are non-negative similarities between items.
PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each 
input row includes:
 * {{idCol}}: vertex ID
 * {{neighborsCol}}: neighbors of vertex in {{idCol}}
 * {{similaritiesCol}}: non-negative weights (similarities) of edges between 
the vertex
in {{idCol}} and each neighbor in {{neighborsCol}}

 * *"PIC returns a cluster assignment for each input vertex."* It appends a new 
column {{predictionCol}}
containing the cluster assignment in {{[0,k)}} for each row (vertex).

 






[jira] [Updated] (SPARK-24213) Power Iteration Clustering in the SparkML throws exception, when the ID is IntType

2018-05-08 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24213:
---
Summary: Power Iteration Clustering in the SparkML throws exception, when 
the ID is IntType  (was: Power Iteration Clustering in SparkML throws 
exception, when the ID is IntType)

> Power Iteration Clustering in the SparkML throws exception, when the ID is 
> IntType
> --
>
> Key: SPARK-24213
> URL: https://issues.apache.org/jira/browse/SPARK-24213
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> When running the code below, PowerIterationClustering in Spark ML throws an exception.
> {code:scala}
> val data = spark.createDataFrame(Seq(
> (0, Array(1), Array(0.9)),
> (1, Array(2), Array(0.9)),
> (2, Array(3), Array(0.9)),
> (3, Array(4), Array(0.1)),
> (4, Array(5), Array(0.9))
> )).toDF("id", "neighbors", "similarities")
> val result = new PowerIterationClustering()
> .setK(2)
> .setMaxIter(10)
> .setInitMode("random")
> .transform(data)
> .select("id","prediction")
> {code}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`prediction`' given 
> input columns: [id, neighbors, similarities];;
> 'Project [id#215, 'prediction]
> +- AnalysisBarrier
>   +- Project [id#215, neighbors#216, similarities#217]
>  +- Join Inner, (id#215 = id#234)
> :- Project [_1#209 AS id#215, _2#210 AS neighbors#216, _3#211 AS 
> similarities#217]
> :  +- LocalRelation [_1#209, _2#210, _3#211]
> +- Project [cast(id#230L as int) AS id#234]
>+- LogicalRDD [id#230L, prediction#231], false
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:88)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> {code}






[jira] [Updated] (SPARK-24213) Power Iteration Clustering in SparkML throws exception, when the ID is IntType

2018-05-08 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24213:
---
Summary: Power Iteration Clustering in SparkML throws exception, when the 
ID is IntType  (was: Power Iteration Clustering in SparkML throws exception, 
when the ID in IntType)

> Power Iteration Clustering in SparkML throws exception, when the ID is IntType
> --
>
> Key: SPARK-24213
> URL: https://issues.apache.org/jira/browse/SPARK-24213
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> When running the code below, PowerIterationClustering in Spark ML throws an exception.
> {code:scala}
> val data = spark.createDataFrame(Seq(
> (0, Array(1), Array(0.9)),
> (1, Array(2), Array(0.9)),
> (2, Array(3), Array(0.9)),
> (3, Array(4), Array(0.1)),
> (4, Array(5), Array(0.9))
> )).toDF("id", "neighbors", "similarities")
> val result = new PowerIterationClustering()
> .setK(2)
> .setMaxIter(10)
> .setInitMode("random")
> .transform(data)
> .select("id","prediction")
> {code}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`prediction`' given 
> input columns: [id, neighbors, similarities];;
> 'Project [id#215, 'prediction]
> +- AnalysisBarrier
>   +- Project [id#215, neighbors#216, similarities#217]
>  +- Join Inner, (id#215 = id#234)
> :- Project [_1#209 AS id#215, _2#210 AS neighbors#216, _3#211 AS 
> similarities#217]
> :  +- LocalRelation [_1#209, _2#210, _3#211]
> +- Project [cast(id#230L as int) AS id#234]
>+- LogicalRDD [id#230L, prediction#231], false
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:88)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> {code}






[jira] [Updated] (SPARK-24191) SparkML: Example code for Power Iteration Clustering

2018-05-08 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24191:
---
Fix Version/s: 2.4.0

> SparkML: Example code for Power Iteration Clustering 
> -
>
> Key: SPARK-24191
> URL: https://issues.apache.org/jira/browse/SPARK-24191
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Examples, ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> We need to provide an example code for Power Iteration Clustering in Spark ML 
> examples.
>  






[jira] [Updated] (SPARK-24213) Power Iteration Clustering in SparkML throws exception, when the ID in IntType

2018-05-08 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24213:
---
Environment: (was: {code:java}


{code}
 )

> Power Iteration Clustering in SparkML throws exception, when the ID in IntType
> --
>
> Key: SPARK-24213
> URL: https://issues.apache.org/jira/browse/SPARK-24213
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> When running the code below, PowerIterationClustering in Spark ML throws an exception.
> {code:scala}
> val data = spark.createDataFrame(Seq(
> (0, Array(1), Array(0.9)),
> (1, Array(2), Array(0.9)),
> (2, Array(3), Array(0.9)),
> (3, Array(4), Array(0.1)),
> (4, Array(5), Array(0.9))
> )).toDF("id", "neighbors", "similarities")
> val result = new PowerIterationClustering()
> .setK(2)
> .setMaxIter(10)
> .setInitMode("random")
> .transform(data)
> .select("id","prediction")
> {code}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`prediction`' given 
> input columns: [id, neighbors, similarities];;
> 'Project [id#215, 'prediction]
> +- AnalysisBarrier
>   +- Project [id#215, neighbors#216, similarities#217]
>  +- Join Inner, (id#215 = id#234)
> :- Project [_1#209 AS id#215, _2#210 AS neighbors#216, _3#211 AS 
> similarities#217]
> :  +- LocalRelation [_1#209, _2#210, _3#211]
> +- Project [cast(id#230L as int) AS id#234]
>+- LogicalRDD [id#230L, prediction#231], false
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:88)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> {code}






[jira] [Commented] (SPARK-24213) Power Iteration Clustering in SparkML throws exception, when the ID in IntType

2018-05-08 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467943#comment-16467943
 ] 

spark_user commented on SPARK-24213:


Currently I am working on this issue.

> Power Iteration Clustering in SparkML throws exception, when the ID in IntType
> --
>
> Key: SPARK-24213
> URL: https://issues.apache.org/jira/browse/SPARK-24213
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0
> Environment: {code:java}
> {code}
>  
>Reporter: spark_user
>Priority: Major
> Fix For: 2.4.0
>
>
> When running the code below, PowerIterationClustering in Spark ML throws an exception.
> {code:scala}
> val data = spark.createDataFrame(Seq(
> (0, Array(1), Array(0.9)),
> (1, Array(2), Array(0.9)),
> (2, Array(3), Array(0.9)),
> (3, Array(4), Array(0.1)),
> (4, Array(5), Array(0.9))
> )).toDF("id", "neighbors", "similarities")
> val result = new PowerIterationClustering()
> .setK(2)
> .setMaxIter(10)
> .setInitMode("random")
> .transform(data)
> .select("id","prediction")
> {code}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`prediction`' given 
> input columns: [id, neighbors, similarities];;
> 'Project [id#215, 'prediction]
> +- AnalysisBarrier
>   +- Project [id#215, neighbors#216, similarities#217]
>  +- Join Inner, (id#215 = id#234)
> :- Project [_1#209 AS id#215, _2#210 AS neighbors#216, _3#211 AS 
> similarities#217]
> :  +- LocalRelation [_1#209, _2#210, _3#211]
> +- Project [cast(id#230L as int) AS id#234]
>+- LogicalRDD [id#230L, prediction#231], false
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:88)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> {code}






[jira] [Created] (SPARK-24213) Power Iteration Clustering in SparkML throws exception, when the ID in IntType

2018-05-08 Thread spark_user (JIRA)
spark_user created SPARK-24213:
--

 Summary: Power Iteration Clustering in SparkML throws exception, 
when the ID in IntType
 Key: SPARK-24213
 URL: https://issues.apache.org/jira/browse/SPARK-24213
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 2.4.0
 Environment: {code:java}


{code}
 
Reporter: spark_user
 Fix For: 2.4.0


When running the code below, PowerIterationClustering in Spark ML throws an exception.
{code:scala}
import org.apache.spark.ml.clustering.PowerIterationClustering

// Vertex IDs are Int here; this is the case that triggers the exception.
val data = spark.createDataFrame(Seq(
  (0, Array(1), Array(0.9)),
  (1, Array(2), Array(0.9)),
  (2, Array(3), Array(0.9)),
  (3, Array(4), Array(0.1)),
  (4, Array(5), Array(0.9))
)).toDF("id", "neighbors", "similarities")

val result = new PowerIterationClustering()
  .setK(2)
  .setMaxIter(10)
  .setInitMode("random")
  .transform(data)
  .select("id", "prediction")
{code}


{code:java}
org.apache.spark.sql.AnalysisException: cannot resolve '`prediction`' given input columns: [id, neighbors, similarities];;
'Project [id#215, 'prediction]
+- AnalysisBarrier
   +- Project [id#215, neighbors#216, similarities#217]
      +- Join Inner, (id#215 = id#234)
         :- Project [_1#209 AS id#215, _2#210 AS neighbors#216, _3#211 AS similarities#217]
         :  +- LocalRelation [_1#209, _2#210, _3#211]
         +- Project [cast(id#230L as int) AS id#234]
            +- LogicalRDD [id#230L, prediction#231], false

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:88)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)

{code}
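
One way to read the plan above: the right-hand side of the join keeps only the cast id (id#234) from LogicalRDD [id#230L, prediction#231], so by the time the join result is projected, prediction is no longer among the available columns. The sketch below models that shape with plain Scala collections. It does not use Spark, it is not the actual implementation, and the case classes, function names, and cluster labels are made up for illustration.

```scala
// Stand-ins for the two sides of the join shown in the plan.
final case class InputRow(id: Int, neighbors: Seq[Int], similarities: Seq[Double])
final case class Assignment(id: Long, prediction: Int)

object PlanSketch {
  val input: Seq[InputRow] = Seq(
    InputRow(0, Seq(1), Seq(0.9)),
    InputRow(1, Seq(2), Seq(0.9)),
    InputRow(2, Seq(3), Seq(0.9)),
    InputRow(3, Seq(4), Seq(0.1)),
    InputRow(4, Seq(5), Seq(0.9))
  )

  // Hypothetical labels; real values would come from the clustering run.
  val assignments: Seq[Assignment] = Seq(
    Assignment(0L, 0), Assignment(1L, 0), Assignment(2L, 0),
    Assignment(3L, 1), Assignment(4L, 1)
  )

  // Shape seen in the plan: only the cast id survives on the right side,
  // so the joined rows carry no prediction at all.
  def joinDroppingPrediction: Seq[InputRow] = {
    val ids: Set[Int] = assignments.map(_.id.toInt).toSet
    input.filter(row => ids.contains(row.id))
  }

  // Shape the caller expects: cast the id but keep prediction alongside it.
  def joinKeepingPrediction: Seq[(InputRow, Int)] = {
    val byId: Map[Int, Int] = assignments.map(a => a.id.toInt -> a.prediction).toMap
    input.flatMap(row => byId.get(row.id).map(p => (row, p)))
  }
}
```

Both functions match all five input rows, but only joinKeepingPrediction returns something a later select of "prediction" could be served from.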








[jira] [Updated] (SPARK-24191) SparkML: Example code for Power Iteration Clustering

2018-05-05 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24191:
---
Description: 
We need to provide an example code for Power Iteration Clustering in Spark ML 
examples.

 

  was:
We need to provide an example code for Power Iteration Clustering, under 
examples/ of Spark ML.
  


> SparkML: Example code for Power Iteration Clustering 
> -
>
> Key: SPARK-24191
> URL: https://issues.apache.org/jira/browse/SPARK-24191
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Examples, ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
>
> We need to provide an example code for Power Iteration Clustering in Spark ML 
> examples.
>  






[jira] [Updated] (SPARK-24191) SparkML: Example code for Power Iteration Clustering

2018-05-05 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24191:
---
Description: 
We need to provide an example code for Power Iteration Clustering, under 
examples/ of Spark ML.
  

  was:
We need to provide an example of Power Iteration Clustering, under examples/ 
for Spark ML.
 


> SparkML: Example code for Power Iteration Clustering 
> -
>
> Key: SPARK-24191
> URL: https://issues.apache.org/jira/browse/SPARK-24191
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Examples, ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
>
> We need to provide an example code for Power Iteration Clustering, under 
> examples/ of Spark ML.
>   






[jira] [Issue Comment Deleted] (SPARK-24191) SparkML: Example code for Power Iteration Clustering

2018-05-05 Thread spark_user (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

spark_user updated SPARK-24191:
---
Comment: was deleted

(was: I have created a PR https://github.com/apache/spark/pull/21248)

> SparkML: Example code for Power Iteration Clustering 
> -
>
> Key: SPARK-24191
> URL: https://issues.apache.org/jira/browse/SPARK-24191
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Examples, ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
>
> We need to provide an example of Power Iteration Clustering, under examples/ 
> for Spark ML.
>  






[jira] [Commented] (SPARK-24191) SparkML: Example code for Power Iteration Clustering

2018-05-05 Thread spark_user (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464905#comment-16464905
 ] 

spark_user commented on SPARK-24191:


I have created a PR https://github.com/apache/spark/pull/21248

> SparkML: Example code for Power Iteration Clustering 
> -
>
> Key: SPARK-24191
> URL: https://issues.apache.org/jira/browse/SPARK-24191
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Examples, ML
>Affects Versions: 2.4.0
>Reporter: spark_user
>Priority: Major
>
> We need to provide an example of Power Iteration Clustering, under examples/ 
> for Spark ML.
>  






[jira] [Created] (SPARK-24191) SparkML: Example code for Power Iteration Clustering

2018-05-05 Thread spark_user (JIRA)
spark_user created SPARK-24191:
--

 Summary: SparkML: Example code for Power Iteration Clustering 
 Key: SPARK-24191
 URL: https://issues.apache.org/jira/browse/SPARK-24191
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, Examples, ML
Affects Versions: 2.4.0
Reporter: spark_user


We need to provide an example of Power Iteration Clustering, under examples/ 
for Spark ML.
 


