[jira] [Commented] (SPARK-15784) Add Power Iteration Clustering to spark.ml

Joseph K. Bradley (JIRA) Thu, 10 May 2018 09:45:22 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470701#comment-16470701
 ]


Joseph K. Bradley commented on SPARK-15784:
-------------------------------------------

So... we originally agreed to make this a Transformer (in the discussion 
above), but [SPARK-24213] and [SPARK-24217] raised the issue that this can't 
be a Row -> Row Transformer:
* The input data must contain exactly one pair (i,j) for each graph edge, not 
both (i,j) and (j,i).
* That means anywhere from 0 to numVertices/2 vertices may have no 
corresponding Row.
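
One way to read the two points above is sketched here in plain Python, 
without Spark. The edge list is made up for illustration, and the sketch 
assumes a Row -> Row output would attach a cluster label to each row's first 
vertex; the comment itself does not say how rows would be keyed.

```python
# Hypothetical edge list: one row per undirected edge (src, dst, weight),
# with no mirrored (dst, src) rows.
edges = [(0, 1, 1.0), (2, 3, 1.0), (4, 5, 1.0)]

# Every vertex that appears at either end of an edge.
vertices = {v for src, dst, _ in edges for v in (src, dst)}

# Vertices that appear as the first element of some row.
sources = {src for src, _, _ in edges}

# Under the keyed-by-src assumption, these vertices have no row that could
# carry their cluster assignment.
without_row = sorted(vertices - sources)

print(len(edges), len(vertices), without_row)  # prints: 3 6 [1, 3, 5]
```

In this example half of the vertices (numVertices/2) have no row of their 
own, so a one-output-row-per-input-row result cannot list every assignment.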

This greatly lessens the value of presenting this as a Transformer.  I 
recommend we rewrite the API before Spark 2.4 and make PIC a utility in 
spark.ml.stat.  We can have it inherit from Params but not make it a 
Transformer.
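
A rough sketch of that API shape, again in plain Python rather than Spark: an 
object that holds parameters and exposes a single method returning one row 
per vertex, with no transform(). The class name, method name and parameter 
names are hypothetical, and the clustering step is a connected-components 
placeholder, not power iteration clustering, so the stored parameters go 
unused here.

```python
from collections import namedtuple

# One output row per vertex, independent of how many edge rows came in.
Assignment = namedtuple("Assignment", ["id", "cluster"])


class EdgeListClusterer:
    """Hypothetical utility: carries parameters but is not a Transformer."""

    def __init__(self, k=2, max_iter=20):
        # Stand-in for a Params-style container (names are assumptions).
        self.params = {"k": k, "maxIter": max_iter}

    def assign_clusters(self, edges):
        """Return one Assignment per vertex found in the edge list."""
        parent = {}

        def find(v):
            # Union-find lookup with path halving.
            parent.setdefault(v, v)
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        # Placeholder clustering: merge the endpoints of every edge.
        for src, dst, _ in edges:
            root_src, root_dst = find(src), find(dst)
            if root_src != root_dst:
                parent[max(root_src, root_dst)] = min(root_src, root_dst)

        return [Assignment(v, find(v)) for v in sorted(parent)]


edges = [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0)]
result = EdgeListClusterer(k=2).assign_clusters(edges)
print(len(edges), len(result))  # prints: 3 5
```

The point of the shape is that three edge rows go in and five vertex rows 
come out, which a Row -> Row transform cannot express.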

How does this sound?

> Add Power Iteration Clustering to spark.ml
> ------------------------------------------
>
>                 Key: SPARK-15784
>                 URL: https://issues.apache.org/jira/browse/SPARK-15784
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Xinh Huynh
>            Assignee: Miao Wang
>            Priority: Major
>
> Adding this algorithm is required as part of SPARK-4591: Algorithm/model 
> parity for spark.ml. The review JIRA for clustering is SPARK-14380.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

