[ https://issues.apache.org/jira/browse/SPARK-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470701#comment-16470701 ]
Joseph K. Bradley commented on SPARK-15784:
-------------------------------------------

So... we originally agreed to make this a Transformer (in the discussion above), but [SPARK-24213] and [SPARK-24217] brought up the issue that we can't have this be a Row -> Row Transformer:
* The input data need to have one graph edge pair (i,j) for each edge, not duplicated ones (i,j) and (j,i).
* That means that there could be between 0 and numVertices/2 vertices which do not have corresponding Rows.

This greatly lessens the value of presenting this as a Transformer. I recommend we rewrite the API before Spark 2.4 and make PIC a utility in spark.ml.stat. We can have it inherit from Params but not make it a Transformer. How does this sound?

> Add Power Iteration Clustering to spark.ml
> ------------------------------------------
>
> Key: SPARK-15784
> URL: https://issues.apache.org/jira/browse/SPARK-15784
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Xinh Huynh
> Assignee: Miao Wang
> Priority: Major
>
> Adding this algorithm is required as part of SPARK-4591: Algorithm/model
> parity for spark.ml. The review JIRA for clustering is SPARK-14380.
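
To make the Row -> Row concern in the comment concrete, here is a minimal sketch in plain Python (no Spark). It assumes an adjacency-style input in which each input row is keyed by a source vertex id, and a hypothetical edge list in which every undirected edge is listed once; the helper name `vertices_without_rows` and the sample data are illustrative only and are not part of any Spark API.

```python
# Hypothetical edge list: each undirected edge appears exactly once
# as (src, dst, similarity), never duplicated as (dst, src, similarity).
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 1.0)]


def vertices_without_rows(edges):
    """Return the vertices that never appear as a source id.

    Assumption: the input has one row per source vertex, so a vertex
    that only ever shows up as a destination has no input row.
    """
    sources = {src for src, _, _ in edges}
    all_vertices = sources | {dst for _, dst, _ in edges}
    return sorted(all_vertices - sources)


missing = vertices_without_rows(edges)
num_input_rows = len({src for src, _, _ in edges})
num_vertices = num_input_rows + len(missing)

print("input rows:", num_input_rows)
print("vertices needing a cluster assignment:", num_vertices)
print("vertices with no input row:", missing)
```

Under these assumptions, a transformer that emits one output row per input row could not return a cluster assignment for every vertex, which is consistent with the suggestion to expose PIC as a utility instead.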