[jira] [Commented] (SPARK-15784) Add Power Iteration Clustering to spark.ml

Joseph K. Bradley (JIRA) Mon, 31 Oct 2016 11:15:14 -0700

    [ https://issues.apache.org/jira/browse/SPARK-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622915#comment-15622915 ]


Joseph K. Bradley commented on SPARK-15784:
-------------------------------------------

[~wangmiao1981] Sorry for the slow response here.  I do want us to add PIC to 
spark.ml, but we should discuss the design before moving forward with the PR.  
Could you please close the PR for now, but save the branch so it can be 
reopened after the discussion?

Let's have a design discussion first.

I agree that the big issue is that there isn't a clear way to make predictions 
on new data points.  In fact, I've never heard of people trying to do so.  Has 
anyone else?
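To make the prediction problem concrete, here is a minimal pure-Python sketch of the power-iteration idea behind PIC. It is not spark.mllib's implementation, and several choices are assumptions for illustration only: a small dense symmetric affinity matrix in which every vertex has positive degree, a fixed number of iterations instead of an acceleration-based stopping rule, a deterministic non-uniform start vector (chosen only so the output is reproducible), and a simple 1-D k-means on the resulting embedding. The function name `power_iteration_clustering` and the example graph are hypothetical.

```python
from typing import List


def power_iteration_clustering(affinity: List[List[float]], k: int,
                               max_iter: int = 20) -> List[int]:
    """Cluster the vertices of ONE affinity graph; returns one label per vertex.

    Hypothetical sketch: fixed iteration count, deterministic start vector,
    and a plain 1-D k-means step. Assumes every vertex has positive degree.
    """
    n = len(affinity)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of vertices")

    # Row-normalize the affinity matrix: W = D^-1 A.
    degrees = [sum(row) for row in affinity]
    w = [[a / d for a in row] for row, d in zip(affinity, degrees)]

    # Deterministic, non-uniform start vector (assumption, for reproducibility).
    total = n * (n + 1) / 2
    v = [(i + 1) / total for i in range(n)]

    # Truncated power iteration: v <- W v / ||W v||_1 for a fixed number of steps.
    for _ in range(max_iter):
        wv = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in wv)
        v = [x / norm for x in wv]

    # 1-D k-means (Lloyd's algorithm) on the embedding, seeded at evenly
    # spaced positions in the sorted values.
    ordered = sorted(v)
    centroids = [ordered[round(j * (n - 1) / max(k - 1, 1))] for j in range(k)]
    labels = [0] * n
    for _ in range(100):
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in v]
        new_centroids = []
        for c in range(k):
            members = [x for x, lab in zip(v, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by one weak edge (2, 3).
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.01, 0, 0],
    [0, 0, 0.01, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(power_iteration_clustering(A, k=2))  # [0, 0, 0, 1, 1, 1]
```

The sketch returns labels only for the vertices of the graph it was given; there is no fitted model object. Labeling a seventh point would mean adding its affinities as a new row and column of `A` and rerunning the whole iteration.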

Assuming prediction is not meaningful for PIC, I don't think the algorithm 
fits within the Pipeline framework, though that is debatable.  I see a 
few options:
* Put PIC in Pipelines as a Transformer, not an Estimator.  We would just need 
to document that it is a very expensive Transformer.
* Put PIC in spark.ml as a static method.  We may have to do this anyway to 
support all of spark.mllib's Statistics.
* Put PIC in GraphFrames (and push harder for GraphFrames to be merged back 
into Spark, which will include a much longer set of improvements).

My top choice is PIC as a Transformer.  What do you think?

CC [~yanboliang] [~sethah] [~mlnick] opinions?

> Add Power Iteration Clustering to spark.ml
> ------------------------------------------
>
>                 Key: SPARK-15784
>                 URL: https://issues.apache.org/jira/browse/SPARK-15784
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Xinh Huynh
>
> Adding this algorithm is required as part of SPARK-4591: Algorithm/model 
> parity for spark.ml. The review JIRA for clustering is SPARK-14380.




