Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/15770
  
    Joseph K. Bradley added a comment - 31/Oct/16 18:14
    
    Miao Wang, sorry for the slow response here. I do want us to add PIC to 
spark.ml, but we should discuss the design before the PR. Could you please 
close the PR for now but save the branch to re-open after discussion?
    
    Let's have a design discussion first.
    
    I agree that the big issue is that there isn't a clear way to make 
predictions on new data points. In fact, I've never heard of people trying to 
do so. Has anyone else?
    
    Assuming that prediction is not meaningful for PIC, then I don't think the 
algorithm fits within the Pipeline framework, though it's debatable. I see a 
few options:
    
      - Put PIC in Pipelines as a Transformer, not an Estimator. We would just 
need to document that it is a very expensive Transformer.
      - Put PIC in spark.ml as a static method. We may have to do this anyways 
to support all of spark.mllib's Statistics.
      - Put PIC in GraphFrames (and push harder for GraphFrames to be merged 
back into Spark, which will include a much longer set of improvements).
    
    My top choice is PIC as a Transformer. What do you think?
    
    CC Yanbo Liang, Seth Hendrickson, Nick Pentreath: opinions?
    
    Seth Hendrickson (sethah) added a comment - 31/Oct/16 22:40
    
    This seems like it fits the framework of a feature transformer. We could 
generate a real-valued feature column using the PIC algorithm, where the values 
are just the components of the pseudo-eigenvector. Alternatively, we could 
pipeline a KMeans clustering on the end, but I think it makes more sense to let 
users do that themselves - but that's up for debate.
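
    To make the feature-transformer idea concrete, here is a minimal 
pure-Python sketch of the power-iteration step that produces a 
pseudo-eigenvector, one real value per point. It is an illustration only, not 
Spark code and not the PR's implementation. The function name `pic_embedding`, 
the dense list-of-lists affinity matrix, the degree-based start vector, and the 
fixed iteration count (used here in place of a convergence test) are all 
assumptions made for this example.

```python
def pic_embedding(affinity, num_iter=10):
    """Return a PIC-style pseudo-eigenvector: one real value per point.

    `affinity` is a hypothetical dense, symmetric, non-negative matrix given
    as a list of lists; every row is assumed to have a non-zero sum.
    """
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    # Row-normalize the affinity matrix: W = D^-1 A.
    w = [[a / degrees[i] for a in affinity[i]] for i in range(n)]
    # Degree-based start vector, scaled to unit L1 norm.
    total = sum(degrees)
    v = [d / total for d in degrees]
    # Truncated power iteration: v <- W v / ||W v||_1.
    # A fixed iteration count is a simplification; stopping early is what
    # keeps cluster structure visible before v flattens to a constant.
    for _ in range(num_iter):
        wv = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in wv)
        v = [x / norm for x in wv]
    return v


# Toy data: points 0-2 form one tight group, points 3-4 another,
# with only weak affinity (0.01) between the groups.
affinity = [
    [0.0, 1.0, 1.0, 0.01, 0.01],
    [1.0, 0.0, 1.0, 0.01, 0.01],
    [1.0, 1.0, 0.0, 0.01, 0.01],
    [0.01, 0.01, 0.01, 0.0, 1.0],
    [0.01, 0.01, 0.01, 1.0, 0.0],
]
embedding = pic_embedding(affinity)
```

    In this toy case the values in `embedding` stay close together within each 
group and apart between groups, so the vector could be emitted as a single 
feature column, and a user who wants cluster labels could run KMeans on that 
column as a separate step.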


