Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/15770

Joseph K. Bradley added a comment - 31/Oct/16 18:14

Miao Wang: Sorry for the slow response here. I do want us to add PIC to spark.ml, but we should discuss the design before the PR. Could you please close the PR for now but save the branch to re-open after discussion?

Let's have a design discussion first. I agree that the big issue is that there isn't a clear way to make predictions on new data points. In fact, I've never heard of people trying to do so. Has anyone else?

Assuming that prediction is not meaningful for PIC, then I don't think the algorithm fits within the Pipeline framework, though it's debatable. I see a few options:

- Put PIC in Pipelines as a Transformer, not an Estimator. We would just need to document that it is a very expensive Transformer.
- Put PIC in spark.ml as a static method. We may have to do this anyway to support all of spark.mllib's Statistics.
- Put PIC in GraphFrames (and push harder for GraphFrames to be merged back into Spark, which will include a much longer set of improvements).

My top choice is PIC as a Transformer. What do you think?

CC Yanbo Liang, Seth Hendrickson, Nick Pentreath: opinions?

Seth Hendrickson (sethah) added a comment - 31/Oct/16 22:40

This seems like it fits the framework of a feature transformer. We could generate a real-valued feature column using the PIC algorithm, where the values are just the components of the pseudo-eigenvector. Alternatively, we could pipeline a KMeans clustering on the end, but I think it makes more sense to let users do that themselves. That's up for debate.
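
To make the "feature column" idea concrete, here is a minimal, library-free sketch of the power iteration step that would produce such a column. It is not Spark's implementation: the function name `pic_embedding`, the toy affinity matrix, and the fixed iteration count (used here instead of a convergence check) are all assumptions made for illustration. It assumes every node has a nonzero degree.

```python
def pic_embedding(affinity, num_iterations=10):
    """Return one real value per node: a truncated power-iteration vector.

    affinity: symmetric list of lists of non-negative similarities.
    Hypothetical helper for illustration only, not a Spark API.
    """
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    # Row-normalize the affinity matrix: W = D^-1 A.
    w = [[affinity[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    # Start from a degree-proportional vector with L1 norm 1.
    volume = sum(degrees)
    v = [d / volume for d in degrees]
    for _ in range(num_iterations):
        # One power-iteration step, then rescale to L1 norm 1.
        nxt = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in nxt)
        v = [x / norm for x in nxt]
    return v


# Toy graph: nodes 0-2 are tightly linked, nodes 3-4 are tightly linked,
# and the two groups are only weakly connected (0.01).
toy_affinity = [
    [0.0, 1.0, 1.0, 0.01, 0.01],
    [1.0, 0.0, 1.0, 0.01, 0.01],
    [1.0, 1.0, 0.0, 0.01, 0.01],
    [0.01, 0.01, 0.01, 0.0, 1.0],
    [0.01, 0.01, 0.01, 1.0, 0.0],
]
embedding = pic_embedding(toy_affinity, num_iterations=10)
```

Stopping early is the point: values within a group become nearly equal long before the two groups merge, so the vector can serve as a one-dimensional feature. A user could then run KMeans (or any other clusterer) on that column, which is the split of responsibilities suggested above.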