[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

thunterdb Tue, 21 Feb 2017 14:48:59 -0800

Github user thunterdb commented on the issue:

    https://github.com/apache/spark/pull/15770
  
    You are right, I had forgotten that for this algorithm, the input is the 
edges, and the output is the label for each of the vertices.
    
    This is a tricky algorithm to express as a transformer, since it does not 
follow the usual convention that a transformer only appends columns to the 
dataframe. I suggest we follow the same example as ALS and the mllib 
implementation of PIC:
     - let's make it an estimator that returns a model: the model contains the 
label for each of the points, stored as a dataframe (the current output of 
transform)
     - the model's transform method then takes a dataframe of points with an 
id column and joins it with the model's labels to append a column of labels. 
This is the same as ALS.
    
    If we do not follow this pattern, then the model selection algorithms are 
not going to work. What do you think?
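
To make the proposed split concrete, here is a minimal sketch in plain Python, with no Spark dependency. It only illustrates the shape of the pattern: `fit` consumes edges and returns a model holding one label per vertex id, and the model's `transform` joins on the id so that input rows only gain a column. All class, method, and parameter names here are hypothetical and are not Spark's API. The clustering step is a deliberately simple stand-in (connected components), not power iteration clustering.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, List, Tuple

# (source vertex id, destination vertex id, similarity)
Edge = Tuple[Hashable, Hashable, float]


@dataclass(frozen=True)
class ClusteringModel:
    """Hypothetical fitted model: one cluster label per vertex id."""

    labels: Dict[Hashable, int]

    def transform(
        self,
        rows: Iterable[dict],
        id_col: str = "id",
        prediction_col: str = "prediction",
    ) -> List[dict]:
        # Left join on the id: every input row is kept and only gains a column.
        # Ids that were not seen during fit get None.
        return [
            {**row, prediction_col: self.labels.get(row[id_col])}
            for row in rows
        ]


class ClusteringEstimator:
    """Hypothetical estimator: fit() takes edges and returns a model."""

    def fit(self, edges: Iterable[Edge]) -> ClusteringModel:
        # Stand-in for PIC (assumption): label vertices by connected
        # component using union-find. The similarity weight is ignored.
        parent: Dict[Hashable, Hashable] = {}

        def find(v: Hashable) -> Hashable:
            parent.setdefault(v, v)
            while parent[v] != v:
                parent[v] = parent[parent[v]]  # path halving
                v = parent[v]
            return v

        for src, dst, _similarity in edges:
            root_src, root_dst = find(src), find(dst)
            if root_src != root_dst:
                parent[root_dst] = root_src

        # Number the components in order of first appearance.
        component_ids: Dict[Hashable, int] = {}
        labels: Dict[Hashable, int] = {}
        for v in list(parent):
            root = find(v)
            labels[v] = component_ids.setdefault(root, len(component_ids))
        return ClusteringModel(labels)
```

With this shape, `ClusteringEstimator().fit(edges)` returns a model, and `model.transform(points)` returns the same points with a `prediction` entry added, which is the append-only behavior described above.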


