Github user freeman-lab commented on the pull request:

    https://github.com/apache/spark/pull/2906#issuecomment-69134341
  
    Hi @yu-iskw and @rnowling , I've spent time reviewing the code and using it 
in both Python and Scala. Overall great work, terrific to see my little gist 
turned into something so refined and performant! =) I left lots of comments, 
most minor, though documenting the caching behavior seems quite important.
    
    The one significant addition I'd suggest is exposing another model output: 
a list of the centers at all nodes in the learned tree. This would be in 
addition to just the centers of the leaves, which is currently returned by 
`getCenters` (or `clusterCenters` in Python). Maybe call it `getTreeCenters`. 
It's basically given by `model.clusterTree.toSeq().map(_.center)`. But we 
should make sure it's sorted so that it can be indexed using the values from 
the merge list. In other words, if `Z` is the merge list, and row i indicates 
that `Z[i,0]` and `Z[i,1]` were merged, we want to be able to get the centers 
associated with those nodes by calling, for example, 
`model.treeCenters[Z[i,0]]` and `model.treeCenters[Z[i,1]]`. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to