GitHub user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21119#discussion_r184839152
  
    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self):
             return self.getOrDefault(self.keepLastCheckpoint)
     
     
    +@inherit_doc
    +class PowerIterationClustering(HasMaxIter, HasPredictionCol, JavaTransformer, JavaParams,
    +                               JavaMLReadable, JavaMLWritable):
    +    """
    +    .. note:: Experimental
    +    Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
    +    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From the abstract:
    +    PIC finds a very low-dimensional embedding of a dataset using truncated power
    +    iteration on a normalized pair-wise similarity matrix of the data.
    +
    +    PIC takes an affinity matrix between items (or vertices) as input.  An affinity matrix
    +    is a symmetric matrix whose entries are non-negative similarities between items.
    +    PIC takes this matrix (or graph) as an adjacency matrix.  Specifically, each input row
    +    includes:
    +
    +     - :py:class:`idCol`: vertex ID
    +     - :py:class:`neighborsCol`: neighbors of vertex in :py:class:`idCol`
    +     - :py:class:`similaritiesCol`: non-negative weights (similarities) of edges between the
    +        vertex in :py:class:`idCol` and each neighbor in :py:class:`neighborsCol`
    +
    +    PIC returns a cluster assignment for each input vertex.  It appends a new column
    +    :py:class:`predictionCol` containing the cluster assignment in :py:class:`[0,k)` for
    +    each row (vertex).
    +
    +    Notes:
    +
    +     - [[PowerIterationClustering]] is a transformer with an expensive [[transform]] operation.
    +        Transform runs the iterative PIC algorithm to cluster the whole input dataset.
    +     - Input validation: This validates that similarities are non-negative but does NOT validate
    +        that the input matrix is symmetric.
    +
    +    @see <a href=http://en.wikipedia.org/wiki/Spectral_clustering>
    --- End diff --
    
    Use `.. seealso::`?
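
A rough sketch of what that could look like, assuming the Sphinx `.. seealso::` directive with a reST external link in place of the Javadoc-style `@see <a href=...>` line. The class below is only a stand-in stub (no mixins, most of the docstring elided), and the link text "Wikipedia on Spectral clustering" is my own placeholder, not something taken from the PR:

```python
# Stand-in stub for illustration only; the real class in the PR has mixins
# and a much longer docstring.
class PowerIterationClustering(object):
    """
    .. note:: Experimental

    Power Iteration Clustering (PIC), a scalable graph clustering algorithm.

    .. seealso:: `Wikipedia on Spectral clustering
        <http://en.wikipedia.org/wiki/Spectral_clustering>`_
    """


# The docstring now carries a reST directive instead of raw HTML.
print(PowerIterationClustering.__doc__)
```

The same reasoning would apply to the `<a href=...>Lin and Cohen</a>` link earlier in the docstring, since raw HTML anchors are not rendered as links by Sphinx by default.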


---

