Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21513#discussion_r194244595

    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -1156,6 +1157,213 @@ def getKeepLastCheckpoint(self):
             return self.getOrDefault(self.keepLastCheckpoint)


    +@inherit_doc
    +class PowerIterationClustering(HasMaxIter, HasWeightCol, JavaParams, JavaMLReadable,
    +                               JavaMLWritable):
    +    """
    +    .. note:: Experimental
    +
    +    Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
    +    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From the abstract:
    +    PIC finds a very low-dimensional embedding of a dataset using truncated power
    +    iteration on a normalized pair-wise similarity matrix of the data.
    +
    +    This class is not yet an Estimator/Transformer, use :py:func:`assignClusters` method
    +    to run the PowerIterationClustering algorithm.
    +
    +    .. seealso:: `Wikipedia on Spectral clustering \
    +        <http://en.wikipedia.org/wiki/Spectral_clustering>`_
    +
    +    >>> data = [((long)(1), (long)(0), 0.5), \
    +        ((long)(2), (long)(0), 0.5), \
    +        ((long)(2), (long)(1), 0.7), \
    +        ((long)(3), (long)(0), 0.5), \
    +        ((long)(3), (long)(1), 0.7), \
    +        ((long)(3), (long)(2), 0.9), \
    +        ((long)(4), (long)(0), 0.5), \
    +        ((long)(4), (long)(1), 0.7), \
    +        ((long)(4), (long)(2), 0.9), \
    +        ((long)(4), (long)(3), 1.1), \
    +        ((long)(5), (long)(0), 0.5), \
    +        ((long)(5), (long)(1), 0.7), \
    +        ((long)(5), (long)(2), 0.9), \
    +        ((long)(5), (long)(3), 1.1), \
    +        ((long)(5), (long)(4), 1.3)]
    +    >>> df = spark.createDataFrame(data).toDF("src", "dst", "weight")
    +    >>> pic = PowerIterationClustering()
    --- End diff --

If we only keep one example, we should use keyword args:

~~~python
pic = PowerIterationClustering(k=2, maxIter=40, weightCol="weight")
assignments = pic.assignClusters(df)
~~~
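A side note on the doctest data: the `(long)(...)` casts are a Python 2 idiom (Python 3 has no `long`; plain `int` is unbounded), and the 15 weighted edges follow a regular pattern (`weight = 0.5 + 0.2 * dst` for every `dst < src`). Purely as an illustrative sketch, not part of the PR, the same edge list can be generated with a comprehension:

```python
# Sketch only (not from the PR): regenerate the doctest's 15 (src, dst, weight)
# edges. In Python 3, plain ints replace the Python 2 `long` casts.
# Pattern observed in the example data: weight = 0.5 + 0.2 * dst.
data = [(src, dst, round(0.5 + 0.2 * dst, 1))
        for src in range(1, 6)
        for dst in range(src)]
```

The resulting list matches the literal one in the diff and can be passed to `spark.createDataFrame(data).toDF("src", "dst", "weight")` the same way.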