Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21513#discussion_r194244595

    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -1156,6 +1157,213 @@ def getKeepLastCheckpoint(self):
             return self.getOrDefault(self.keepLastCheckpoint)


    +@inherit_doc
    +class PowerIterationClustering(HasMaxIter, HasWeightCol, JavaParams, JavaMLReadable,
    +                               JavaMLWritable):
    +    """
    +    .. note:: Experimental
    +
    +    Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
    +    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From the abstract:
    +    PIC finds a very low-dimensional embedding of a dataset using truncated power
    +    iteration on a normalized pair-wise similarity matrix of the data.
    +
    +    This class is not yet an Estimator/Transformer, use :py:func:`assignClusters` method
    +    to run the PowerIterationClustering algorithm.
    +
    +    .. seealso:: `Wikipedia on Spectral clustering \
    +        <http://en.wikipedia.org/wiki/Spectral_clustering>`_
    +
    +    >>> data = [((long)(1), (long)(0), 0.5), \
    +        ((long)(2), (long)(0), 0.5), \
    +        ((long)(2), (long)(1), 0.7), \
    +        ((long)(3), (long)(0), 0.5), \
    +        ((long)(3), (long)(1), 0.7), \
    +        ((long)(3), (long)(2), 0.9), \
    +        ((long)(4), (long)(0), 0.5), \
    +        ((long)(4), (long)(1), 0.7), \
    +        ((long)(4), (long)(2), 0.9), \
    +        ((long)(4), (long)(3), 1.1), \
    +        ((long)(5), (long)(0), 0.5), \
    +        ((long)(5), (long)(1), 0.7), \
    +        ((long)(5), (long)(2), 0.9), \
    +        ((long)(5), (long)(3), 1.1), \
    +        ((long)(5), (long)(4), 1.3)]
    +    >>> df = spark.createDataFrame(data).toDF("src", "dst", "weight")
    +    >>> pic = PowerIterationClustering()
    --- End diff --

If we only keep one example, we should use keyword args:

~~~python
pic = PowerIterationClustering(k=2, maxIter=40, weightCol="weight")
assignments = pic.assignClusters(df)
~~~
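A side note on the doctest data: the `(long)(...)` casts are a Python 2 idiom (Python 3 has no `long`; plain `int` is unbounded), and the 15 weighted edges follow a regular pattern (`weight = 0.5 + 0.2 * dst` for every `dst < src`). Purely as an illustrative sketch, not part of the PR, the same edge list can be generated with a comprehension:

```python
# Sketch only (not from the PR): regenerate the doctest's 15 (src, dst, weight)
# edges. In Python 3, plain ints replace the Python 2 `long` casts.
# Pattern observed in the example data: weight = 0.5 + 0.2 * dst.
data = [(src, dst, round(0.5 + 0.2 * dst, 1))
        for src in range(1, 6)
        for dst in range(src)]
```

The resulting list matches the literal one in the diff and can be passed to `spark.createDataFrame(data).toDF("src", "dst", "weight")` the same way.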