GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/1407

    SPARK-1215: Clustering: Index out of bounds error

    Bug fix for JIRA SPARK-1215: Clustering: Index out of bounds error
    
     https://issues.apache.org/jira/browse/SPARK-1215
    
    Solution: When there are fewer distinct data points than clusters k, print a warning and use duplicate cluster centers so that exactly k centers are returned.
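
A minimal sketch of the idea, not the code in this pull request: the object `PadCenters` and the helper `padToK` are hypothetical names, and cycling through the available centers is only one assumed way of choosing which centers to duplicate. It shows how a caller can always receive exactly k centers even when fewer distinct ones exist.

```scala
object PadCenters {
  /**
   * Hypothetical helper (an assumption, not Spark's API): returns exactly k
   * centers. If fewer than k centers are available, it prints a warning and
   * fills the remaining slots with copies of the existing centers.
   */
  def padToK(centers: Seq[Array[Double]], k: Int): Seq[Array[Double]] = {
    require(k > 0, "k must be positive")
    require(centers.nonEmpty, "need at least one center to duplicate")
    if (centers.length >= k) {
      // Enough centers already: keep the first k.
      centers.take(k)
    } else {
      Console.err.println(
        s"Warning: only ${centers.length} distinct centers for k = $k; duplicating centers.")
      // Cycle through the available centers, copying each array so that
      // duplicates do not share mutable state.
      Seq.tabulate(k)(i => centers(i % centers.length).clone())
    }
  }
}
```

With two distinct centers and k = 5, `PadCenters.padToK` returns five centers in which the two originals repeat in order, so downstream code that indexes centers from 0 to k - 1 stays within bounds.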

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1407.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1407
    
----
commit 97f2104bac2ab864c2a03f9a12a4b936557ae6d6
Author: Joseph Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-05-20T01:35:53Z

    added RDD::stratifiedSample method and associated unit tests in RDDSuite. Method is built off of RDD::takeSample method.

commit 91e83338820158b96cda492668dbed5fff33f19b
Author: Joseph Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-05-20T01:36:30Z

    added RDD::stratifiedSample method documentation

commit d6f8913b7e370a82138b9c623754b32a59c21cf6
Author: Joseph Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-05-21T04:58:12Z

    updated stratifiedSample to be more scalable, keeping data in RDDs instead of collecting to the driver

commit 21eead6a412508b536358f4e557e2fab23c9c696
Author: Joseph Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-05-23T19:51:07Z

    updated stratifiedSample to use selection-rejection to select samples on each partition in 1 pass, rather than pre-selecting indices

commit 91f4b19702bc58a77d28316674eace881a81165f
Author: Joseph K. Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-07-09T21:37:25Z

    merging with new spark

commit c0cb5f0d8c6104e3eb6cfa44820ba00b81bc7262
Author: Joseph K. Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-07-11T18:43:17Z

    merging with updated spark

commit 7d1b812a720cffdefe78ddb6e641930e7ae4975b
Author: Joseph K. Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-07-11T18:47:41Z

    removed my coding test updates

commit 18e5c8ad740871be92c6d7b73f5d35e25641a734
Author: Joseph K. Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-07-12T01:12:44Z

    Added check to LocalKMeans.scala: kMeansPlusPlus initialization to handle case with fewer distinct data points than clusters k. Added two related unit tests to KMeansSuite.

commit e2bf638c6b3e8cc9cec3362caddb2305109d4c0a
Author: Joseph K. Bradley <joseph.kurata.brad...@gmail.com>
Date:   2014-07-14T17:52:33Z

    Merge remote-tracking branch 'upstream/master'

----


