GitHub user dorx opened a pull request: https://github.com/apache/spark/pull/1520
[SPARK-2514] [mllib] Random RDD generator

Utilities for generating random RDDs.

RandomRDD and RandomVectorRDD are created instead of using `sc.parallelize(range:Range)` because `Range` objects in Scala can only have `size <= Int.MaxValue`.

The object `RandomRDDGenerators` can be transformed into a generator class to reduce the number of auxiliary methods for optional arguments.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dorx/spark randomRDD

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1520

----
commit 888144416ced2b6d4c4839ac95b8a3feb2b3aba1
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-12T01:02:01Z

    RandomRDDGenerator: initial design

    Looking for feedback on design decisions. Very rough draft and untested.

commit 7cb0e406793db493cee72cb91ec02475c95c8de7
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-12T01:15:56Z

    fix for data inconsistency

commit 49ed20d9a30b0ba5d809974bbcf48cc76a45d68e
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-12T01:30:15Z

    alternative poisson distribution generator

commit f46d928c4e3e71ced4ede9295ef645fb714c9a69
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-19T02:13:58Z

    WIP

commit df5bcffc320bab85f6c5925b244fe9885d6d0eb5
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-21T07:47:07Z

    Merge branch 'generator' into randomRDD

commit 92d6f1c3ca0f22371f7f0387b875ac16d5030ffb
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-21T07:48:12Z

    solution for Cloneable

commit d56cacbde7a0550f53b59696ad7c7014c827f3f7
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-22T01:23:19Z

    impl with RandomRDD

commit bc90234c9639bfb3f4581af63cf4bf370c61e18b
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-22T03:37:40Z

    units passed.

commit aec68eb167ac9f11c64d95c698009cbf8919bd4b
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-22T03:42:31Z

    newline

commit 063ea0b48b769f7f8477ca2364f8e676f93c297e
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-07-22T03:43:57Z

    Merge branch 'master' into randomRDD

----
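
As a rough illustration of the motivation stated in the description (a `Range` cannot hold more than `Int.MaxValue` elements, so it cannot seed a larger RDD), the sketch below shows one way to describe a dataset whose total size is a `Long` by splitting it into per-partition sizes and seeds, then generating each partition's values lazily. It is plain Scala with no Spark dependency. The names `RandomPartitionSketch`, `PartitionSpec`, `partitionSpecs`, and `uniformValues` are hypothetical and are not taken from this pull request; the seeding scheme (`seed + index`) is likewise only an assumption for the example.

```scala
import scala.util.Random

// Hypothetical sketch; none of these names come from the pull request.
object RandomPartitionSketch {

  // One partition: how many values it holds and the seed for its generator.
  final case class PartitionSpec(index: Int, size: Long, seed: Long)

  // Split `totalSize` (a Long, so it may exceed Int.MaxValue) as evenly as
  // possible across `numPartitions` partitions.
  def partitionSpecs(totalSize: Long, numPartitions: Int, seed: Long): Seq[PartitionSpec] = {
    require(totalSize >= 0, "totalSize must be non-negative")
    require(numPartitions > 0, "numPartitions must be positive")
    val base = totalSize / numPartitions
    val remainder = totalSize % numPartitions
    (0 until numPartitions).map { i =>
      // The first `remainder` partitions each take one extra element.
      val size = base + (if (i < remainder) 1L else 0L)
      // Assumed seeding scheme: offset the base seed by the partition index.
      PartitionSpec(i, size, seed + i)
    }
  }

  // Lazily produce the partition's values; nothing is materialized up front,
  // and the element count is tracked as a Long.
  def uniformValues(spec: PartitionSpec): Iterator[Double] = {
    val rng = new Random(spec.seed)
    new Iterator[Double] {
      private var remaining = spec.size
      def hasNext: Boolean = remaining > 0
      def next(): Double = {
        if (!hasNext) throw new NoSuchElementException("partition exhausted")
        remaining -= 1
        rng.nextDouble()
      }
    }
  }
}
```

Under these assumptions, `RandomPartitionSketch.partitionSpecs(5000000000L, 4, seed = 42L)` describes five billion values in four partitions without ever building a collection of that length, and regenerating a partition from the same `PartitionSpec` yields the same values because the generator is re-created from the stored seed.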