Assume I don't care about values which may be created in a later map. In
Scala I can say

  val rdd = sc.parallelize(1 to 10, numSlices = 1000)

but in Java, JavaSparkContext can only parallelize a List - limited to
Integer.MAX_VALUE elements and required to exist in memory - the best I can ...
Steve, something like this will do, I think:

  sc.parallelize(1 to 1000, 1000).flatMap(x => 1 to 100000)
The above will launch 1000 tasks (maps), with each task creating 10^5
numbers (a total of 100 million elements).
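In case a Java version helps: here's a rough sketch of the same trick with
JavaSparkContext - parallelize a small seed list and let flatMap expand it
on the executors, so the driver never materializes the 100 million
elements. I'm assuming the Spark 1.x Java API here, where
FlatMapFunction.call returns an Iterable (later versions return an
Iterator); the class name LargeRddSketch is just for illustration.

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

public class LargeRddSketch {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("large-rdd-sketch"));

    // A small seed list: one entry per task. Only these 1000 Integers
    // ever live on the driver.
    List<Integer> seeds = new ArrayList<Integer>();
    for (int i = 1; i <= 1000; i++) {
      seeds.add(i);
    }

    // 1000 slices -> 1000 tasks; each task expands its seed into 10^5
    // numbers on the executors, so the full 100 million elements are
    // never held in the driver's memory at once.
    JavaRDD<Integer> big = sc.parallelize(seeds, 1000).flatMap(
        new FlatMapFunction<Integer, Integer>() {
          public Iterable<Integer> call(Integer seed) {
            List<Integer> out = new ArrayList<Integer>(100000);
            for (int j = 1; j <= 100000; j++) {
              out.add(j);
            }
            return out;
          }
        });

    System.out.println(big.count()); // should print 100000000
    sc.stop();
  }
}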