Looks good, but how do I say that in Java?
As far as I can see, sc.parallelize (in Java) only accepts a List,
which requires an in-memory representation.
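
One way to keep the same idea in Java might be to parallelize only a small
list of seeds and have each seed expand lazily into many objects. Below is a
minimal sketch of the lazy part in plain Java. The class name LazyObjects and
the generator argument are hypothetical; the generator stands in for whatever
function builds the real test objects.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Lazily yields `count` generated objects without ever holding them in a list. */
public class LazyObjects<T> implements Iterable<T> {
    private final int start;
    private final int count;
    // Hypothetical stand-in for the real object generator.
    private final IntFunction<T> generator;

    public LazyObjects(int start, int count, IntFunction<T> generator) {
        this.start = start;
        this.count = count;
        this.generator = generator;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < count;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Each object is created only when the consumer asks for it.
                return generator.apply(start + produced++);
            }
        };
    }
}
```

With Spark, the seeds would go through the two-argument
JavaSparkContext.parallelize(list, numSlices), and the function passed to
flatMap would return a new LazyObjects for each seed. In Spark 1.x the Java
flatMap function returns an Iterable; from 2.0 on it returns an Iterator, so
.iterator() would be appended there. Only the short seed list is ever held in
memory on the driver.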

On Mon, Dec 8, 2014 at 12:06 PM, Daniel Darabos <
daniel.dara...@lynxanalytics.com> wrote:

> Hi,
> I think you have the right idea. I would not even worry about flatMap.
>
> val rdd = sc.parallelize(1 to 1000000, numSlices = 1000).map(x =>
> generateRandomObject(x))
>
> Then when you try to evaluate something on this RDD, it will happen
> partition-by-partition. So 1000 random objects will be generated at a time
> per executor thread.
>
> On Mon, Dec 8, 2014 at 8:05 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:
>
>>  I have a function which generates a Java object, and I want to explore
>> failures which only happen when processing large numbers of these objects.
>> The real code reads a many-gigabyte file, but in the test code I can
>> generate similar objects programmatically. I could create a small list,
>> parallelize it, and then use flatMap to inflate it several times by a factor
>> of 1000 (remember, I can hold a list of 1000 items in memory but not a
>> million).
>> Are there better ideas? Remember, I want to create more objects than can
>> be held in memory at once.
>>
>>
>


-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
