Re: spark on disk executions

2014-08-19 Thread Sean Owen
Spark does not require that data sets fit in memory to begin with.
Yes, there's nothing inherently problematic about processing 1TB of
data with a lot less than 1TB of cluster memory.

You probably want to read:
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
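
For example, here is a minimal sketch of marking an RDD to spill to
disk via persist(); the input path and app name below are just
placeholders:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  object OnDiskExample {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("on-disk-example"))

      // Placeholder input path; point this at your ~1TB data set.
      val lines = sc.textFile("hdfs:///data/input")

      // MEMORY_AND_DISK keeps partitions in memory while they fit and
      // spills the rest to local disk; DISK_ONLY bypasses memory entirely.
      val persisted = lines.persist(StorageLevel.MEMORY_AND_DISK)

      println(persisted.count())
      sc.stop()
    }
  }

Note that storage levels only govern persisted RDDs; transformations
over un-persisted data are processed a partition at a time per task
anyway, which is why the whole data set never needs to fit in RAM.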



On Tue, Aug 19, 2014 at 5:38 PM, Oleg Ruchovets wrote:
> Hi,
>    We have ~1TB of data to process, but our cluster doesn't have
> sufficient memory for such a data set (we have a 5-10 machine cluster).
> Is it possible to process 1TB of data using ON DISK options in Spark?
>
> If yes, where can I read about the configuration for ON DISK execution?
>
>
> Thanks
> Oleg.




spark on disk executions

2014-08-19 Thread Oleg Ruchovets
Hi,
   We have ~1TB of data to process, but our cluster doesn't have
sufficient memory for such a data set (we have a 5-10 machine cluster).
Is it possible to process 1TB of data using ON DISK options in Spark?

If yes, where can I read about the configuration for ON DISK execution?


Thanks
Oleg.