Spark does not require that data sets fit in memory to begin with.
Yes, there's nothing inherently problematic about processing 1TB of data
with far less than 1TB of cluster memory.
You probably want to read:
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
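
For what it's worth, here's a minimal sketch of what DISK_ONLY
persistence looks like in Scala (the app name and input path are
placeholders I made up, not anything specific to your setup):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object DiskOnlyExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DiskOnlyExample"))

        // Placeholder path; point this at your actual 1TB input.
        val lines = sc.textFile("hdfs:///data/input")

        // DISK_ONLY keeps persisted partitions on local disk instead of
        // executor memory, so a reused RDD doesn't have to fit in RAM.
        val persisted = lines.persist(StorageLevel.DISK_ONLY)

        println(persisted.count())
        sc.stop()
      }
    }

Note that persistence only matters if you reuse an RDD; a single pass
over the data is streamed partition by partition and never needs to
fit in memory in the first place.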
On Tue, Aug 19, 2014 at 5:38 PM, Oleg Ruchovets wrote:
> Hi,
> We have ~1TB of data to process, but our cluster doesn't have
> sufficient memory for such a data set (we have a 5-10 machine cluster).
> Is it possible to process 1TB of data on disk using Spark?
>
> If yes, where can I read about the configuration for on-disk execution?
>
> Thanks
> Oleg.