Re: change default storage level
Thanks Shixiong! Your response helped me understand the role of persist(). Indeed, no persist() calls were required.

I solved my problem by setting spark.local.dir to a location with more space for Spark's temporary folder. It works automatically. I am now seeing logs like this:

    Not enough space to cache rdd_0_1 in memory! Persisting partition rdd_0_1 to disk instead.

Before, I was getting:

    No space left on device

On 9 July 2015 at 11:57, Shixiong Zhu wrote:

> Spark won't store RDDs in memory unless you use a memory StorageLevel. By
> default, your input and intermediate results won't be put into memory. You
> can call persist if you want to avoid duplicate computation or reading.
> E.g.,
>
>     val r1 = context.wholeTextFiles(...)
>     val r2 = r1.flatMap(s => ...)
>     val r3 = r2.filter(...)...
>     r3.saveAsTextFile(...)
>     val r4 = r2.map(...)...
>     r4.saveAsTextFile(...)
>
> In the above example, r2 will be used twice. To speed up the computation,
> you can call r2.persist(StorageLevel.MEMORY_ONLY) to store r2 in memory.
> Then r4 will use the data of r2 in memory directly. E.g.,
>
>     val r1 = context.wholeTextFiles(...)
>     val r2 = r1.flatMap(s => ...)
>     r2.persist(StorageLevel.MEMORY_ONLY)
>     val r3 = r2.filter(...)...
>     r3.saveAsTextFile(...)
>     val r4 = r2.map(...)...
>     r4.saveAsTextFile(...)
>
> See
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
>
> Best Regards,
> Shixiong Zhu
>
> 2015-07-09 22:09 GMT+08:00 Michal Čizmazia:
>
>> Is there a way to change the default storage level?
>>
>> If not, how can I properly change the storage level wherever necessary,
>> if my input and intermediate results do not fit into memory?
>>
>> In this example:
>>
>>     context.wholeTextFiles(...)
>>         .flatMap(s -> ...)
>>         .flatMap(s -> ...)
>>
>> Does persist() need to be called after every transformation?
>>
>>     context.wholeTextFiles(...)
>>         .persist(StorageLevel.MEMORY_AND_DISK)
>>         .flatMap(s -> ...)
>>         .persist(StorageLevel.MEMORY_AND_DISK)
>>         .flatMap(s -> ...)
>>         .persist(StorageLevel.MEMORY_AND_DISK)
>>
>> Thanks!
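For anyone hitting the same "No space left on device" error, the spark.local.dir change mentioned at the top of this message might look roughly like the fragment below. This is only a sketch: the path /mnt/bigdisk/spark-tmp is a made-up placeholder, not the directory actually used here, and the line is assumed to go into conf/spark-defaults.conf.

```properties
# conf/spark-defaults.conf
# Hypothetical path: point Spark's scratch space (shuffle files, RDD
# partitions spilled to disk) at a volume with enough free space.
spark.local.dir    /mnt/bigdisk/spark-tmp
```

Note that, according to the Spark configuration docs, a cluster manager can override this property through the SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables, so the setting may need to be made there instead.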
Re: change default storage level
Spark won't store RDDs in memory unless you use a memory StorageLevel. By default, your input and intermediate results won't be put into memory. You can call persist if you want to avoid duplicate computation or reading. E.g.,

    val r1 = context.wholeTextFiles(...)
    val r2 = r1.flatMap(s => ...)
    val r3 = r2.filter(...)...
    r3.saveAsTextFile(...)
    val r4 = r2.map(...)...
    r4.saveAsTextFile(...)

In the above example, r2 will be used twice. To speed up the computation, you can call r2.persist(StorageLevel.MEMORY_ONLY) to store r2 in memory. Then r4 will use the data of r2 in memory directly. E.g.,

    val r1 = context.wholeTextFiles(...)
    val r2 = r1.flatMap(s => ...)
    r2.persist(StorageLevel.MEMORY_ONLY)
    val r3 = r2.filter(...)...
    r3.saveAsTextFile(...)
    val r4 = r2.map(...)...
    r4.saveAsTextFile(...)

See
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence

Best Regards,
Shixiong Zhu

2015-07-09 22:09 GMT+08:00 Michal Čizmazia:

> Is there a way to change the default storage level?
>
> If not, how can I properly change the storage level wherever necessary, if
> my input and intermediate results do not fit into memory?
>
> In this example:
>
>     context.wholeTextFiles(...)
>         .flatMap(s -> ...)
>         .flatMap(s -> ...)
>
> Does persist() need to be called after every transformation?
>
>     context.wholeTextFiles(...)
>         .persist(StorageLevel.MEMORY_AND_DISK)
>         .flatMap(s -> ...)
>         .persist(StorageLevel.MEMORY_AND_DISK)
>         .flatMap(s -> ...)
>         .persist(StorageLevel.MEMORY_AND_DISK)
>
> Thanks!
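To make the "r2 will be used twice" point concrete without needing a Spark installation, here is a small plain-Java toy model. It is not Spark code: LazyDataset, PersistSketch and sourceReads are hypothetical names invented for this sketch. The assumption it illustrates is only the one stated in the reply above, namely that a dataset which has not been persisted is recomputed from its source for every action that depends on it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class PersistSketch {

    /** Toy stand-in for an RDD: a recipe that is re-run on every use unless persisted. */
    static final class LazyDataset<T> {
        private final Supplier<List<T>> recipe;
        private boolean persistRequested;
        private List<T> cached; // filled by the first compute() after persist()

        LazyDataset(Supplier<List<T>> recipe) {
            this.recipe = recipe;
        }

        /** Marks this dataset to be kept after it is first computed. */
        LazyDataset<T> persist() {
            persistRequested = true;
            return this;
        }

        /** Lazy transformation: nothing runs until compute() is called on the result. */
        <R> LazyDataset<R> map(Function<T, R> f) {
            return new LazyDataset<>(
                () -> compute().stream().map(f).collect(Collectors.toList()));
        }

        /** "Action": runs the recipe, or returns the kept copy if there is one. */
        List<T> compute() {
            if (cached != null) {
                return cached;
            }
            List<T> result = recipe.get();
            if (persistRequested) {
                cached = result;
            }
            return result;
        }
    }

    /** Runs two actions downstream of r2 and returns how often the source was read. */
    static int sourceReads(boolean persistParent) {
        int[] reads = {0};
        LazyDataset<String> r1 = new LazyDataset<>(() -> {
            reads[0]++; // stands in for an expensive read such as wholeTextFiles
            return Arrays.asList("a b ", " c");
        });
        LazyDataset<String> r2 = r1.map(String::trim);
        if (persistParent) {
            r2.persist();
        }
        r2.map(String::length).compute();      // first action, like r3.saveAsTextFile
        r2.map(String::toUpperCase).compute(); // second action, like r4.saveAsTextFile
        return reads[0];
    }

    public static void main(String[] args) {
        System.out.println("source reads without persist: " + sourceReads(false));
        System.out.println("source reads with persist:    " + sourceReads(true));
    }
}
```

In this toy model the source is read twice without persist() and once with it. Only the dataset that feeds more than one action (r2) is marked, which matches the advice above: persist() is not needed after every transformation.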
change default storage level
Is there a way to change the default storage level?

If not, how can I properly change the storage level wherever necessary, if my input and intermediate results do not fit into memory?

In this example:

    context.wholeTextFiles(...)
        .flatMap(s -> ...)
        .flatMap(s -> ...)

Does persist() need to be called after every transformation?

    context.wholeTextFiles(...)
        .persist(StorageLevel.MEMORY_AND_DISK)
        .flatMap(s -> ...)
        .persist(StorageLevel.MEMORY_AND_DISK)
        .flatMap(s -> ...)
        .persist(StorageLevel.MEMORY_AND_DISK)

Thanks!