It would be nice if the RDD cache() method incorporated depth information.

That is,

 

void test()
{

JavaRDD<...> rdd = ...;

 

rdd.cache();  // to depth 1. actual caching happens.

rdd.cache();  // to depth 2. Nop as long as the storage level is the same.
              // Else, exception.

...

rdd.uncache();  // to depth 1. Nop.

rdd.uncache();  // to depth 0. Actual unpersist happens.

}

 

This can be useful when writing code in a modular way.

When a function receives an RDD as an argument, it doesn't necessarily know
the cache status of that RDD.

It may still want to cache the RDD, since it will use it multiple times.

But with the current RDD API, it cannot determine whether it should
unpersist the RDD afterwards or leave it alone (so that the caller can
continue to use it without rebuilding).
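
To illustrate the semantics proposed above, here is a minimal sketch of depth counting done in a helper outside the RDD API. Everything in it is hypothetical: `Persistable` is a stand-in interface for the persist/unpersist operations of an RDD (it is not a Spark type), `DepthCache` is an invented helper, and storage levels are modelled as plain strings rather than Spark's StorageLevel. Throwing on an unmatched uncache() is also an assumption; the proposal above does not specify that case.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical stand-in for the two RDD operations the counter needs. */
interface Persistable {
    void persist(String storageLevel);
    void unpersist();
}

/** Hypothetical helper that tracks a cache depth per object. */
final class DepthCache {
    private static final class Entry {
        final String level;
        int depth = 1;
        Entry(String level) { this.level = level; }
    }

    // Keyed by identity: two distinct objects never share a depth counter.
    private final Map<Persistable, Entry> entries = new IdentityHashMap<>();

    /** Depth 0 -> 1 really persists; deeper calls only bump the counter. */
    synchronized void cache(Persistable target, String level) {
        Entry e = entries.get(target);
        if (e == null) {
            target.persist(level);
            entries.put(target, new Entry(level));
        } else if (e.level.equals(level)) {
            e.depth++;
        } else {
            throw new IllegalStateException(
                "already cached at " + e.level + ", requested " + level);
        }
    }

    /** Depth 1 -> 0 really unpersists; shallower calls only drop the counter. */
    synchronized void uncache(Persistable target) {
        Entry e = entries.get(target);
        if (e == null) {
            // Assumption: an unmatched uncache() is treated as a caller bug.
            throw new IllegalStateException("uncache() without matching cache()");
        }
        if (--e.depth == 0) {
            entries.remove(target);
            target.unpersist();
        }
    }

    /** Current depth, 0 if the object is not tracked. */
    synchronized int depth(Persistable target) {
        Entry e = entries.get(target);
        return e == null ? 0 : e.depth;
    }
}
```

A function that receives an RDD could then call cache(...) on entry and uncache(...) in a finally block; the data would actually be unpersisted only if the caller had not cached it first. This only works if every caller goes through the same helper, which is why having it in the RDD API itself would be nicer.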

 

Thanks.

 
