Thanks for the reply, Wenchen. I am curious what happens when an RDD goes out of 
scope when it is not cached.

Nasrulla

From: Wenchen Fan <cloud0...@gmail.com>
Sent: Tuesday, May 21, 2019 6:28 AM
To: Nasrulla Khan Haris <nasrulla.k...@microsoft.com.invalid>
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.

RDD is kind of a pointer to the actual data. Unless it's cached, we don't need 
to clean up the RDD.
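For what it's worth, the weak-reference pattern that ContextCleaner relies on for *persisted* RDDs can be sketched without Spark at all. This is a minimal, self-contained illustration, not Spark code; the names `CleanupRef` and `awaitCleanup` are made up for the sketch (Spark's real counterparts are `CleanupTaskWeakReference` and the `keepCleaning` loop):

```scala
import java.lang.ref.{Reference, ReferenceQueue, WeakReference}

// Sketch of the pattern ContextCleaner uses for persisted RDDs: each
// registered object is wrapped in a WeakReference tied to a ReferenceQueue,
// and a polling loop waits for the JVM to garbage-collect the object, at
// which point the cleaner can drop the associated external state
// (e.g. cached blocks). Unpersisted RDDs never register here; they are
// simply collected by the JVM like any other object.
object WeakRefCleanupSketch {

  // Stand-in for Spark's CleanupTaskWeakReference; `id` plays the role of
  // the RDD id whose cached blocks would be removed on cleanup.
  final class CleanupRef(obj: AnyRef, val id: Int, queue: ReferenceQueue[AnyRef])
      extends WeakReference[AnyRef](obj, queue)

  // Poll the queue the way a cleaner thread would. Returns the id of the
  // first collected reference, or -1 if nothing was enqueued in time.
  def awaitCleanup(queue: ReferenceQueue[AnyRef], maxTries: Int = 50): Int = {
    var tries = 0
    while (tries < maxTries) {
      System.gc() // request a collection so the weak ref gets enqueued
      val r: Reference[_ <: AnyRef] = queue.remove(100)
      if (r != null) return r.asInstanceOf[CleanupRef].id
      tries += 1
    }
    -1
  }

  def main(args: Array[String]): Unit = {
    val queue = new ReferenceQueue[AnyRef]
    var rdd: AnyRef = new Object // stand-in for a persisted RDD
    // Keep a strong reference to the CleanupRef itself (Spark keeps these
    // in a buffer for the same reason), otherwise the Reference object
    // could be collected before it is ever enqueued.
    val ref = new CleanupRef(rdd, 42, queue)
    rdd = null // the RDD goes out of scope
    println(s"cleaning up state for id ${awaitCleanup(queue)}")
  }
}
```

The key point the sketch shows: there is no destructor call anywhere. Cleanup is driven entirely by the garbage collector enqueuing the weak reference, which is why an unpersisted RDD (which holds no external state) needs no cleanup hook at all.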

On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris 
<nasrulla.k...@microsoft.com.invalid> wrote:
Hi Spark developers,

Can someone point out the code where RDD objects go out of scope? I found the 
ContextCleaner <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178> 
code, in which only persisted RDDs are cleaned up at regular intervals, and only 
if the RDD is registered for cleanup. I have not found where a destructor for 
the RDD object is invoked. I am trying to understand how RDD cleanup happens 
when the RDD is not persisted.

Thanks in advance, appreciate your help.
Nasrulla
