Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/126#issuecomment-37498313
@yaoshengzhe
This is only a safe, best-effort attempt to clean metadata, so no guarantee
is being provided here. All we are trying to do is ensure that for long-running
Spark computations (say, a Spark Streaming program that runs 24/7) there is
_something_ that cleans up in a safe way.
I am taking care to make sure the call to finalize() is cheap: just an
insert into a queue, which does not block (inserts into a LinkedBlockingQueue
without a capacity constraint do not block for all practical purposes).
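
For illustration, here is a minimal Scala sketch of that pattern. It is not
the actual code in this PR; the names MetadataCleaner, CleanupTask, CleanRDD,
and enqueue are all hypothetical:

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical names for illustration only; not the classes in this PR.
sealed trait CleanupTask
case class CleanRDD(rddId: Int) extends CleanupTask

class MetadataCleaner {
  // Unbounded queue: offer() never blocks, so the enqueue done from an
  // RDD's finalize() stays cheap for the finalizer thread.
  private val queue = new LinkedBlockingQueue[CleanupTask]()

  def enqueue(task: CleanupTask): Unit = queue.offer(task)

  // A daemon thread drains the queue and does the actual (potentially
  // expensive) cleanup work outside of finalize().
  private val cleanerThread = new Thread("metadata-cleaner") {
    override def run(): Unit = {
      while (true) {
        queue.take() match {  // blocks until a task arrives
          case CleanRDD(id) =>
            println(s"cleaning metadata for RDD $id")  // placeholder work
        }
      }
    }
  }
  cleanerThread.setDaemon(true)
  cleanerThread.start()
}
```

An RDD would then override finalize() to do nothing but enqueue a task,
e.g. cleaner.enqueue(CleanRDD(id)), so the finalizer thread is never held up
by the cleanup work itself.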
Regarding phantom references: from what I understand, they do not provide
any stronger guarantee on when garbage collection happens than the current
method; a phantom reference is only enqueued after finalize() has run on the
object. The main source of uncertainty comes directly from the garbage
collection step itself, which no method can avoid. Moreover, using any sort of
weak or phantom reference queue requires _every_ RDD to be wrapped in a
WeakReference or PhantomReference. That seems to me to be unnecessary
complexity with little added benefit.
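
For contrast, here is a minimal sketch of the reference-queue alternative
under discussion, again with hypothetical names (CleanupRef, RefBasedCleaner,
register, drainOnce). Note how every RDD has to be registered up front:

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}

// Hypothetical sketch of the alternative: every RDD must be wrapped in a
// reference object at creation time so the JVM can enqueue the reference
// once the RDD becomes unreachable.
class CleanupRef(rdd: AnyRef, val rddId: Int, q: ReferenceQueue[AnyRef])
  extends WeakReference[AnyRef](rdd, q)

object RefBasedCleaner {
  private val refQueue = new ReferenceQueue[AnyRef]()

  // Must be called for _every_ RDD; this is the extra bookkeeping the
  // comment objects to. Note: the returned CleanupRef must itself be kept
  // strongly reachable (e.g. in a set), or it may be collected before it
  // is ever enqueued.
  def register(rdd: AnyRef, rddId: Int): CleanupRef =
    new CleanupRef(rdd, rddId, refQueue)

  // References show up here only after GC has collected the RDD, so this
  // offers no tighter timing guarantee than finalize() does.
  def drainOnce(): Unit = refQueue.poll() match {
    case ref: CleanupRef => println(s"cleaning metadata for RDD ${ref.rddId}")
    case _               => // queue empty (poll() returned null)
  }
}
```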