Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/10846
  
I'm not saying we should fix just one of them. I'm saying we should treat 
them as separate issues. For example, I am a little concerned about the 
workaround for the soft refs, and that concern doesn't need to block the fix 
for the other issue.
    
    As for the soft ref issue, I'm not sure I understand what you mean by "Due 
to softRef they reach till GC threshold and gets cleared up". Are you saying 
soft refs don't get cleaned up when the HadoopRDD instances are collected? Or 
that they just take longer? Can you clarify?
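
    To illustrate why I'm asking: soft references are only cleared when the 
JVM is under memory pressure, not when the owning object becomes unreachable. 
Here's a minimal sketch of a soft-value cache (plain 
`java.lang.ref.SoftReference`, not the actual Spark code) showing that 
behavior:

    ```scala
    import java.lang.ref.SoftReference
    import scala.collection.concurrent.TrieMap

    // Soft values stay reachable until the JVM is under memory pressure;
    // on a large heap that may not happen until it is nearly full, so
    // cached JobConfs can outlive the HadoopRDDs that created them.
    val cache = TrieMap.empty[String, SoftReference[AnyRef]]

    def put(key: String, value: AnyRef): Unit =
      cache.put(key, new SoftReference(value))

    def get(key: String): Option[AnyRef] =
      cache.get(key).flatMap(ref => Option(ref.get())) // None once cleared
    ```

    If that delayed clearing is what you're describing, that's expected 
behavior for soft refs, not necessarily a leak.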
    
    If there's a problem with using soft refs here, then maybe a more explicit 
collection approach (e.g. a new method in `ContextCleaner` to track these) 
could be a better workaround. But that assumes that your HadoopRDD instances 
are being collected, and if they're not, maybe *that's* the problem.
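
    For concreteness, something along these lines is what I have in mind. All 
the names here (`JobConfCleaner`, `registerJobConfForCleanup`, `CleanJobConf`) 
are made up; it just follows the same weak-reference pattern `ContextCleaner` 
already uses for shuffles and broadcasts:

    ```scala
    import java.lang.ref.{ReferenceQueue, WeakReference}
    import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}

    sealed trait CleanupTask
    case class CleanJobConf(cacheKey: String) extends CleanupTask

    // Weak ref to the owner (e.g. a HadoopRDD) plus the cleanup to run
    // once the owner has been garbage collected.
    class TaskRef(owner: AnyRef, val task: CleanupTask,
        queue: ReferenceQueue[AnyRef])
      extends WeakReference[AnyRef](owner, queue)

    class JobConfCleaner(cache: ConcurrentMap[String, AnyRef]) {
      private val refQueue = new ReferenceQueue[AnyRef]

      // Keeps each TaskRef strongly reachable until it is drained below.
      private val pending = java.util.Collections.newSetFromMap(
        new ConcurrentHashMap[TaskRef, java.lang.Boolean])

      /** Evict `cacheKey` from the cache once `owner` is collected. */
      def registerJobConfForCleanup(owner: AnyRef, cacheKey: String): Unit = {
        pending.add(new TaskRef(owner, CleanJobConf(cacheKey), refQueue))
      }

      /** Drain collected owners and evict their cache entries; a real
       *  implementation would poll this from a daemon thread. */
      def cleanup(): Unit = {
        var ref = refQueue.poll()
        while (ref != null) {
          ref match {
            case r: TaskRef =>
              pending.remove(r)
              r.task match { case CleanJobConf(key) => cache.remove(key) }
            case _ => // not one of ours
          }
          ref = refQueue.poll()
        }
      }
    }
    ```

    That ties eviction to the owner becoming unreachable instead of to heap 
pressure, which is more predictable than soft refs.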
    
    Or, if the caching is not bringing any benefits, maybe just remove the 
cache altogether. But with the little information you have provided, it's hard 
to tell which is the case here.
    
    So, as you can see, it's better to keep these two as separate issues.

