Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3913#issuecomment-71553196

I'm just not sure how useful it would be overall. For RDD data it might be slightly misleading because of things like in-memory serialization. For broadcast objects it would only work in the Scala shell, because of the way serialization works in Python. I'm also not totally sure how accurate our memory estimation is overall, and it may get less accurate if we add smarter caching for SchemaRDD's. Anyway, here is what would be helpful: could you walk through an example with a case class or something and show how accurate the estimate is? That would make it easier to evaluate. One thing we could do that would be more isolated is to add a function on SparkContext called `estimateSizeOf(object: Any)`, so that at least we don't expose the class locations and names as APIs.
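As a rough illustration of the API shape being proposed, the sketch below wraps Spark's internal `org.apache.spark.util.SizeEstimator` (which was `private[spark]` around this time, so this assumes internal access) in a single entry point. The wrapper class and method name here are hypothetical, taken only from the suggestion in the comment above, not from any existing Spark API:

```scala
// Hypothetical sketch of the proposed helper; assumes access to Spark's
// internal SizeEstimator, which walks an object graph and estimates its
// JVM heap footprint in bytes.
import org.apache.spark.util.SizeEstimator

// Illustrative wrapper standing in for a method on SparkContext.
class SizeEstimationHelper {
  // Estimate the in-memory size of an arbitrary object, without exposing
  // the estimator's class location or name as part of the public API.
  def estimateSizeOf(obj: AnyRef): Long = SizeEstimator.estimate(obj)
}

// The kind of case-class accuracy check asked for in the comment:
case class Point(x: Double, y: Double)

object Example {
  def main(args: Array[String]): Unit = {
    val helper = new SizeEstimationHelper
    // Prints the estimated heap size of a small case class instance;
    // comparing this against measured heap usage is the accuracy check
    // the comment requests.
    println(helper.estimateSizeOf(Point(1.0, 2.0)))
  }
}
```

The design point in the comment is that a single `estimateSizeOf` function keeps the estimation machinery behind a stable name, so the underlying estimator class can move or change without breaking callers.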