Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3913#issuecomment-71553196

I'm just not sure how useful it would be overall. For RDD data it might be slightly misleading because of things like in-memory serialization. For broadcast objects it would only work in the Scala shell, because of the way serialization works in Python. I'm also not totally sure how accurate our memory estimation is overall, and it may get less accurate if we add smarter caching for SchemaRDD's. Anyway, here is what would be helpful: could you walk through an example with a case class or something and show how accurate the estimate is? That would make it easier to evaluate. One thing we could do that would be more isolated is to add a function on SparkContext called `estimateSizeOf(object: Any)`, so that at least we don't expose the class locations and names as APIs.
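As a rough illustration of the API shape being proposed, the sketch below wraps Spark's internal `org.apache.spark.util.SizeEstimator` (which was `private[spark]` around this time, so this assumes internal access) in a single entry point. The wrapper class and method name here are hypothetical, taken only from the suggestion in the comment above, not from any existing Spark API:

```scala
// Hypothetical sketch of the proposed helper; assumes access to Spark's
// internal SizeEstimator, which walks an object graph and estimates its
// JVM heap footprint in bytes.
import org.apache.spark.util.SizeEstimator

// Illustrative wrapper standing in for a method on SparkContext.
class SizeEstimationHelper {
  // Estimate the in-memory size of an arbitrary object, without exposing
  // the estimator's class location or name as part of the public API.
  def estimateSizeOf(obj: AnyRef): Long = SizeEstimator.estimate(obj)
}

// The kind of case-class accuracy check asked for in the comment:
case class Point(x: Double, y: Double)

object Example {
  def main(args: Array[String]): Unit = {
    val helper = new SizeEstimationHelper
    // Prints the estimated heap size of a small case class instance;
    // comparing this against measured heap usage is the accuracy check
    // the comment requests.
    println(helper.estimateSizeOf(Point(1.0, 2.0)))
  }
}
```

The design point in the comment is that a single `estimateSizeOf` function keeps the estimation machinery behind a stable name, so the underlying estimator class can move or change without breaking callers.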