Github user staple commented on the pull request: https://github.com/apache/spark/pull/2362#issuecomment-55303441

Hi, I implemented this per the discussion here: https://github.com/apache/spark/pull/2347#issuecomment-55181535, assuming I understood the comment correctly. The context is that we are supposed to log a warning when running an iterative learning algorithm on an uncached RDD. What originally led me to identify SPARK-3488 is that if the deserialized Python RDDs are always uncached, a warning will always be logged.

Obviously a meaningful performance difference would trump the implementation of this warning message, and I haven't measured performance - I've just discussed options in the pull request referenced above. But by way of comparison, is there any significant difference in memory pressure between caching a LabeledPoint RDD deserialized from Python and caching a LabeledPoint RDD created natively in Scala (the typical use case with a Scala rather than Python client)? If I should do some performance testing, are there any examples of tests and infrastructure you'd suggest as a starting point?

'none' means the RDD is not cached within the Python -> Scala MLlib interface, where previously it was cached. The learning algorithms for which RDDs are no longer cached implement their own caching internally.
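To make the discussed behavior concrete, here is a minimal sketch (not actual Spark source) of the warning logic being described: check whether an RDD's storage level indicates it is uncached, and log a warning if so. The `FakeRDD` and `StorageLevel` classes below are hypothetical stand-ins for the real Spark APIs, used only so the example is self-contained; `warn_if_uncached` is an illustrative helper name, not a Spark function.

```python
import logging

class StorageLevel:
    """Hypothetical stand-in for Spark's StorageLevel constants."""
    NONE = "NONE"
    MEMORY_ONLY = "MEMORY_ONLY"

class FakeRDD:
    """Hypothetical stand-in for an RDD, exposing only the storage level."""
    def __init__(self, storage_level):
        self._level = storage_level

    def getStorageLevel(self):
        return self._level

def warn_if_uncached(rdd, log=logging.getLogger("mllib")):
    """Log a warning when an iterative algorithm receives an uncached RDD.

    Returns True if a warning was logged, False otherwise.
    """
    if rdd.getStorageLevel() == StorageLevel.NONE:
        log.warning("The input RDD is not cached; iterative algorithms "
                    "may recompute it on every pass.")
        return True
    return False
```

The point raised in the comment is that if the Scala-side RDDs deserialized from Python are always left uncached, a check like this would fire on every run, which is why the caching behavior at the Python -> Scala boundary matters for the warning's usefulness.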