GitHub user staple commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-56865408

@davies It looks like your #2378 already disabled caching for NaiveBayes and DecisionTree. The only difference in this patch is that I disabled caching for ALS as well. We discussed this a bit here: https://github.com/apache/spark/pull/2378#discussion_r17686208.

I filed this ticket as a follow-up to the work on uncached input warnings (https://github.com/apache/spark/pull/2347). The warnings are only supposed to be printed if the input data is accessed repeatedly, over many iterations, during learning. That's not the case with ALS, so a warning shouldn't be printed there. But I can see there's a case for caching, because the input data is accessed twice when constructing an intermediate representation of the data.

I don't have a strong preference on whether or not we should cache in Python for the ALS learner. If you are fine with continuing to cache in Python for ALS, then there's no more work to be done for this ticket, SPARK-3550.
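
For illustration only, the rule described above (warn only when an iterative learner is handed uncached input) could be sketched roughly as follows. This is not the code from any of the linked pull requests: the helper name `warn_if_uncached` and the `iterative` flag are hypothetical. The only assumption about the input is that it exposes an `is_cached` attribute, as PySpark's `RDD` does.

```python
import warnings


def warn_if_uncached(rdd, iterative):
    """Hypothetical helper, not Spark's implementation.

    Warns only when the learner is iterative (it re-reads its input on
    many iterations) and the input has not been cached. Returns True if
    a warning was emitted, False otherwise.
    """
    # `is_cached` is the flag PySpark's RDD sets in cache()/persist();
    # any object exposing that attribute works for this sketch.
    if iterative and not rdd.is_cached:
        warnings.warn(
            "input is not cached; an iterative learner will recompute "
            "it on each pass"
        )
        return True
    return False
```

Under this sketch, a learner that reads its input only a small, fixed number of times (as described for ALS above) would pass `iterative=False` and never trigger the warning, whether or not the caller chose to cache.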