GitHub user squito commented on the pull request: https://github.com/apache/spark/pull/5572#issuecomment-105257051

@viirya @cloud-fan good point, I hadn't thought about multiple tasks on one executor all pulling the same partition of `rdd2`. Still, I'm very worried about adding the extra local caching if we don't have an effective way of undoing it, because I think it will be very confusing to have these extra blocks stuck in the cache. I agree that "idea 1" is not as general a solution, but I was hoping it was simple enough to fit your narrow need here.

In any case, this is just my opinion. I'm not adamantly against this, but I would really like to get some other reviewers to weigh in before we merge those changes.