Github user maropu commented on the issue: https://github.com/apache/spark/pull/14039

@markhamstra Thanks for the comment. I think how much fragments get reused depends heavily on the user's queries, the Catalyst optimizer, cluster resources, and so on. Reusing `ShuffledRowRDD` shuffle data within a single job is a good idea, but keeping that data around across multiple jobs seems difficult: Spark cannot know when the data should be garbage-collected, and it could consume a lot of disk space. I think a caching mechanism is a better way to reuse fragments across multiple jobs. Or do you have a smart, concrete idea for reusing the shuffle data?