GitHub user arucard21 commented on the issue: https://github.com/apache/spark/pull/20731

I could try to create a simplified version of this image with just a few of these relations, but since that would still be an image, it would still be hard to update. So instead I can remove the image and describe the context a bit more in text. We already mention the possibility of using Hadoop modules for Cluster Management and Distributed Storage; we can add that other options are available to provide this functionality as well. That should give a more comprehensive overview of the important entities that Spark interacts with (available APIs + modules to extend functionality + third-party modules for specific functionality = all you need to run Spark).

I'm not sure whether this adds sufficient value to the documentation (or whether you think this additional information is even needed). If not, let me know and I can just close this PR.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org