[ https://issues.apache.org/jira/browse/SPARK-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mridul Muralidharan updated SPARK-23721: ---------------------------------------- Summary: Enhance BlockManagerId to include container's underlying host name (was: Use actual node's hostname for host and rack locality computation) > Enhance BlockManagerId to include container's underlying host name > ------------------------------------------------------------------ > > Key: SPARK-23721 > URL: https://issues.apache.org/jira/browse/SPARK-23721 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.4.0 > Reporter: Mridul Muralidharan > Priority: Major > > In spark, host and rack locality computation is based on BlockManagerId's > hostname - which is the container's hostname. > When running in containerized environment's like kubernetes, docker support > in hadoop 3, mesos docker support, etc; the hostname reported by container is > not the actual 'host' the container is running on. > This results in spark getting affected in multiple ways. > h3. Suboptimal schedules > Due to host name mismatch between different containers on same physical host, > spark will treat all containers as running on own host. > Effectively, there is no host-locality schedule at all due to this. > In addition, depending on how sophisticated locality script is, it can also > lead to either suboptimal rack locality computation all the way to no > rack-locality schedule entirely. > Hence the performance degradation in scheduler can be significant - only > PROCESS_LOCAL schedules dont get affected. > h3. HDFS reads > This is closely related to "suboptimal schedules" above. > Block locations for hdfs files refer to the datanode hostnames - and not the > container's hostname. > This effectively results in spark ignoring hdfs data placement entirely for > scheduling tasks - resulting in very heavy cross-node/cross-rack data > movement. > h3. Speculative execution > Spark schedules speculative tasks on a different host - in order to minimize > the cost of node failures for expensive tasks. > This gets effectively disabled, resulting in speculative tasks potentially > running on the same actual host. > h3. Block replication > Similar to "speculative execution" above, block replication minimizes > potential cost of node loss by typically leveraging another host; which gets > effectively disabled in this case. > Solution for the above is to enhance BlockManagerId to also include the > node's actual hostname via 'nodeHostname' - which should be used for usecases > above instead of the container hostname ('host'). > When not relevant, nodeHostname == hostname : which should ensure all > existing functionality continues to work as expected with regressions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org