[ 
https://issues.apache.org/jira/browse/SPARK-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-23721:
----------------------------------------
    Summary: Enhance BlockManagerId to include container's underlying host name 
 (was: Use actual node's hostname for host and rack locality computation)

> Enhance BlockManagerId to include container's underlying host name
> ------------------------------------------------------------------
>
>                 Key: SPARK-23721
>                 URL: https://issues.apache.org/jira/browse/SPARK-23721
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Mridul Muralidharan
>            Priority: Major
>
> In Spark, host and rack locality computation is based on the BlockManagerId's 
> hostname - which is the container's hostname.
> When running in containerized environments like Kubernetes, Docker support 
> in Hadoop 3, Mesos Docker support, etc., the hostname reported by the container 
> is not the actual 'host' the container is running on.
> This affects Spark in multiple ways.
> h3. Suboptimal schedules
> Due to the hostname mismatch between different containers on the same physical 
> host, Spark treats each container as running on its own host.
> Effectively, there is no host-local scheduling at all.
> In addition, depending on how sophisticated the locality script is, this can 
> lead to anything from suboptimal rack locality computation to no rack-local 
> scheduling at all.
> Hence the performance degradation in the scheduler can be significant - only 
> PROCESS_LOCAL schedules don't get affected.
> h3. HDFS reads
> This is closely related to "suboptimal schedules" above.
> Block locations for HDFS files refer to the datanode hostnames - not the 
> container's hostname.
> This effectively results in Spark ignoring HDFS data placement entirely when 
> scheduling tasks - resulting in very heavy cross-node/cross-rack data 
> movement.
> h3. Speculative execution
> Spark schedules speculative tasks on a different host in order to minimize 
> the cost of node failures for expensive tasks.
> This gets effectively disabled, resulting in speculative tasks potentially 
> running on the same physical host.
> h3. Block replication
> Similar to "speculative execution" above, block replication minimizes the 
> potential cost of node loss by typically replicating to another host; this 
> too gets effectively disabled in this case.
> The solution is to enhance BlockManagerId to also include the node's actual 
> hostname via 'nodeHostname', which should be used for the use cases above 
> instead of the container hostname ('host').
> When not relevant, nodeHostname == hostname, which should ensure all 
> existing functionality continues to work as expected without regressions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
