Jingwei (Sophie) Zhang created SPARK-46343:
----------------------------------------------
Summary: Spark cannot support Docker bridge network in YARN
Key: SPARK-46343
URL: https://issues.apache.org/jira/browse/SPARK-46343
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 4.0.0, 3.5.1
Environment: OS: Ubuntu 22.04.2 LTS
JDK Version: 1.8
Hadoop Version: 3.3.6
Spark Version: 3.5.1
Reporter: Jingwei (Sophie) Zhang
Hello Spark team,
I recently found a possible bug in Spark's YarnAllocator.
When I run Spark applications on YARN with the Docker bridge network, the job
fails with an address binding error on the executor side.
I believe it is caused by the YarnAllocator implementation in Spark: the
executor tries to bind to the hostname of the NodeManager instead of the
hostname of the container. This works with the host network, where the
container shares the host's network stack, but it breaks with the bridge
network.
!image-2023-12-09-14-28-28-147.png|width=659,height=477!
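As a minimal illustration of the binding failure described above (a sketch only, not Spark's or YARN's actual code), the snippet below checks whether a socket can bind to a given address from the current network namespace. The helper name `can_bind` and the address `192.0.2.10`, a documentation-range placeholder standing in for the NodeManager host, are assumptions made for this example. Inside a bridge-networked container, the NodeManager's address belongs to no local interface, so the bind is expected to fail in the same way.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to `host` in this network namespace.

    Hypothetical helper for illustration; port 0 lets the OS pick a free port.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True
    except OSError:
        # Covers "address not available" (the address is not assigned to any
        # local interface) as well as name resolution errors.
        return False


if __name__ == "__main__":
    # Placeholder values: 127.0.0.1 is an address the container owns, while
    # 192.0.2.10 stands in for the NodeManager host's address.
    candidates = [
        ("container-local address", "127.0.0.1"),
        ("NodeManager address (placeholder)", "192.0.2.10"),
    ]
    for label, host in candidates:
        print(f"{label} {host}: bind ok = {can_bind(host)}")
```

Running this inside a bridge-networked container with the real NodeManager hostname in place of the placeholder would be one way to confirm the diagnosis independently of Spark.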
For more details, please check out [RCA - Spark + YARN Docker Bridge
Network|https://github.com/EC528-Fall-2023/Kata-Containers-for-SPARK/blob/main/docs/troubleshoot/rca-docker-bridge-net.md].
It looks like the YARN Container API does not expose the container's
hostname, which means that to solve this issue we may also need to make
changes on the Hadoop YARN side.
Please let me know if you have any questions. Many thanks!
---
Best Regards,
Jingwei Zhang
--
This message was sent by Atlassian Jira
(v8.20.10#820010)