[
https://issues.apache.org/jira/browse/SPARK-46343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jingwei (Sophie) Zhang updated SPARK-46343:
-------------------------------------------
Description:
Hello Spark team,
I recently found a possible bug in Spark's YarnAllocator.
When I run Spark applications on YARN with the Docker bridge network, the
job fails with an address binding error on the executor side.
I believe it is caused by the YarnAllocator implementation in Spark: the
executor tries to bind to the hostname of the NodeManager instead of the
hostname of the container. This works with the host network but breaks with
the bridge network.
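To illustrate the failure mode, here is a minimal sketch in plain Python (not Spark code). It assumes that, inside a bridge-network container, the NodeManager's address is not assigned to any of the container's own interfaces. The helper `try_bind` and both address constants are hypothetical names for illustration only; `192.0.2.1` is a documentation-range placeholder standing in for the NodeManager host, and `127.0.0.1` stands in for an address the container really owns.

```python
import errno
import socket


def try_bind(address: str) -> str:
    """Try to bind a TCP socket to `address` on an OS-chosen port.

    Returns "ok" on success, otherwise the symbolic errno name.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, 0))  # port 0 lets the OS pick a free port
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


# Stand-in for an address owned by the container itself (assumption).
CONTAINER_LOCAL_ADDRESS = "127.0.0.1"
# Stand-in for the NodeManager host as seen from inside the container
# (assumption: a TEST-NET-1 placeholder that is not a local address).
NODEMANAGER_ADDRESS = "192.0.2.1"

if __name__ == "__main__":
    print("bind to own address:    ", try_bind(CONTAINER_LOCAL_ADDRESS))
    print("bind to foreign address:", try_bind(NODEMANAGER_ADDRESS))
```

On a typical Linux host the second bind is rejected by the kernel because the address is not local, which I believe is the same kind of error the executor hits when it is handed the NodeManager hostname in bridge mode.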
For more details, please check out [RCA - Spark + YARN Docker Bridge
Network|https://github.com/EC528-Fall-2023/Kata-Containers-for-SPARK/blob/main/docs/troubleshoot/rca-docker-bridge-net.md].
It looks like the YARN Container API does not expose the container's
hostname, which means that solving this issue may also require changes on
the Hadoop YARN side?
Please let me know if you have any questions. Many thanks!
—
Best Regards,
Jingwei Zhang
was:
Hello Spark team,
I recently found a possible bug in Spark's YarnAllocator.
When I run Spark applications on YARN with the Docker bridge network, the
job fails with an address binding error on the executor side.
I believe it is caused by the YarnAllocator implementation in Spark: the
executor tries to bind to the hostname of the NodeManager instead of the
hostname of the container. This works with the host network but breaks with
the bridge network.
!image-2023-12-09-14-28-28-147.png|width=659,height=477!
For more details, please check out [RCA - Spark + YARN Docker Bridge
Network|https://github.com/EC528-Fall-2023/Kata-Containers-for-SPARK/blob/main/docs/troubleshoot/rca-docker-bridge-net.md].
It looks like the YARN Container API does not expose the container's
hostname, which means that solving this issue may also require changes on
the Hadoop YARN side?
Please let me know if you have any questions. Many thanks!
---
Best Regards,
Jingwei Zhang
> Spark cannot support Docker bridge network in YARN
> --------------------------------------------------
>
> Key: SPARK-46343
> URL: https://issues.apache.org/jira/browse/SPARK-46343
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 4.0.0, 3.5.1
> Environment: OS: Ubuntu 22.04.2 LTS
> JDK Version: 1.8
> Hadoop Version: 3.3.6
> Spark Version: 3.5.1
> Reporter: Jingwei (Sophie) Zhang
> Priority: Major
> Attachments: Screenshot 2023-05-16 221916.png