[ 
https://issues.apache.org/jira/browse/SPARK-46343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingwei (Sophie) Zhang updated SPARK-46343:
-------------------------------------------
    Description: 
Hello Spark team,

I recently found a possible bug in Spark's YarnAllocator.

When I run Spark applications on YARN with a Docker bridge network, the job 
fails with an address binding error on the executor side.

I believe this is caused by the YarnAllocator implementation in Spark: the 
executor tries to bind to the hostname of the NodeManager instead of the 
hostname of the container. This works with a host network, but it breaks with 
a bridge network.
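
The failure mode described above can be illustrated outside Spark with a minimal sketch. It is not Spark or YARN code: the helper name can_bind is hypothetical, and 192.0.2.1 (an address from the RFC 5737 documentation range) is only an assumed stand-in for a NodeManager address that is not assigned to any interface inside a bridge-network container. Binding to such an address is expected to fail, while binding to an address owned by the local network namespace succeeds.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to (host, port) on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 asks the OS for any free port, so only the address matters.
            sock.bind((host, port))
            return True
        except OSError:
            # Covers "cannot assign requested address" as well as name
            # resolution failures (socket.gaierror is a subclass of OSError).
            return False


if __name__ == "__main__":
    # Loopback belongs to the local network namespace, so this bind works.
    print("127.0.0.1:", can_bind("127.0.0.1"))
    # Assumed stand-in for the NodeManager address, which no interface inside
    # the container owns when the bridge network is used.
    print("192.0.2.1:", can_bind("192.0.2.1"))
```

With a host network the container shares the NodeManager's interfaces, so the equivalent bind would be expected to succeed, which matches the behavior reported here.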

For more details, please check out [RCA - Spark + YARN Docker Bridge 
Network|https://github.com/EC528-Fall-2023/Kata-Containers-for-SPARK/blob/main/docs/troubleshoot/rca-docker-bridge-net.md].

It looks like the YARN Container API does not expose the container's 
hostname. Does that mean we would also need to make changes on the Hadoop 
YARN side to solve this issue?

Please let me know if you have any questions. Many thanks!

—

Best Regards,

Jingwei Zhang

  was:
Hello Spark team,

I recently found a possible bug in Spark's YarnAllocator.

When I run Spark applications on YARN with a Docker bridge network, the job 
fails with an address binding error on the executor side.

I believe this is caused by the YarnAllocator implementation in Spark: the 
executor tries to bind to the hostname of the NodeManager instead of the 
hostname of the container. This works with a host network, but it breaks with 
a bridge network.

!image-2023-12-09-14-28-28-147.png|width=659,height=477!

For more details, please check out [RCA - Spark + YARN Docker Bridge 
Network|https://github.com/EC528-Fall-2023/Kata-Containers-for-SPARK/blob/main/docs/troubleshoot/rca-docker-bridge-net.md].

It looks like the YARN Container API does not expose the container's 
hostname. Does that mean we would also need to make changes on the Hadoop 
YARN side to solve this issue?

Please let me know if you have any questions. Many thanks!

---

Best Regards,

Jingwei Zhang


> Spark cannot support Docker bridge network in YARN
> --------------------------------------------------
>
>                 Key: SPARK-46343
>                 URL: https://issues.apache.org/jira/browse/SPARK-46343
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 4.0.0, 3.5.1
>         Environment: OS: Ubuntu 22.04.2 LTS
> JDK Version: 1.8
> Hadoop Version: 3.3.6
> Spark Version: 3.5.1
>            Reporter: Jingwei (Sophie) Zhang
>            Priority: Major
>         Attachments: Screenshot 2023-05-16 221916.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
