[ https://issues.apache.org/jira/browse/SPARK-28778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-28778:
----------------------------------
    Fix Version/s: 2.4.5

> Shuffle jobs fail due to incorrect advertised address when running in virtual network
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-28778
>                 URL: https://issues.apache.org/jira/browse/SPARK-28778
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 2.2.3, 2.3.0, 2.4.3
>            Reporter: Anton Kirillov
>            Assignee: Anton Kirillov
>            Priority: Major
>              Labels: Mesos
>             Fix For: 2.4.5, 3.0.0
>
>
> When shuffle jobs are launched by Mesos in a virtual network, the Mesos
> scheduler sets the executor {{--hostname}} parameter to {{0.0.0.0}} whenever
> {{spark.mesos.network.name}} is provided. Executors then use {{0.0.0.0}} as
> their advertised address and, when a job involves shuffle, fail to fetch
> shuffle blocks from each other because {{0.0.0.0}} is given as the origin.
> In a virtual network the hostname or IP address is not known up front; it is
> assigned to the container at start time, so the executor process needs to
> advertise the correct, dynamically assigned address to be reachable by other
> executors.
>
> This bug prevents Mesos users from running any job that involves shuffle
> when a virtual network is used, because executors cannot fetch shuffle
> blocks from peers that advertise an incorrect address.
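
The failure mode described in this issue can be illustrated with a minimal Python sketch. It is not Spark code and not the fix that was committed for SPARK-28778; the function names and the example addresses are hypothetical. It only shows the general idea that a wildcard address such as 0.0.0.0 is acceptable for binding but not for advertising, so a process should fall back to the address actually assigned to its container.

```python
import ipaddress
from typing import Optional


def is_usable_advertised_address(host: str) -> bool:
    """Return False for wildcard addresses such as 0.0.0.0 or ::.

    Hypothetical helper: a wildcard address tells peers nothing about
    where the process can actually be reached.
    """
    try:
        return not ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal, so treat it as a hostname; accept if non-empty.
        return bool(host)


def choose_advertised_address(configured: Optional[str],
                              container_address: str) -> str:
    """Prefer the configured host unless it is missing or a wildcard.

    `container_address` stands in for whatever address the container
    was assigned at start time (an assumption of this sketch).
    """
    if configured and is_usable_advertised_address(configured):
        return configured
    return container_address


if __name__ == "__main__":
    # The scheduler passed 0.0.0.0, so fall back to the container's address.
    print(choose_advertised_address("0.0.0.0", "9.0.1.7"))
```

With a check of this kind, an executor started with a wildcard hostname would advertise its dynamically assigned address instead, which is the behaviour the issue says is needed for peers to fetch shuffle blocks.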