[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432163#comment-15432163 ]

Liam Fisk commented on SPARK-11638:
-----------------------------------

Just mirroring what I said on SPARK-4563 about the lack of support for bridged 
networking:

{quote}
It also makes life difficult for OS X users. Docker for Mac uses xhyve to 
virtualize the Docker engine 
(https://docs.docker.com/engine/installation/mac/), and thus `--net=host` 
binds to the VM's network instead of the true OS X host. SPARK_LOCAL_IP ends 
up as 172.17.0.2, which is not externally reachable.

The end result is that OS X users cannot containerize Spark if Spark needs to 
contact a Mesos cluster.
{quote}

While you are unlikely to have Spark running on an OS X machine in production, 
the development experience is a bit painful if you have to run a separate VM 
with public networking.

> Run Spark on Mesos with bridge networking
> -----------------------------------------
>
>                 Key: SPARK-11638
>                 URL: https://issues.apache.org/jira/browse/SPARK-11638
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos, Spark Core
>    Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>            Reporter: Radoslaw Gruchalski
>         Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, 
> 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch
>
>
> h4. Summary
> Provides {{spark.driver.advertisedPort}}, 
> {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and 
> {{spark.replClassServer.advertisedPort}} settings to enable running Spark on 
> Mesos in Docker with bridge networking. Also provides patches for Akka 
> Remote to enable Spark driver advertisement using an alternative host and 
> port. With these settings, it is possible to run the Spark Master in a 
> Docker container and have the executors running on Mesos talk back 
> correctly to that Master.
> The problem is discussed on the Mesos mailing list here: 
> https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E
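> For illustration, a driver container could combine the existing bind ports 
> with the proposed advertised ports like this (the port values and the Agent 
> address are hypothetical; only the {{advertised...}} keys are introduced by 
> this patch):
> {noformat}
> spark-submit \
>   --conf spark.driver.port=6666 \
>   --conf spark.driver.advertisedHost=10.0.0.5 \
>   --conf spark.driver.advertisedPort=31000 \
>   --conf spark.fileserver.port=6677 \
>   --conf spark.fileserver.advertisedPort=31001 \
>   --conf spark.broadcast.port=6688 \
>   --conf spark.broadcast.advertisedPort=31002 \
>   --conf spark.replClassServer.port=23456 \
>   --conf spark.replClassServer.advertisedPort=31003 \
>   ...
> {noformat}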
> h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door
> In order for the framework to receive offers inside the bridged container, 
> Mesos in the container has to register for offers using the IP address of 
> the Agent, because offers are sent by the Mesos Master to the Docker 
> container running on a different host, the Agent. Prior to Mesos 0.24.0, 
> {{libprocess}} would advertise itself using the IP address of the 
> container, something like {{172.x.x.x}}. Obviously, the Mesos Master can't 
> reach that address: it's on a different host, a different machine. Mesos 
> 0.24.0 introduced two new properties for {{libprocess}}: 
> {{LIBPROCESS_ADVERTISE_IP}} and {{LIBPROCESS_ADVERTISE_PORT}}. These allow 
> the container to use the Agent's address to register for offers. This was 
> provided mainly for running Mesos in Docker on Mesos.
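> As a sketch (the address and ports here are hypothetical), a Mesos process 
> containerized with bridge networking would then be started with the Agent's 
> externally reachable address:
> {noformat}
> docker run --net=bridge \
>   -e LIBPROCESS_ADVERTISE_IP=10.0.0.5 \
>   -e LIBPROCESS_ADVERTISE_PORT=31005 \
>   ...
> {noformat}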
> h4. Spark - how does the above relate and what is being addressed here?
> Similar to Mesos, out of the box, Spark does not allow advertising its 
> services on ports different from the bind ports. Consider the following 
> scenario:
> Spark is running inside a Docker container on Mesos, in bridge networking 
> mode. Assume port {{6666}} for {{spark.driver.port}}, {{6677}} for 
> {{spark.fileserver.port}}, {{6688}} for {{spark.broadcast.port}} and 
> {{23456}} for {{spark.replClassServer.port}}. If such a task is posted to 
> Marathon, Mesos will assign 4 ports in the {{31000-32000}} range, each 
> mapping to a container port (see the sketch below). Starting the executors 
> from such a container results in the executors not being able to 
> communicate back to the Spark Master.
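> To make the mapping concrete, the container could end up with an assignment 
> like the following (the host ports are chosen by Mesos from the offered 
> range and are hypothetical here):
> {noformat}
> container port (bind)              host port (Agent)
> 6666   spark.driver.port          -> 31000
> 6677   spark.fileserver.port      -> 31001
> 6688   spark.broadcast.port       -> 31002
> 23456  spark.replClassServer.port -> 31003
> {noformat}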
> This happens because of two things:
> The Spark driver is effectively an {{akka-remote}} system with the 
> {{akka.tcp}} transport. {{akka-remote}} prior to version {{2.4}} can't 
> advertise a port different from the one it is bound to. The settings 
> discussed are here: 
> https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376.
>  These do not exist in Akka {{2.3.x}}. The Spark driver will always 
> advertise port {{6666}}, as this is the one {{akka-remote}} is bound to.
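> For reference, the Akka 2.4 settings in question look like this (paraphrased 
> from the linked {{reference.conf}}):
> {noformat}
> akka.remote.netty.tcp {
>   hostname = ""       # address advertised to remote systems
>   port = 2552         # port advertised to remote systems
>   bind-hostname = ""  # address to bind to; empty means use hostname
>   bind-port = ""      # port to bind to; empty means use port
> }
> {noformat}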
> Any URIs the executors use to contact the Spark Master are prepared by the 
> Master and handed over to the executors. These always contain the port 
> number on which the Master exposes the given service. The services are:
> - {{spark.broadcast.port}}
> - {{spark.fileserver.port}}
> - {{spark.replClassServer.port}}
> All of the above ports default to {{0}} (random assignment) but can be 
> specified using Spark configuration ({{-Dspark...port}}). However, they are 
> limited in the same way as {{spark.driver.port}}; in the above example, an 
> executor should not contact the file server on port {{6677}} but rather on 
> the respective {{31xxx}} port assigned by Mesos.
> Spark currently does not allow any of that.
> h4. Taking on the problem, step 1: Spark Driver
> As mentioned above, the Spark driver is based on {{akka-remote}}. In order 
> to take on the problem, the {{akka.remote.netty.tcp.bind-hostname}} and 
> {{akka.remote.netty.tcp.bind-port}} settings are a must. Spark does not 
> compile with Akka 2.4.x yet.
> What we want is a backport of the mentioned {{akka-remote}} settings to the 
> {{2.3.x}} versions. These patches are attached to this ticket; the 
> {{2.3.4.patch}} and {{2.3.11.patch}} files provide patches for the 
> respective Akka versions. They add the mentioned settings and ensure they 
> work as documented for Akka 2.4. In other words, they are forward 
> compatible.
> A part of that patch also exists in the patch for Spark, in the 
> {{org.apache.spark.util.AkkaUtils}} class. This is where Spark creates the 
> driver and assembles the Akka configuration. That part of the patch tells 
> Akka to bind using {{bind-hostname}} instead of {{hostname}} if 
> {{spark.driver.advertisedHost}} is given, and using {{bind-port}} instead 
> of {{port}} if {{spark.driver.advertisedPort}} is given. In such cases, 
> {{hostname}} and {{port}} are set to the respective advertised values.
> *Worth mentioning:* if {{spark.driver.advertisedHost}} or 
> {{spark.driver.advertisedPort}} is not given, patched Spark falls back to 
> the settings it would use with a non-patched {{akka-remote}}, precisely so 
> that it keeps working when no patched {{akka-remote}} is in use. Even when 
> one is in use, {{akka-remote}} will correctly handle undefined 
> {{bind-hostname}} and {{bind-port}}, as specified by Akka 2.4.x.
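> Put together, with hypothetical values 
> {{spark.driver.advertisedHost=10.0.0.5}} and 
> {{spark.driver.advertisedPort=31000}}, the patched {{AkkaUtils}} would hand 
> Akka a configuration along these lines:
> {noformat}
> akka.remote.netty.tcp {
>   hostname = "10.0.0.5"        # advertised: the Agent's address
>   port = 31000                 # advertised: the Mesos-assigned host port
>   bind-hostname = "172.17.0.2" # actual bind address inside the container
>   bind-port = 6666             # actual bind port, spark.driver.port
> }
> {noformat}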
> h5. Akka versions in Spark (attached patches only)
> - Akka 2.3.4
>  - Spark 1.4.0
>  - Spark 1.4.1
> - Akka 2.3.11
>  - Spark 1.5.0
>  - Spark 1.5.1
>  - Spark 1.6.0-SNAPSHOT
> h4. Taking on the problem, step 2: Spark services
> The fortunate thing is that every other Spark service runs over HTTP, using 
> the {{org.apache.spark.HttpServer}} class. This is where the second part of 
> the Spark patch comes into play. All other changes in the patch files 
> provide alternative {{advertised...}} ports for each of the following 
> services:
> - {{spark.broadcast.port}} -> {{spark.broadcast.advertisedPort}}
> - {{spark.fileserver.port}} -> {{spark.fileserver.advertisedPort}}
> - {{spark.replClassServer.port}} -> {{spark.replClassServer.advertisedPort}}
> What we are telling Spark here is the following: if an alternative 
> {{advertisedPort}} setting is given for this server instance, use that 
> setting when advertising the port (see the configuration sketch below).
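> For the earlier scenario, a hypothetical configuration would pair each bind 
> port with its Mesos-assigned host port:
> {noformat}
> spark.broadcast.port                 6688
> spark.broadcast.advertisedPort       31002
> spark.fileserver.port                6677
> spark.fileserver.advertisedPort      31001
> spark.replClassServer.port           23456
> spark.replClassServer.advertisedPort 31003
> {noformat}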
> h4. Patches
> These patches have been cleared by the Technicolor IP&L Team to be 
> contributed back to Spark under the Apache 2.0 License.
> All patches for versions from {{1.4.0}} to {{1.5.2}} can be applied 
> directly to the respective tag in the Spark git repository. The 
> {{1.6.0.patch}} applies to git sha 
> {{18350a57004eb87cafa9504ff73affab4b818e06}}.
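> For example, to apply the {{1.5.2}} patch (a sketch; assumes the attached 
> patch file has been placed inside the checkout):
> {noformat}
> git clone https://github.com/apache/spark.git
> cd spark
> git checkout v1.5.2
> git apply 1.5.2.patch
> {noformat}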
> h4. Building Akka
> To build the required Akka version:
> {noformat}
> # release tag to build; use 2.3.11 for the Spark 1.5.x / 1.6.0 patches
> AKKA_VERSION=2.3.4
> # clone into the current directory and check out the release tag
> git clone https://github.com/akka/akka.git .
> git fetch origin
> git checkout v${AKKA_VERSION}
> # apply the matching patch attached to this ticket
> git apply ...2.3.4.patch
> # build; disabling scaladoc class diagrams avoids the graphviz dependency
> sbt package -Dakka.scaladoc.diagrams=false
> {noformat}
> h4. What is not supplied
> At the moment of contribution, we do not supply any unit tests. We would 
> like to contribute those, but we may require some assistance.
> =====
> Happy to answer any questions, and looking forward to any guidance that 
> would help get these changes included in the master Spark version.


