[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432163#comment-15432163 ] Liam Fisk commented on SPARK-11638: --- Just mirroring what I said on SPARK-4563 about the lack of support for bridged networking: {quote} It also makes life difficult for OSX users. Docker for Mac uses xhyve to virtualize the docker engine (https://docs.docker.com/engine/installation/mac/), and thus `--net=host` binds to the VM's network instead of the true OSX host. The SPARK_LOCAL_IP ends up as 172.17.0.2, which is not externally contactable. The end result is OSX users cannot containerize Spark if Spark needs to contact a mesos cluster. {quote} While you are unlikely to have Spark running on an OSX machine in production, the development experience is a bit painful if you have to run a separate VM with public networking. > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432151#comment-15432151 ] Liam Fisk commented on SPARK-11638: --- This ticket appears to be related to SPARK-4563. > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with Akka 2.4.x yet. > What we want is the back port of mentioned {{akka-remote}} settings to > {{2.3.x}} versions. These patches are attached to this ticket - > {{2.3.4.patch}} and {{2.3.11.patch}} files provide patches for respective > akka versions. These add mentioned settings and ensure they
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412461#comment-15412461 ] Michael Gummelt commented on SPARK-11638: - [~radekg] > The only advantage we had was using the same configuration inside of the > docker container. You mean you want to run the spark driver in a docker container? Which configuration did you have to change? I can look more into this, but I need a clear "It's easier/better to do X in bridge mode than in host mode". > So with the HTTP API, Spark would still require the heavy libmesos in order > to work with Mesos? No. The HTTP API will remove the libmesos dependency, which is nice. It's not an urgent priority though. > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412293#comment-15412293 ] Radoslaw Gruchalski commented on SPARK-11638: - [~mandoskippy] Yes, there my lack of knowledge regarding the API can be seen. Just read the http://events.linuxfoundation.org/sites/events/files/slides/Mesos_HTTP_API.pdf. Considering that the Mesos scheduler was changed to take advantage, might be the case. Older versions of Mesos would still require native library. > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with Akka
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411884#comment-15411884 ] John Omernik commented on SPARK-11638: -- So with the HTTP API, Spark would still require the heavy libmesos in order to work with Mesos? Even if all things in Spark shifted to using the HTTP API? My understanding was that part of the reason for the HTTPAPI was to remove the need for libmesos? John On Mon, Aug 8, 2016 at 7:31 AM, Radoslaw Gruchalski (JIRA)> Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411731#comment-15411731 ] Radoslaw Gruchalski commented on SPARK-11638: - [~mandoskippy] No, HTTP API won't help anything. Spark still requires Mesos for the Mesos framework. Not quite sure what exactly you would like to achieve with it. As for bridged networking. I can see it being useful for different multi user scenarios but there would need to be some understanding on the maintainer's side. > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411715#comment-15411715 ] John Omernik commented on SPARK-11638: -- I hadn't tried using the LIBPROCESS settings, I need to sit down and work with it again. The issue I was running into, was when I used docker containers for Spark, Bridged Networking caused failures because (likely the LIBPROCESS issue) and host networking caused issues for me, because I was trying to use a network client for MapR FS that didn't like running on the same "host" node (networking wise) as the client in the container. Now, options are good, bridge networking is also nice from a configuration of the container standpoint. As to the HTTP API for Mesos, this is something that could help reduce the dependancies and size of the container. Not that it matters TOO much, but shoveling around 1.5GB images (libmesos, spark, mapr client, etc) can get annoying :) I need to find some time to try building and using LIBPROCESS > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411366#comment-15411366 ] Radoslaw Gruchalski commented on SPARK-11638: - [~mandoskippy] I have not been following Mesos API between 0.28.2 and 1.0.0. No idea what changes would make it work without having to change LIBPROCESS env in Docker bridge networking. I wouldn't call using the documented LIBPROCESS functionality "tricking". Changing LIBPROCESS settings in Docker bridge is required because your Mesos in the container is bound to a different port than what Mesos cluster outside expects. A bit like what Akka was in the original 1.4 version of Spark. The difference was that Akka could not be changed like this (until 2.4). [~mgummelt] We used it at my previous place for starting Zeppelin, Jupyter and Spark notebook (the data fellas one) instances in a multi tenant Mesos environment with Docker. Now, why was it done in bridge networking? I implemented it so I guess I should know. But I don't. The answer is: nobody asked "why does it need bridge in the first place". And the answer to the question nobody asked is: "it doesn't need bridge mode, everything what we've done at the previous place, in bridge, can be done with host network". The only advantage we had was using the same configuration _inside_ of the docker container. > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411277#comment-15411277 ] Michael Gummelt commented on SPARK-11638: - This JIRA is complex and a lot of it is out of date. Can someone briefly explain to me what the problem is? Why do you want bridge networking? > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with Akka 2.4.x yet. > What we want is the back port of mentioned {{akka-remote}} settings to > {{2.3.x}} versions. These patches are attached to this ticket - > {{2.3.4.patch}} and
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410261#comment-15410261 ] Stavros Kontopoulos commented on SPARK-11638: - [~radekg][~mgummelt] what do you think? > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with Akka 2.4.x yet. > What we want is the back port of mentioned {{akka-remote}} settings to > {{2.3.x}} versions. These patches are attached to this ticket - > {{2.3.4.patch}} and {{2.3.11.patch}} files provide patches for respective > akka versions. These add mentioned settings and
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408068#comment-15408068 ] John Omernik commented on SPARK-11638: -- Another new guy, late to the game: Would work done on having Spark support the new Mesos HTTP API help with this? Basically, if Spark was able to communicate without having the thick native lib, could we make things easier from a networking perspective. *I could be wrong on this part* As I read it, the "connect back to the scheduler" that needs a JIRA like this for docker and bridged mode, goes away when you use the HTTP API because it's all scheduler to Mesos Master communication. *I Think* Thus, while this is good work, working on the 1.0 Mesos APIs could actually solve this problem, and open up new possibilities... Thoughts? > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195304#comment-15195304 ] Radoslaw Gruchalski commented on SPARK-11638: - [~skonto], being just made redundant, I believe I have a lot of spare cycles as well. Yes, sounds like a good idea. Would it be worth discussing exactly what the outcome of this feature should be? > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with Akka 2.4.x yet. > What we want is the back port of mentioned {{akka-remote}} settings to > {{2.3.x}} versions. These patches are
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195275#comment-15195275 ] Stavros Kontopoulos commented on SPARK-11638: - [~radekg] Ara you planning to work on this on master? I have some spare cycles ... > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with Akka 2.4.x yet. > What we want is the back port of mentioned {{akka-remote}} settings to > {{2.3.x}} versions. These patches are attached to this ticket - > {{2.3.4.patch}} and {{2.3.11.patch}} files provide patches for respective > akka