[jira] [Closed] (SPARK-24380) argument quoting/escaping broken in mesos cluster scheduler
[ https://issues.apache.org/jira/browse/SPARK-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles closed SPARK-24380.

> argument quoting/escaping broken in mesos cluster scheduler
> ------------------------------------------------------------
>
>                 Key: SPARK-24380
>                 URL: https://issues.apache.org/jira/browse/SPARK-24380
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Mesos
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: paul mackles
>            Priority: Critical
>             Fix For: 2.4.0
>
> When a configuration property contains shell characters that require quoting, the Mesos cluster scheduler generates the spark-submit argument like so:
> {code:java}
> --conf "spark.mesos.executor.docker.parameters="label=logging=|foo|""{code}
> Note the quotes around the property value as well as around the entire key=value pair. When using Docker, this breaks the spark-submit command and causes the "|" to be interpreted as an actual shell pipe. Spaces, semicolons, etc. also cause issues.
> Although I haven't tried it, I suspect this is also a potential security issue in that someone could exploit it to run arbitrary code on the host.
> My patch is pretty minimal and just removes the outer quotes around the key=value pair, resulting in something like:
> {code:java}
> --conf spark.mesos.executor.docker.parameters="label=logging=|foo|"{code}
> A more extensive fix might try wrapping the entire key=value pair in single quotes, but I was concerned about backwards compatibility with that change.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
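The failure mode described above can be reproduced outside of Spark. The sketch below uses Python's standard-library `shlex` as a stand-in for a POSIX shell tokenizer (treating `|` as an operator); the key and value strings are taken from the report, and `shlex.quote` stands in for the "wrap the whole value in single quotes" approach rather than Spark's actual fix:

```python
import shlex

# Strings from the report above.
key = "spark.mesos.executor.docker.parameters"
value = "label=logging=|foo|"

# Broken form: outer quotes around the whole key=value pair, with the
# value's own quotes nested inside them.
broken = f'--conf "{key}="{value}""'
# Quoting the value alone (shlex.quote uses single quotes) keeps it intact.
fixed = f"--conf {key}={shlex.quote(value)}"

def shell_tokens(cmdline):
    """Tokenize roughly the way a POSIX shell would, with | as an operator."""
    lex = shlex.shlex(cmdline, posix=True, punctuation_chars=True)
    lex.whitespace_split = True
    return list(lex)

# In the broken form the inner double quotes terminate the outer ones, so
# the pipe falls outside any quoting and becomes a shell operator:
assert "|" in shell_tokens(broken)

# In the fixed form the pipe stays inside the quoted value and the whole
# key=value pair survives as a single argument:
tokens = shell_tokens(fixed)
assert "|" not in tokens
assert tokens == ["--conf", f"{key}={value}"]
```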
[jira] [Resolved] (SPARK-24380) argument quoting/escaping broken in mesos cluster scheduler
[ https://issues.apache.org/jira/browse/SPARK-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles resolved SPARK-24380.
----------------------------------
Resolution: Duplicate

Dupe of SPARK-23941, just a different config.
[jira] [Created] (SPARK-24380) argument quoting/escaping broken
paul mackles created SPARK-24380:

             Summary: argument quoting/escaping broken
                 Key: SPARK-24380
                 URL: https://issues.apache.org/jira/browse/SPARK-24380
             Project: Spark
          Issue Type: Bug
          Components: Deploy, Mesos
    Affects Versions: 2.3.0, 2.2.0
            Reporter: paul mackles
             Fix For: 2.4.0
[jira] [Updated] (SPARK-24380) argument quoting/escaping broken in mesos cluster scheduler
[ https://issues.apache.org/jira/browse/SPARK-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-24380:
---------------------------------
Summary: argument quoting/escaping broken in mesos cluster scheduler (was: argument quoting/escaping broken)
[jira] [Created] (SPARK-23988) [Mesos] Improve handling of appResource in mesos dispatcher when using Docker
paul mackles created SPARK-23988:

             Summary: [Mesos] Improve handling of appResource in mesos dispatcher when using Docker
                 Key: SPARK-23988
                 URL: https://issues.apache.org/jira/browse/SPARK-23988
             Project: Spark
          Issue Type: Improvement
          Components: Mesos
    Affects Versions: 2.3.0, 2.2.1
            Reporter: paul mackles

Our organization makes heavy use of Docker containers when running Spark on Mesos. The images we use for our containers include Spark along with all of the application dependencies. We find this to be a great way to manage our artifacts.

When specifying the primary application jar (i.e. the appResource), the mesos dispatcher insists on adding it to the list of URIs for Mesos to fetch as part of launching the driver's container. This leads to confusing behavior where paths such as:
* file:///application.jar
* local:/application.jar
* /application.jar

wind up being fetched from the host where the driver is running. Obviously, this doesn't work, since all of the above examples reference the path of the jar on the container image itself.

Here is an example that I used for testing:
{code:java}
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://spark-dispatcher \
  --deploy-mode cluster \
  --conf spark.cores.max=4 \
  --conf spark.mesos.executor.docker.image=spark:2.2.1 \
  local:/usr/local/spark/examples/jars/spark-examples_2.11-2.2.1.jar 10{code}
The "spark:2.2.1" image contains an installation of Spark under "/usr/local/spark". Notice how we reference the appResource using the "local:/" scheme. If you try the above with the current version of the mesos dispatcher, it will try to fetch the path "/usr/local/spark/examples/jars/spark-examples_2.11-2.2.1.jar" from the host filesystem where the driver's container is running. On our systems, this fails since we don't have Spark installed on the hosts.

For the PR, all I did was modify the mesos dispatcher to not add the appResource to the list of URIs for Mesos to fetch if it uses the "local:/" scheme. For now, I didn't change the behavior of absolute paths or the "file:/" scheme because I wanted to leave some form of the old behavior in place for backwards compatibility. Does anyone have any opinions on whether these schemes should change as well?

The PR also includes support for using "spark-internal" with Mesos in cluster mode, which is something we need for another use-case. I can separate them if that makes more sense.
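The scheme-based decision described above can be sketched as follows. This is an illustrative reconstruction, not the actual Scala change in the PR; the function name is hypothetical:

```python
from urllib.parse import urlparse

def should_fetch(app_resource: str) -> bool:
    """Sketch: hand the appResource to the Mesos fetcher only when it does
    NOT use the local:/ scheme, since local:/ means the jar is already
    present on the container image."""
    if urlparse(app_resource).scheme == "local":
        return False
    # Absolute paths, file:/, http(s), hdfs, etc. keep the old fetch
    # behavior for backwards compatibility, per the description above.
    return True

# The local:/ appResource from the spark-submit example is left alone:
assert not should_fetch("local:/usr/local/spark/examples/jars/spark-examples_2.11-2.2.1.jar")
# Everything else still goes through the fetcher:
assert should_fetch("file:///application.jar")
assert should_fetch("/application.jar")
assert should_fetch("https://example.com/application.jar")
```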
[jira] [Updated] (SPARK-23943) Improve observability of MesosRestServer/MesosClusterDispatcher
[ https://issues.apache.org/jira/browse/SPARK-23943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-23943:
---------------------------------
Description:

Two changes in this PR:
* A /health endpoint for a quick binary indication of the health of MesosClusterDispatcher. Useful for those running MesosClusterDispatcher as a marathon app: http://mesosphere.github.io/marathon/docs/health-checks.html. Returns a 503 status if the server is unhealthy and a 200 if the server is healthy.
* A /status endpoint for a more detailed examination of the current state of a MesosClusterDispatcher instance. Useful as a troubleshooting/monitoring tool.

For both endpoints, regardless of status code, the following body is returned:
{code:java}
{
  "action" : "ServerStatusResponse",
  "launchedDrivers" : 0,
  "message" : "iamok",
  "queuedDrivers" : 0,
  "schedulerDriverStopped" : false,
  "serverSparkVersion" : "2.3.1-SNAPSHOT",
  "success" : true,
  "pendingRetryDrivers" : 0
}{code}
Aside from surfacing all of the scheduler metrics, the response also includes the status of the Mesos SchedulerDriver. On numerous occasions now, we have observed scenarios where the Mesos SchedulerDriver quietly exits due to some other failure. When this happens, jobs queue up and the only way to clean things up is to restart the service. With the above health check, marathon can be configured to automatically restart the MesosClusterDispatcher service when the health check fails, lessening the need for manual intervention.
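A monitoring script sitting in front of these endpoints only needs two fields from the payload above. The sketch below (hypothetical client code, not part of the PR) shows how the /status body would be interpreted, using the exact JSON from the ticket:

```python
import json

def dispatcher_healthy(status_body: str) -> bool:
    """Interpret the /status response: the dispatcher is only useful while
    its Mesos SchedulerDriver is still running."""
    status = json.loads(status_body)
    return bool(status.get("success")) and not status.get("schedulerDriverStopped", True)

# Sample body from the ticket description.
sample = """{
  "action" : "ServerStatusResponse",
  "launchedDrivers" : 0,
  "message" : "iamok",
  "queuedDrivers" : 0,
  "schedulerDriverStopped" : false,
  "serverSparkVersion" : "2.3.1-SNAPSHOT",
  "success" : true,
  "pendingRetryDrivers" : 0
}"""

assert dispatcher_healthy(sample)
# If the SchedulerDriver has quietly exited, the check fails and marathon
# can restart the service:
stopped = sample.replace('"schedulerDriverStopped" : false',
                         '"schedulerDriverStopped" : true')
assert not dispatcher_healthy(stopped)
```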
[jira] [Updated] (SPARK-23943) Improve observability of MesosRestServer/MesosClusterDispatcher
[ https://issues.apache.org/jira/browse/SPARK-23943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-23943:
---------------------------------
Description:

Two changes:

First, a more robust [health-check|http://mesosphere.github.io/marathon/docs/health-checks.html] for anyone who runs MesosClusterDispatcher as a marathon app. Specifically, this check verifies that the MesosSchedulerDriver is still running, as we have seen certain cases where it stops (rather quietly) and the only way to revive it is a restart. With this health check, marathon will restart the dispatcher if the MesosSchedulerDriver stops running. The health check lives at the URL "/health" and returns a 204 when the server is healthy and a 503 when it is not (e.g. the MesosSchedulerDriver stopped running).

Second, a server status endpoint that replies with some basic metrics about the server. The status endpoint resides at the URL "/status" and responds with:
{code:java}
{
  "action" : "ServerStatusResponse",
  "launchedDrivers" : 0,
  "message" : "server OK",
  "queuedDrivers" : 0,
  "schedulerDriverStopped" : false,
  "serverSparkVersion" : "2.3.1-SNAPSHOT",
  "success" : true
}{code}
As you can see, it includes a snapshot of the metrics/health of the scheduler. Useful for quick debugging/troubleshooting/monitoring.
[jira] [Updated] (SPARK-23943) Improve observability of MesosRestServer
[ https://issues.apache.org/jira/browse/SPARK-23943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-23943:
---------------------------------
Summary: Improve observability of MesosRestServer (was: Add more specific health check to MesosRestServer)
[jira] [Updated] (SPARK-23943) Improve observability of MesosRestServer/MesosClusterDispatcher
[ https://issues.apache.org/jira/browse/SPARK-23943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-23943:
---------------------------------
Summary: Improve observability of MesosRestServer/MesosClusterDispatcher (was: Improve observability of MesosRestServer)
[jira] [Updated] (SPARK-23943) Add more specific health check to MesosRestServer
[ https://issues.apache.org/jira/browse/SPARK-23943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-23943:
---------------------------------
Description:

Add a more robust health-check to MesosRestServer so that anyone who runs MesosClusterDispatcher as a marathon app can use it to check the health of the server: http://mesosphere.github.io/marathon/docs/health-checks.html

Specifically, this check verifies that the MesosSchedulerDriver is still running, as we have seen certain cases where it dies (rather quietly) and the only way to revive it is a restart. With this health check, marathon will restart the dispatcher if the MesosSchedulerDriver stops running.

The health check lives at the URL "/health" and returns a 204 when the server is healthy and a 503 when it is not (e.g. the MesosSchedulerDriver stopped running).
[jira] [Updated] (SPARK-23943) Add more specific health check to MesosRestServer
[ https://issues.apache.org/jira/browse/SPARK-23943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-23943:
---------------------------------
Environment: (cleared)
[jira] [Updated] (SPARK-23943) Add more specific health check to MesosRestServer
[ https://issues.apache.org/jira/browse/SPARK-23943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-23943:
---------------------------------
Summary: Add more specific health check to MesosRestServer (was: add more specific health check to MesosRestServer)
[jira] [Created] (SPARK-23943) add more specific health check to MesosRestServer
paul mackles created SPARK-23943:

             Summary: add more specific health check to MesosRestServer
                 Key: SPARK-23943
                 URL: https://issues.apache.org/jira/browse/SPARK-23943
             Project: Spark
          Issue Type: Improvement
          Components: Deploy, Mesos
    Affects Versions: 2.3.0, 2.2.1
            Reporter: paul mackles
             Fix For: 2.4.0
[jira] [Commented] (SPARK-22256) Introduce spark.mesos.driver.memoryOverhead
[ https://issues.apache.org/jira/browse/SPARK-22256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430262#comment-16430262 ] paul mackles commented on SPARK-22256:
--------------------------------------
Created a PR on behalf of [~clehene] since he has moved on to other projects: https://github.com/apache/spark/pull/21006

> Introduce spark.mesos.driver.memoryOverhead
> --------------------------------------------
>
>                 Key: SPARK-22256
>                 URL: https://issues.apache.org/jira/browse/SPARK-22256
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Cosmin Lehene
>            Priority: Minor
>              Labels: docker, memory, mesos
>             Fix For: 2.3.1, 2.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When running the spark driver in a container, such as when using the Mesos dispatcher service, we need to apply the same rules as for executors in order to avoid the JVM going over the allotted limit and then being killed.
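The executor rule the ticket wants applied to the driver is, by default, an overhead of max(384 MB, 10% of the JVM heap) added on top of the requested memory. The sketch below illustrates that sizing rule; the function name is illustrative, and the constants mirror Spark's executor defaults rather than anything the PR itself defines:

```python
MIN_OVERHEAD_MB = 384        # Spark's floor for executor memory overhead
OVERHEAD_FRACTION = 0.10     # default overhead fraction of the heap size

def container_memory_mb(driver_memory_mb, overhead_mb=None):
    """Sketch of the executor-style rule applied to the driver: the Mesos
    container must be sized to heap + overhead, otherwise the JVM's
    off-heap usage pushes the container over its limit and it is killed."""
    if overhead_mb is None:
        overhead_mb = max(MIN_OVERHEAD_MB, int(OVERHEAD_FRACTION * driver_memory_mb))
    return driver_memory_mb + overhead_mb

# Small heaps hit the 384 MB floor; larger heaps get the 10% fraction:
assert container_memory_mb(1024) == 1024 + 384
assert container_memory_mb(8192) == 8192 + 819
# An explicit spark.mesos.driver.memoryOverhead would override the default:
assert container_memory_mb(8192, overhead_mb=2048) == 8192 + 2048
```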
[jira] [Updated] (SPARK-22256) Introduce spark.mesos.driver.memoryOverhead
[ https://issues.apache.org/jira/browse/SPARK-22256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-22256:
---------------------------------
Fix Version/s: 2.4.0
               2.3.1
[jira] [Updated] (SPARK-22256) Introduce spark.mesos.driver.memoryOverhead
[ https://issues.apache.org/jira/browse/SPARK-22256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-22256:
---------------------------------
Affects Version/s: 2.3.0
[jira] [Commented] (SPARK-11499) Spark History Server UI should respect protocol when doing redirection
[ https://issues.apache.org/jira/browse/SPARK-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332811#comment-16332811 ] paul mackles commented on SPARK-11499: -- We ran into this issue running the spark-history server as a Marathon app on a Mesos cluster. As is typical for this kind of setup, there is a reverse-proxy that users go through to access the app. In our case, we are also offloading SSL to the reverse-proxy so communications between the reverse-proxy and spark-history are plain-old HTTP. I experimented with 2 different fixes: # Making sure that the SparkUI and History components look at APPLICATION_WEB_PROXY_BASE when generating redirect URLs. In order for it to honor the protocol, APPLICATION_WEB_PROXY_BASE must include the desired protocol (i.e. APPLICATION_WEB_PROXY_BASE=https://example.com) # Using Jetty's built-in ForwardRequestCustomizer class to process "X-Forwarded-*" headers defined in rfc7239. Both changes worked in our environment and both changes are fairly simple. Looking for feedback on whether one solution is preferable to the other. For our environment, #2 is preferable because: * The reverse proxy we use is already sending these headers. * Allows for the spark-history server to see the actual client info as opposed to that of the proxy If no strong feelings one way or another, I'll submit a PR for solution #2. 
References: * [https://tools.ietf.org/html/rfc7239] * [http://download.eclipse.org/jetty/stable-9/apidocs/org/eclipse/jetty/server/ForwardedRequestCustomizer.html] > Spark History Server UI should respect protocol when doing redirection > -- > > Key: SPARK-11499 > URL: https://issues.apache.org/jira/browse/SPARK-11499 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Lukasz Jastrzebski >Priority: Major > > Use case: > The Spark history server is behind a load balancer secured with an SSL > certificate; unfortunately, clicking on the application link redirects to the > http protocol, which may not be exposed by the load balancer. Example flow: > * Trying 52.22.220.1... > * Connected to xxx.yyy.com (52.22.220.1) port 8775 (#0) > * WARNING: SSL: Certificate type not set, assuming PKCS#12 format. > * Client certificate: u...@yyy.com > * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 > * Server certificate: *.yyy.com > * Server certificate: Entrust Certification Authority - L1K > * Server certificate: Entrust Root Certification Authority - G2 > > GET /history/20151030-160604-3039174572-5951-22401-0004 HTTP/1.1 > > Host: xxx.yyy.com:8775 > > User-Agent: curl/7.43.0 > > Accept: */* > > > < HTTP/1.1 302 Found > < Location: > http://xxx.yyy.com:8775/history/20151030-160604-3039174572-5951-22401-0004 > < Connection: close > < Server: Jetty(8.y.z-SNAPSHOT) > < > * Closing connection 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
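The redirect behavior being fixed can be sketched in miniature. This is illustrative only, not the Spark or Jetty code; the header names are the de facto X-Forwarded-* ones that Jetty's ForwardedRequestCustomizer consumes (RFC 7239 standardizes them as the Forwarded header):

```python
# Toy model of the bug: the server builds an absolute http:// Location even
# when the client reached it over https via a reverse proxy. Honoring
# X-Forwarded-Proto / X-Forwarded-Host fixes the scheme and host.
def redirect_location(headers, server_host, server_port, path):
    scheme = headers.get("X-Forwarded-Proto", "http")
    host = headers.get("X-Forwarded-Host", f"{server_host}:{server_port}")
    return f"{scheme}://{host}{path}"

# Without forwarding headers: reproduces the http:// redirect in the report.
print(redirect_location({}, "xxx.yyy.com", 8775, "/history/app-0004"))
# With a proxy that forwards protocol and host: https is preserved.
print(redirect_location({"X-Forwarded-Proto": "https",
                         "X-Forwarded-Host": "xxx.yyy.com:8775"},
                        "10.0.0.5", 18080, "/history/app-0004"))
```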
[jira] [Created] (SPARK-23088) History server not showing incomplete/running applications
paul mackles created SPARK-23088: Summary: History server not showing incomplete/running applications Key: SPARK-23088 URL: https://issues.apache.org/jira/browse/SPARK-23088 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Affects Versions: 2.2.1, 2.1.2 Reporter: paul mackles The history server does not show incomplete/running applications when the _spark.history.ui.maxApplications_ property is set to a value smaller than the total number of applications. I believe this is because any applications where completed=false wind up at the end of the list of apps returned by the /applications endpoint; when _spark.history.ui.maxApplications_ is set, that list gets truncated and the running apps are never returned. The fix I have in mind is to modify the history template to start passing the _status_ parameter when calling the /applications endpoint (status=completed is the default). I am running Spark in a Mesos environment, but I don't think that is relevant to this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
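The truncation described in this report can be modeled in a few lines (a toy reproduction, not the history server's actual code):

```python
# Toy model: completed apps sort ahead of running ones in the /applications
# list, so cutting the combined list to maxApplications drops every running app.
apps = [{"id": f"app-{i}", "completed": True} for i in range(5)]
apps.append({"id": "app-running", "completed": False})

max_applications = 3
shown = apps[:max_applications]                    # what the UI renders today
print([a["id"] for a in shown])                    # no running app survives

# Sketch of the proposed fix: filter by status before truncating,
# i.e. query status=running separately instead of slicing a mixed list.
running = [a for a in apps if not a["completed"]]
print([a["id"] for a in running])
```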
[jira] [Commented] (SPARK-22528) History service and non-HDFS filesystems
[ https://issues.apache.org/jira/browse/SPARK-22528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260727#comment-16260727 ] paul mackles commented on SPARK-22528: -- In case anyone else bumps into this, I received some feedback from the data-lake team at MSFT: This is expected behavior since Hadoop supports Kerberos-based identity whereas Data Lake supports OAuth2 – Azure Active Directory (AAD). The bridge/mapping between Kerberos and AAD OAuth2 is supported only in Azure HDInsight clusters today. OAuth2 support in Hadoop is a non-trivial task and is in progress - https://issues.apache.org/jira/browse/HADOOP-11744 The workaround for the limitation (specific to Data Lake) is the following core-site.xml setting: {code:xml}
<property>
  <name>adl.debug.override.localuserasfileowner</name>
  <value>true</value>
</property>
{code} What does this configuration do? FileStatus contains the user/group information which is associated with the object id from AAD. The Hadoop driver replaces the object id with the local Hadoop user under the context of the Hadoop process. The actual file information in Data Lake remains unchanged though, only shadowed behind the local Hadoop user. > History service and non-HDFS filesystems > > > Key: SPARK-22528 > URL: https://issues.apache.org/jira/browse/SPARK-22528 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: paul mackles >Priority: Minor > > We are using Azure Data Lake (ADL) to store our event logs. This worked fine > in 2.1.x but in 2.2.0, the event logs are no longer visible to the history > server. I tracked it down to the call to: > {code} > SparkHadoopUtil.get.checkAccessPermission() > {code} > which was added to "FSHistoryProvider" in 2.2.0. > I was able to work around it by: > * setting the files on ADL to world readable > * setting HADOOP_PROXY to the Azure objectId of the service principal that > owns the file > Neither of those workarounds is particularly desirable in our environment. 
> That said, I am not sure how this should be addressed: > * Is this an issue with the Azure/Hadoop bindings not setting up the user > context correctly so that the "checkAccessPermission()" call succeeds w/out > having to use the username under which the process is running? > * Is this an issue with "checkAccessPermission()" not really accounting for > all of the possible FileSystem implementations? If so, I would imagine that > there are similar issues when using S3. > In spite of this check, I know the files are accessible through the > underlying FileSystem object so it feels like the latter but I don't think > that the FileSystem object alone could be used to implement this check. > Any thoughts [~jerryshao]? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22528) History service and non-HDFS filesystems
[ https://issues.apache.org/jira/browse/SPARK-22528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-22528: - Description: We are using Azure Data Lake (ADL) to store our event logs. This worked fine in 2.1.x but in 2.2.0, the event logs are no longer visible to the history server. I tracked it down to the call to: {code} SparkHadoopUtil.get.checkAccessPermission() {code} which was added to "FSHistoryProvider" in 2.2.0. I was able to work around it by: * setting the files on ADL to world readable * setting HADOOP_PROXY to the Azure objectId of the service principal that owns the file Neither of those workarounds is particularly desirable in our environment. That said, I am not sure how this should be addressed: * Is this an issue with the Azure/Hadoop bindings not setting up the user context correctly so that the "checkAccessPermission()" call succeeds w/out having to use the username under which the process is running? * Is this an issue with "checkAccessPermission()" not really accounting for all of the possible FileSystem implementations? If so, I would imagine that there are similar issues when using S3. In spite of this check, I know the files are accessible through the underlying FileSystem object so it feels like the latter but I don't think that the FileSystem object alone could be used to implement this check. Any thoughts [~jerryshao]? was: We are using Azure Data Lake (ADL) to store our event logs. This worked fine in 2.1.x but in 2.2.0, the event logs are no longer visible to the history server. I tracked it down to the call to: {code} SparkHadoopUtil.get.checkAccessPermission() {code} which was added to "FSHistoryProvider" in 2.2.0. I was able to work around it by: * setting the files to world readable * setting HADOOP_PROXY to the Azure objectId of the service principal that owns the file Neither of those workarounds is particularly desirable in our environment. 
That said, I am not sure how this should be addressed: * Is this an issue with the Azure/Hadoop bindings not setting up the user context correctly so that the "checkAccessPermission()" call succeeds w/out having to use the username under which the process is running? * Is this an issue with "checkAccessPermission()" not really accounting for all of the possible FileSystem implementations? If so, I would imagine that there are similar issues when using S3. In spite of this check, I know the files are accessible through the underlying FileSystem object so it feels like the latter but I don't think that the FileSystem object alone could be used to implement this check. > History service and non-HDFS filesystems > > > Key: SPARK-22528 > URL: https://issues.apache.org/jira/browse/SPARK-22528 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: paul mackles >Priority: Minor > > We are using Azure Data Lake (ADL) to store our event logs. This worked fine > in 2.1.x but in 2.2.0, the event logs are no longer visible to the history > server. I tracked it down to the call to: > {code} > SparkHadoopUtil.get.checkAccessPermission() > {code} > which was added to "FSHistoryProvider" in 2.2.0. > I was able to work around it by: > * setting the files on ADL to world readable > * setting HADOOP_PROXY to the Azure objectId of the service principal that > owns the file > Neither of those workarounds is particularly desirable in our environment. > That said, I am not sure how this should be addressed: > * Is this an issue with the Azure/Hadoop bindings not setting up the user > context correctly so that the "checkAccessPermission()" call succeeds w/out > having to use the username under which the process is running? > * Is this an issue with "checkAccessPermission()" not really accounting for > all of the possible FileSystem implementations? If so, I would imagine that > there are similar issues when using S3. 
> In spite of this check, I know the files are accessible through the > underlying FileSystem object so it feels like the latter but I don't think > that the FileSystem object alone could be used to implement this check. > Any thoughts [~jerryshao]? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22528) History service and non-HDFS filesystems
paul mackles created SPARK-22528: Summary: History service and non-HDFS filesystems Key: SPARK-22528 URL: https://issues.apache.org/jira/browse/SPARK-22528 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: paul mackles Priority: Minor We are using Azure Data Lake (ADL) to store our event logs. This worked fine in 2.1.x but in 2.2.0, the event logs are no longer visible to the history server. I tracked it down to the call to: {code} SparkHadoopUtil.get.checkAccessPermission() {code} which was added to "FSHistoryProvider" in 2.2.0. I was able to work around it by: * setting the files to world readable * setting HADOOP_PROXY to the Azure objectId of the service principal that owns the file Neither of those workarounds is particularly desirable in our environment. That said, I am not sure how this should be addressed: * Is this an issue with the Azure/Hadoop bindings not setting up the user context correctly so that the "checkAccessPermission()" call succeeds w/out having to use the username under which the process is running? * Is this an issue with "checkAccessPermission()" not really accounting for all of the possible FileSystem implementations? If so, I would imagine that there are similar issues when using S3. In spite of this check, I know the files are accessible through the underlying FileSystem object so it feels like the latter but I don't think that the FileSystem object alone could be used to implement this check. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22287) [MESOS] SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher
paul mackles created SPARK-22287: Summary: [MESOS] SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher Key: SPARK-22287 URL: https://issues.apache.org/jira/browse/SPARK-22287 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 2.2.0, 2.1.1, 2.3.0 Reporter: paul mackles Priority: Minor There does not appear to be a way to control the heap size used by MesosClusterDispatcher as the SPARK_DAEMON_MEMORY environment variable is not honored for that particular daemon. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
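For reference, a sketch of how Spark's daemon launch path normally turns SPARK_DAEMON_MEMORY into a JVM heap flag, which the report says is skipped for the MesosClusterDispatcher. The "1g" default and the exact flag handling are assumptions, not read from the scripts:

```python
# Illustrative only: daemons started via spark-daemon.sh get their heap sized
# from SPARK_DAEMON_MEMORY; the bug is that the dispatcher ignores it.
def daemon_heap_flags(env):
    heap = env.get("SPARK_DAEMON_MEMORY", "1g")  # assumed default
    return [f"-Xmx{heap}"]

print(daemon_heap_flags({}))                             # -> ['-Xmx1g']
print(daemon_heap_flags({"SPARK_DAEMON_MEMORY": "2g"}))  # -> ['-Xmx2g']
```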
[jira] [Updated] (SPARK-22287) SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher
[ https://issues.apache.org/jira/browse/SPARK-22287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-22287: - Summary: SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher (was: [MESOS] SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher) > SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher > - > > Key: SPARK-22287 > URL: https://issues.apache.org/jira/browse/SPARK-22287 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.1.1, 2.2.0, 2.3.0 >Reporter: paul mackles >Priority: Minor > > There does not appear to be a way to control the heap size used by > MesosClusterDispatcher as the SPARK_DAEMON_MEMORY environment variable is not > honored for that particular daemon. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12832) Mesos cluster mode should handle constraints
[ https://issues.apache.org/jira/browse/SPARK-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187059#comment-16187059 ] paul mackles commented on SPARK-12832: -- I am thinking this one should be closed. It appears to be a duplicate of SPARK-19606 > Mesos cluster mode should handle constraints > > > Key: SPARK-12832 > URL: https://issues.apache.org/jira/browse/SPARK-12832 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: astralidea > > On Mesos, some machines are used for Spark and others for, e.g., an ELK > cluster. If a driver is deployed to a machine without a Spark runtime, the > job will fail. CoarseMesosSchedulerBackend has a constraints feature, but the > dispatcher deploys via MesosClusterScheduler, which uses a different method. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19606) Support constraints in spark-dispatcher
[ https://issues.apache.org/jira/browse/SPARK-19606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187058#comment-16187058 ] paul mackles commented on SPARK-19606: -- +1 to being able to constrain drivers and another +1 to [~pgillet]'s suggestion for allowing drivers to be constrained to different resources than the executors. However, given my understanding, I don't think that using "spark.mesos.dispatcher.driverDefault.spark.mesos.constraints" will work. If "spark.mesos.constraints" is passed with the job then it will wind up overriding the value specified in the "driverDefault" property. If "spark.mesos.constraints" is not passed with the job, then the value specified in the "driverDefault" property will get passed to the executors - which we definitely don't want. To maintain backwards compatibility while allowing drivers/executors to be constrained to either the same or different resources, I propose an additional property: spark.mesos.constraints.driver The new property could be set per job or for all jobs using "spark.mesos.dispatcher.driverDefault.*". The existing property "spark.mesos.constraints" would continue to apply to executors only. If we can come to a consensus on this, I am happy to work on the PR > Support constraints in spark-dispatcher > --- > > Key: SPARK-19606 > URL: https://issues.apache.org/jira/browse/SPARK-19606 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Philipp Hoffmann > > The `spark.mesos.constraints` configuration is ignored by the > spark-dispatcher. The constraints need to be passed in the Framework > information when registering with Mesos. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
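For readers unfamiliar with the syntax under discussion, Mesos constraints in Spark are written as semicolon-separated attr:value clauses. A toy parser (not Spark's implementation, which lives in its Mesos scheduler utilities) shows the shape of the data that both the existing spark.mesos.constraints and the proposed spark.mesos.constraints.driver would carry:

```python
# Toy parser for the "attr:v1,v2;attr2:v3" constraint syntax. An attribute with
# no value list means "the attribute must merely be present on the offer".
def parse_constraints(spec):
    constraints = {}
    for clause in filter(None, spec.split(";")):
        attr, _, values = clause.partition(":")
        constraints[attr] = set(values.split(",")) if values else set()
    return constraints

print(parse_constraints("os:centos7;zone:us-east-1a,us-east-1b"))
print(parse_constraints("rack"))
```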
[jira] [Commented] (SPARK-22135) metrics in spark-dispatcher not being registered properly
[ https://issues.apache.org/jira/browse/SPARK-22135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181804#comment-16181804 ] paul mackles commented on SPARK-22135: -- here is the PR: https://github.com/apache/spark/pull/19358 > metrics in spark-dispatcher not being registered properly > - > > Key: SPARK-22135 > URL: https://issues.apache.org/jira/browse/SPARK-22135 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Affects Versions: 2.1.0, 2.2.0 >Reporter: paul mackles >Priority: Minor > > There is a bug in the way that the metrics in > org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource are > initialized such that they are never registered with the underlying registry. > Basically, each call to the overridden "metricRegistry" function results in > the creation of a new registry. PR is forthcoming. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
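The bug pattern described here — an overridden member that constructs a fresh registry on every access, so registrations never stick — can be reproduced in miniature. This is a Python analogue of the Scala mistake, not the actual Spark code:

```python
# Buggy shape: each access builds a new registry, so anything registered on one
# access is invisible on the next (the Scala source overrode metricRegistry as
# a method returning a new MetricRegistry each call).
class BuggySource:
    @property
    def metric_registry(self):
        return {}  # fresh dict every call

# Fixed shape: build the registry once and hand out the same instance.
class FixedSource:
    def __init__(self):
        self._registry = {}
    @property
    def metric_registry(self):
        return self._registry

buggy, fixed = BuggySource(), FixedSource()
buggy.metric_registry["waitingDrivers"] = 0   # lost immediately
fixed.metric_registry["waitingDrivers"] = 0   # sticks
print("waitingDrivers" in buggy.metric_registry)  # False
print("waitingDrivers" in fixed.metric_registry)  # True
```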
[jira] [Updated] (SPARK-22135) metrics in spark-dispatcher not being registered properly
[ https://issues.apache.org/jira/browse/SPARK-22135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-22135: - Description: There is a bug in the way that the metrics in org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource are initialized such that they are never registered with the underlying registry. Basically, each call to the overridden "metricRegistry" function results in the creation of a new registry. PR is forthcoming. (was: There is a bug in the way that the metrics in org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource are initialized such that they are never registered with the underlying registry. Basically, each call to the overridden "metricRegistry" function results in the creation of a new registry. Patch is forthcoming.) > metrics in spark-dispatcher not being registered properly > - > > Key: SPARK-22135 > URL: https://issues.apache.org/jira/browse/SPARK-22135 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Affects Versions: 2.1.0, 2.2.0 >Reporter: paul mackles >Priority: Minor > > There is a bug in the way that the metrics in > org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource are > initialized such that they are never registered with the underlying registry. > Basically, each call to the overridden "metricRegistry" function results in > the creation of a new registry. PR is forthcoming. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22135) metrics in spark-dispatcher not being registered properly
paul mackles created SPARK-22135: Summary: metrics in spark-dispatcher not being registered properly Key: SPARK-22135 URL: https://issues.apache.org/jira/browse/SPARK-22135 Project: Spark Issue Type: Bug Components: Deploy, Mesos Affects Versions: 2.2.0, 2.1.0 Reporter: paul mackles Priority: Minor There is a bug in the way that the metrics in org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource are initialized such that they are never registered with the underlying registry. Basically, each call to the overridden "metricRegistry" function results in the creation of a new registry. Patch is forthcoming. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org