[GitHub] spark issue #21006: [SPARK-22256][MESOS] - Introduce spark.mesos.driver.memo...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/21006 cc @skonto @samvantran --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler options...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20641 @skonto Thanks for testing it. Test results look good.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20640 I'm in favor of merging this. The hardcoded limit is pretty bad - particularly for streaming jobs; it would be preferable to remove it ASAP even though it may not be a complete solution.
[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/21033 @yanji84 I tried to restore the previous behavior when `spark.mesos.executor.gpus` is not specified. Here's the commit in my fork: https://github.com/mesosphere/spark/pull/23/commits/5adb830bc630f4995470b08157d30016e3b4567e I restored the previous unit test as well. Seems to work okay. WDYT?
[GitHub] spark pull request #21033: [SPARK-19320][MESOS]allow specifying a hard limit...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/21033#discussion_r182283035

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -495,9 +500,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
         launchTasks = true
         val taskId = newMesosTaskId()
         val offerCPUs = getResource(resources, "cpus").toInt
-        val taskGPUs = Math.min(
-          Math.max(0, maxGpus - totalGpusAcquired), getResource(resources, "gpus").toInt)
-
+        val offerGPUs = getResource(resources, "gpus").toInt
+        var taskGPUs = executorGpus
--- End diff --

Ah, good point, I missed that earlier. @yanji84 Why are we changing the default behavior when `spark.mesos.executor.gpus` is not specified? Previously, if `spark.mesos.gpus.max` was set (without setting `spark.mesos.executor.gpus`), GPUs were allocated greedily. This aligns with the CPU behavior when `spark.executor.cores` is not specified. GPUs could be handled the same way.
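The two allocation policies under discussion can be sketched as standalone functions (a minimal sketch; `offerGpus`, `maxGpus`, `totalGpusAcquired`, and `executorGpus` are modeled here as plain parameters, not the backend's actual fields):

```scala
// Greedy policy (previous behavior, when spark.mesos.executor.gpus is unset):
// take as many GPUs from the offer as the remaining spark.mesos.gpus.max
// budget allows, never going negative.
def greedyTaskGpus(offerGpus: Int, maxGpus: Int, totalGpusAcquired: Int): Int =
  math.min(math.max(0, maxGpus - totalGpusAcquired), offerGpus)

// Fixed policy (proposed behavior, when spark.mesos.executor.gpus is set):
// every executor asks for exactly executorGpus; an offer with fewer GPUs
// cannot host the executor at all.
def fixedTaskGpus(offerGpus: Int, executorGpus: Int): Option[Int] =
  if (offerGpus >= executorGpus) Some(executorGpus) else None
```

The review comment's point is that when `spark.mesos.executor.gpus` is unset, the greedy function above should still apply, mirroring how CPUs are allocated when `spark.executor.cores` is unset.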
[GitHub] spark pull request #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/20641#discussion_r182271196

--- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala ---
@@ -199,6 +199,38 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
     })
   }

+  test("properly wraps and escapes parameters passed to driver command") {
--- End diff --

Sorry for the delay. I was going to test this in DC/OS and haven't gotten a chance to do so.
[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/21033 LGTM. @yanji84 You may want to remove the "WIP" in the PR title.
[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/21033 @yanji84 Thanks for the patch. I tested your previous PR on GPUs running on DC/OS and everything worked fine. Would you mind updating the documentation as well - https://github.com/apache/spark/blob/master/docs/running-on-mesos.md?
[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20945 (1) seems the most secure. How do we handle keytabs today in cluster mode in pure Mesos? Is it the same situation -- the keytab gets sent over an HTTP connection to the Dispatcher? (3) Yes, the TGT secret would still be available from the secret store. There's currently no constraint based on an OS user.
[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20945 @skonto Basic question: in your example above, which user does the "krb5cc_65534" ticket cache belong to? The superuser or the proxy-user ("nobody")?
[GitHub] spark issue #17979: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/17979 I'm interested in this PR, too. Who has permission to reopen this? cc @HyukjinKwon @yanji84
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20640 @skonto Yes, I'm ok with that. Sorry for the delayed response.
[GitHub] spark pull request #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/20641#discussion_r17399

--- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala ---
@@ -199,6 +199,38 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
     })
   }

+  test("properly wraps and escapes parameters passed to driver command") {
+    setScheduler()
+
+    val mem = 1000
+    val cpu = 1
+
+    val response = scheduler.submitDriver(
+      new MesosDriverDescription("d1", "jar", mem, cpu, true,
+        command,
+        Map("spark.mesos.executor.home" -> "test",
+          "spark.app.name" -> "test",
+          // no special characters, wrap only
+          "spark.driver.extraJavaOptions" ->
+            "-XX+PrintGC -Dparam1=val1 -Dparam2=val2",
+          // special characters, to be escaped
+          "spark.executor.extraJavaOptions" ->
+            """-Dparam1="value 1" -Dparam2=value\ 2 -Dpath=$PATH"""),
--- End diff --

I'm not sure the second param (param2) would be parsed correctly if you actually ran the command. Doesn't there need to be quotes around the space? Have you tested it and checked if the executor gets the correct value for param2?
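The wrap-vs-escape distinction this test exercises can be sketched as follows (a hypothetical helper, not Spark's actual escaping code; it assumes one round of shell parsing by the driver command):

```scala
// A value with no shell metacharacters only needs to be wrapped in double
// quotes; embedded double quotes, backslashes, backticks, and '$' must
// additionally be backslash-escaped so the value survives shell parsing.
def needsEscaping(value: String): Boolean =
  value.exists(c => "\"$`\\".contains(c))

def shellWrap(value: String): String =
  if (needsEscaping(value)) {
    val escaped = value.flatMap {
      case c if "\"$`\\".contains(c) => "\\" + c
      case c                         => c.toString
    }
    "\"" + escaped + "\""
  } else {
    "\"" + value + "\""
  }
```

Under this scheme a value like `value\ 2` would have its backslash escaped and the whole token quoted, which is why the reviewer asks whether the executor actually receives the intended `param2` value.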
[GitHub] spark issue #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler options...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20641 Thanks for the PR! It seems that the previous attempt to fix this (SPARK-18114) was wrong -- I'm not sure why we didn't catch the problem before; maybe a lack of testing? @krcz My suggestion for this patch is to add a test, in order to prevent another regression in the future. I've written a unit test for this -- you could do something similar: https://github.com/mesosphere/spark/commit/4812ba3d10264f6d22ec654fa16b5810d70c27a9 I will also do more testing with my own integration tests. cc @skonto
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20640 @skonto We should not remove the logging. The logging [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L194) is only available if blacklisting is enabled, but by default blacklisting is disabled. The BlacklistTracker object [is not created](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L748) if blacklisting is disabled. But we might still want to see the log of an executor failure in this case. In the case where an executor fails before entering Spark code (for example, the Mesos agent failed to create the sandbox), would it be detected?
[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/20640#discussion_r169676540

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -648,15 +645,6 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       totalGpusAcquired -= gpus
       gpusByTaskId -= taskId
     }
-    // If it was a failure, mark the slave as failed for blacklisting purposes
-    if (TaskState.isFailed(state)) {
-      slave.taskFailures += 1
-
-      if (slave.taskFailures >= MAX_SLAVE_FAILURES) {
-        logInfo(s"Blacklisting Mesos slave $slaveId due to too many failures; " +
--- End diff --

@IgorBerman Yes, in the default case, it would be nice to have this information about a task failing, especially if it fails repeatedly.
[GitHub] spark pull request #20167: [SPARK-16501] [MESOS] Allow providing Mesos princ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/20167#discussion_r166086691

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
@@ -71,40 +74,64 @@ trait MesosSchedulerUtils extends Logging {
       failoverTimeout: Option[Double] = None,
       frameworkId: Option[String] = None): SchedulerDriver = {
     val fwInfoBuilder = FrameworkInfo.newBuilder().setUser(sparkUser).setName(appName)
-    val credBuilder = Credential.newBuilder()
+    fwInfoBuilder.setHostname(Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(
+      conf.get(DRIVER_HOST_ADDRESS)))
     webuiUrl.foreach { url => fwInfoBuilder.setWebuiUrl(url) }
     checkpoint.foreach { checkpoint => fwInfoBuilder.setCheckpoint(checkpoint) }
     failoverTimeout.foreach { timeout => fwInfoBuilder.setFailoverTimeout(timeout) }
     frameworkId.foreach { id => fwInfoBuilder.setId(FrameworkID.newBuilder().setValue(id).build()) }
-    fwInfoBuilder.setHostname(Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(
-      conf.get(DRIVER_HOST_ADDRESS)))
-    conf.getOption("spark.mesos.principal").foreach { principal =>
-      fwInfoBuilder.setPrincipal(principal)
-      credBuilder.setPrincipal(principal)
-    }
-    conf.getOption("spark.mesos.secret").foreach { secret =>
-      credBuilder.setSecret(secret)
-    }
-    if (credBuilder.hasSecret && !fwInfoBuilder.hasPrincipal) {
-      throw new SparkException(
-        "spark.mesos.principal must be configured when spark.mesos.secret is set")
-    }
+    conf.getOption("spark.mesos.role").foreach { role => fwInfoBuilder.setRole(role) }
     val maxGpus = conf.getInt("spark.mesos.gpus.max", 0)
     if (maxGpus > 0) {
       fwInfoBuilder.addCapabilities(Capability.newBuilder().setType(Capability.Type.GPU_RESOURCES))
     }
+    val credBuilder = buildCredentials(conf, fwInfoBuilder)
     if (credBuilder.hasPrincipal) {
       new MesosSchedulerDriver(
         scheduler, fwInfoBuilder.build(), masterUrl, credBuilder.build())
     } else {
       new MesosSchedulerDriver(scheduler, fwInfoBuilder.build(), masterUrl)
     }
   }
+
+  def buildCredentials(
+      conf: SparkConf,
+      fwInfoBuilder: Protos.FrameworkInfo.Builder): Protos.Credential.Builder = {
+    val credBuilder = Credential.newBuilder()
+    conf.getOption("spark.mesos.principal")
+      .orElse(Option(conf.getenv("SPARK_MESOS_PRINCIPAL")))
--- End diff --

I think it's fine to set that env var, SPARK_MESOS_PRINCIPAL.
[GitHub] spark pull request #19793: [SPARK-22574] [Mesos] [Submit] Check submission r...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19793#discussion_r154991966

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/rest/mesos/MesosRestServer.scala ---
@@ -77,10 +77,16 @@ private[mesos] class MesosSubmitRequestServlet(
   private def buildDriverDescription(request: CreateSubmissionRequest): MesosDriverDescription = {
     // Required fields, including the main class because python is not yet supported
     val appResource = Option(request.appResource).getOrElse {
-      throw new SubmitRestMissingFieldException("Application jar is missing.")
+      throw new SubmitRestMissingFieldException("Application jar 'appResource' is missing.")
     }
     val mainClass = Option(request.mainClass).getOrElse {
-      throw new SubmitRestMissingFieldException("Main class is missing.")
+      throw new SubmitRestMissingFieldException("Main class 'mainClass' is missing.")
+    }
+    val appArgs = Option(request.appArgs).getOrElse {
+      throw new SubmitRestMissingFieldException("Application arguments 'appArgs' are missing.")
+    }
+    val environmentVariables = Option(request.environmentVariables).getOrElse {
+      throw new SubmitRestMissingFieldException("Environment variables 'environmentVariables' are missing.")
--- End diff --

Yes, the arguments were assumed to be there, but validation was missing.
[GitHub] spark pull request #19793: [SPARK-22574] [Mesos] [Submit] Check submission r...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19793#discussion_r154102830

--- Diff: core/src/test/scala/org/apache/spark/deploy/rest/SubmitRestProtocolSuite.scala ---
@@ -86,6 +86,8 @@ class SubmitRestProtocolSuite extends SparkFunSuite {
     message.clientSparkVersion = "1.2.3"
     message.appResource = "honey-walnut-cherry.jar"
     message.mainClass = "org.apache.spark.examples.SparkPie"
+    message.appArgs = Array("hdfs://tmp/auth")
+    message.environmentVariables = Map("SPARK_HOME" -> "/test")
--- End diff --

LGTM
[GitHub] spark pull request #19793: [SPARK-22574] [Mesos] [Submit] Check submission r...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19793#discussion_r153906029

--- Diff: core/src/test/scala/org/apache/spark/deploy/rest/SubmitRestProtocolSuite.scala ---
@@ -86,6 +86,8 @@ class SubmitRestProtocolSuite extends SparkFunSuite {
     message.clientSparkVersion = "1.2.3"
     message.appResource = "honey-walnut-cherry.jar"
    message.mainClass = "org.apache.spark.examples.SparkPie"
+    message.appArgs = Array("hdfs://tmp/auth")
+    message.environmentVariables = Map("SPARK_HOME" -> "/test")
--- End diff --

Yeah, that's fine with me.
[GitHub] spark pull request #19793: [SPARK-22574] [Mesos] [Submit] Check submission r...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19793#discussion_r153824692

--- Diff: core/src/test/scala/org/apache/spark/deploy/rest/SubmitRestProtocolSuite.scala ---
@@ -86,6 +86,8 @@ class SubmitRestProtocolSuite extends SparkFunSuite {
     message.clientSparkVersion = "1.2.3"
     message.appResource = "honey-walnut-cherry.jar"
     message.mainClass = "org.apache.spark.examples.SparkPie"
+    message.appArgs = Array("hdfs://tmp/auth")
+    message.environmentVariables = Map("SPARK_HOME" -> "/test")
--- End diff --

OK, I was looking at RestSubmissionClient (used by Mesos and Standalone) - the client always sets 'appArgs' and 'environmentVariables': https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala#L427 So it's only a bug when the user does not use the RestSubmissionClient (for example, by sending 'curl' requests directly to the server). That addresses my concern about Standalone mode. As to whether to add a similar check in the Standalone class, I don't have a strong opinion about it (fix it here or fix in another PR).
[GitHub] spark pull request #19793: [SPARK-22574] [Mesos] [Submit] Check submission r...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19793#discussion_r153530156

--- Diff: core/src/test/scala/org/apache/spark/deploy/rest/SubmitRestProtocolSuite.scala ---
@@ -86,6 +86,8 @@ class SubmitRestProtocolSuite extends SparkFunSuite {
     message.clientSparkVersion = "1.2.3"
     message.appResource = "honey-walnut-cherry.jar"
     message.mainClass = "org.apache.spark.examples.SparkPie"
+    message.appArgs = Array("hdfs://tmp/auth")
+    message.environmentVariables = Map("SPARK_HOME" -> "/test")
--- End diff --

@Gschiavon You're right, I didn't see that those variables were already set below. But I do have a question now about whether these fields are optional or not. In the original test (L.93), there's a comment about "optional fields", and the app args and env vars are tested there (L.105-106). Since this test was added with Standalone mode in mind, I wonder if these fields are considered optional there? There's no documentation, as you said, so we can't know for sure. I just want to make sure the extra validation doesn't break any existing applications (in Standalone cluster mode).
[GitHub] spark pull request #19793: [SPARK-22574] [Mesos] [Submit] Check submission r...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19793#discussion_r153352098

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/SubmitRestProtocolRequest.scala ---
@@ -46,6 +46,8 @@ private[rest] class CreateSubmissionRequest extends SubmitRestProtocolRequest {
     super.doValidate()
     assert(sparkProperties != null, "No Spark properties set!")
     assertFieldIsSet(appResource, "appResource")
+    assertFieldIsSet(appArgs, "appArgs")
+    assertFieldIsSet(environmentVariables, "environmentVariables")
--- End diff --

What if there are no args or environment variables for a particular job? Is the caller expected to pass in an empty array?
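The null-vs-empty distinction behind this question can be sketched as follows (`assertFieldIsSet` and `MissingFieldException` here are stand-ins for illustration, not Spark's actual helpers):

```scala
// Validation should reject a missing (null) field but accept an empty one,
// since a job may legitimately have no args or environment variables.
class MissingFieldException(msg: String) extends Exception(msg)

def assertFieldIsSet[T](value: T, name: String): Unit =
  if (value == null) throw new MissingFieldException(s"'$name' is missing.")
```

Under this reading, a caller with no args would be expected to pass an empty array (which passes validation), not omit the field entirely.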
[GitHub] spark pull request #19793: [SPARK-22574] [Mesos] [Submit] Check submission r...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19793#discussion_r153352732

--- Diff: core/src/test/scala/org/apache/spark/deploy/rest/SubmitRestProtocolSuite.scala ---
@@ -86,6 +86,8 @@ class SubmitRestProtocolSuite extends SparkFunSuite {
     message.clientSparkVersion = "1.2.3"
     message.appResource = "honey-walnut-cherry.jar"
     message.mainClass = "org.apache.spark.examples.SparkPie"
+    message.appArgs = Array("hdfs://tmp/auth")
+    message.environmentVariables = Map("SPARK_HOME" -> "/test")
--- End diff --

There should be a check at the end of this test to make sure these fields show up in the `newMessage` object (around L.125).
[GitHub] spark pull request #19543: [SPARK-19606][MESOS] Support constraints in spark...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19543#discussion_r150419093

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala ---
@@ -122,4 +122,11 @@ package object config {
       "Example: key1:val1,key2:val2")
       .stringConf
       .createOptional
+
+  private[spark] val DRIVER_CONSTRAINTS =
+    ConfigBuilder("spark.mesos.driver.constraints")
+      .doc("Attribute based constraints on mesos resource offers. Applied by the dispatcher " +
+        "when launching drivers. Default is to accept all offers with sufficient resources.")
+      .stringConf
+      .createWithDefault("")
--- End diff --

@felixcheung I think it's okay. There's a utility function, `parseConstraintString`, which parses the input string into a Map. If the string is empty, it returns an empty Map: https://github.com/apache/spark/blob/fc45c2c88a838b8f46659ebad2a8f3a9923bc95f/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L321 This is also consistent with how the executor constraint string is parsed.
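The parsing behavior relied on here can be sketched like this (a simplified stand-in for Spark's `parseConstraintString`; it assumes the documented `attr:val1,val2;attr2:val3` constraint format and is not the production code):

```scala
// "zone:us-east-1a,us-east-1b;os:centos7" -> Map of attribute -> value set.
// An empty string yields an empty Map, so an unset or default-"" constraint
// accepts all offers with sufficient resources, matching the config's
// createWithDefault("") above.
def parseConstraintString(constraints: String): Map[String, Set[String]] =
  if (constraints == null || constraints.isEmpty) {
    Map.empty
  } else {
    constraints.split(";").map { clause =>
      clause.split(":", 2) match {
        case Array(attr)         => attr -> Set.empty[String]
        case Array(attr, values) => attr -> values.split(",").toSet
      }
    }.toMap
  }
```

This is why defaulting the new config to `""` (rather than making it optional) is harmless: the empty default simply parses to no constraints.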
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19543 LGTM
[GitHub] spark pull request #19543: [SPARK-19606][MESOS] Support constraints in spark...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19543#discussion_r150100634

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -601,10 +602,14 @@ private[spark] class MesosClusterScheduler(
     for (submission <- candidates) {
       val driverCpu = submission.cores
       val driverMem = submission.mem
-      logTrace(s"Finding offer to launch driver with cpu: $driverCpu, mem: $driverMem")
+      val driverConstraints =
+        parseConstraintString(submission.conf.get("spark.mesos.driver.constraints", ""))
--- End diff --

New config properties would ideally be defined in the `config` object: https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala#L24 and referenced by the variable name.
[GitHub] spark pull request #19543: [SPARK-19606][MESOS] Support constraints in spark...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19543#discussion_r150097549

--- Diff: docs/running-on-mesos.md ---
@@ -458,6 +461,14 @@ See the [configuration page](configuration.html) for information on Spark config
+  spark.mesos.driver.constraints
+  (none)
+
+    Same as spark.mesos.constraints except applied to drivers when launched through the dispatcher. By default,
+    all resource offers will be accepted.
--- End diff --

Maybe clarify in the description for `spark.mesos.constraints` in the cell above that it only applies to executors.
[GitHub] spark pull request #19543: [SPARK-19606][MESOS] Support constraints in spark...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19543#discussion_r150101873

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -556,7 +556,8 @@ private[spark] class MesosClusterScheduler(
   private class ResourceOffer(
       val offer: Offer,
-      var remainingResources: JList[Resource]) {
+      var remainingResources: JList[Resource],
+      var attributes: JList[Attribute]) {
     override def toString(): String = {
       s"Offer id: ${offer.getId}, resources: ${remainingResources}"
--- End diff --

Include the `attributes` in this string?
[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147429031

--- Diff: dev/deps/spark-deps-hadoop-2.6 ---
@@ -138,7 +138,7 @@
 lz4-java-1.4.0.jar
 machinist_2.11-0.6.1.jar
 macro-compat_2.11-1.1.1.jar
 mail-1.4.7.jar
-mesos-1.3.0-shaded-protobuf.jar
+mesos-1.4.0-shaded-protobuf.jar
--- End diff --

OK, makes sense.
[GitHub] spark pull request #19510: [SPARK-22292][Mesos] Added spark.mem.max support ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19510#discussion_r147265329

--- Diff: docs/running-on-mesos.md ---
@@ -613,6 +621,39 @@ See the [configuration page](configuration.html) for information on Spark config
     driver disconnects, the master immediately tears down the framework.
+
+  spark.mesos.rejectOfferDuration
+  120s
+
+    Time to consider unused resources refused, serves as a fallback of
+    `spark.mesos.rejectOfferDurationForUnmetConstraints`,
+    `spark.mesos.rejectOfferDurationForReachedMaxCores`,
+    `spark.mesos.rejectOfferDurationForReachedMaxMem`
+
+  spark.mesos.rejectOfferDurationForUnmetConstraints
+  park.mesos.rejectOfferDuration
+
+    Time to consider unused resources refused with unmet constraints
+
+  spark.mesos.rejectOfferDurationForReachedMaxCores
+  park.mesos.rejectOfferDuration
--- End diff --

Typo: 'park'
[GitHub] spark pull request #19510: [SPARK-22292][Mesos] Added spark.mem.max support ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19510#discussion_r147265261

--- Diff: docs/running-on-mesos.md ---
@@ -613,6 +621,39 @@ See the [configuration page](configuration.html) for information on Spark config
     driver disconnects, the master immediately tears down the framework.
+
+  spark.mesos.rejectOfferDuration
+  120s
+
+    Time to consider unused resources refused, serves as a fallback of
+    `spark.mesos.rejectOfferDurationForUnmetConstraints`,
+    `spark.mesos.rejectOfferDurationForReachedMaxCores`,
+    `spark.mesos.rejectOfferDurationForReachedMaxMem`
+
+  spark.mesos.rejectOfferDurationForUnmetConstraints
+  park.mesos.rejectOfferDuration
--- End diff --

Typo: 'park'
[GitHub] spark pull request #19510: [SPARK-22292][Mesos] Added spark.mem.max support ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19510#discussion_r147274681

--- Diff: docs/running-on-mesos.md ---
@@ -344,6 +345,13 @@ See the [configuration page](configuration.html) for information on Spark config
+  spark.mem.max
+  (none)
+
+    Maximum amount of memory Spark accepts from Mesos launching executor.
--- End diff --

Maybe add "across the cluster (not from each machine)". And something about there being no maximum if this property is not set.
[GitHub] spark pull request #19510: [SPARK-22292][Mesos] Added spark.mem.max support ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19510#discussion_r147277408

--- Diff: docs/running-on-mesos.md ---
@@ -613,6 +621,39 @@ See the [configuration page](configuration.html) for information on Spark config
     driver disconnects, the master immediately tears down the framework.
+
+  spark.mesos.rejectOfferDuration
+  120s
+
+    Time to consider unused resources refused, serves as a fallback of
+    `spark.mesos.rejectOfferDurationForUnmetConstraints`,
+    `spark.mesos.rejectOfferDurationForReachedMaxCores`,
+    `spark.mesos.rejectOfferDurationForReachedMaxMem`
+
+  spark.mesos.rejectOfferDurationForUnmetConstraints
+  park.mesos.rejectOfferDuration
+
+    Time to consider unused resources refused with unmet constraints
+
+  spark.mesos.rejectOfferDurationForReachedMaxCores
+  park.mesos.rejectOfferDuration
+
+    Time to consider unused resources refused when maximum number of cores
+    spark.cores.max is reached
--- End diff --

My suggestion: "Duration for which unused resources are considered declined, when the maximum number of cores spark.cores.max has been reached." @ArtRand Is this the documentation you had in mind in https://issues.apache.org/jira/browse/SPARK-22133 ? Is this enough information for a non-Mesos expert to set this?
[GitHub] spark pull request #19510: [SPARK-22292][Mesos] Added spark.mem.max support ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19510#discussion_r147265373

--- Diff: docs/running-on-mesos.md ---
@@ -613,6 +621,39 @@ See the [configuration page](configuration.html) for information on Spark config
     driver disconnects, the master immediately tears down the framework.
+
+  spark.mesos.rejectOfferDuration
+  120s
+
+    Time to consider unused resources refused, serves as a fallback of
+    `spark.mesos.rejectOfferDurationForUnmetConstraints`,
+    `spark.mesos.rejectOfferDurationForReachedMaxCores`,
+    `spark.mesos.rejectOfferDurationForReachedMaxMem`
+
+  spark.mesos.rejectOfferDurationForUnmetConstraints
+  park.mesos.rejectOfferDuration
+
+    Time to consider unused resources refused with unmet constraints
+
+  spark.mesos.rejectOfferDurationForReachedMaxCores
+  park.mesos.rejectOfferDuration
+
+    Time to consider unused resources refused when maximum number of cores
+    spark.cores.max is reached
+
+  spark.mesos.rejectOfferDurationForReachedMaxMem
+  park.mesos.rejectOfferDuration
--- End diff --

Typo: 'park'
[GitHub] spark issue #19510: [SPARK-22292][Mesos] Added spark.mem.max support for Mes...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19510 @windkit Trying to understand the need for this config ... could you accomplish the same thing by setting spark.cores.max, spark.executor.cores, and spark.executor.memory? Could you give an example scenario where this is needed?
[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147261741 --- Diff: dev/deps/spark-deps-hadoop-2.6 --- @@ -138,7 +138,7 @@ lz4-java-1.4.0.jar machinist_2.11-0.6.1.jar macro-compat_2.11-1.1.1.jar mail-1.4.7.jar -mesos-1.3.0-shaded-protobuf.jar +mesos-1.4.0-shaded-protobuf.jar --- End diff -- @skonto Was there a specific reason for upgrading to 1.4?
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147162791 --- Diff: docs/running-on-mesos.md --- @@ -485,39 +485,87 @@ See the [configuration page](configuration.html) for information on Spark config - spark.mesos.driver.secret.envkeys - (none) -A comma-separated list that, if set, the contents of the secret referenced -by spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be -set to the provided environment variable in the driver's process. +spark.mesos.driver.secret.values, +spark.mesos.driver.secret.names, +spark.mesos.executor.secret.values, +spark.mesos.executor.secret.names, - - -spark.mesos.driver.secret.filenames (none) -A comma-separated list that, if set, the contents of the secret referenced by -spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be -written to the provided file. Paths are relative to the container's work -directory. Absolute paths must already exist. Consult the Mesos Secret -protobuf for more information. +A secret is specified by its contents and destination. These properties +specify a secret's contents. To specify a secret's destination, see the cell below. + +You can specify a secret's contents either (1) by value or (2) by reference. +(1) To specify a secret by value, set the +spark.mesos.[driver|executor].secret.values +property, to make the secret available in the driver or executors. +For example, to make a secret password "guessme" available to the driver process, set: + +spark.mesos.driver.secret.values=guessme + +(2) To specify a secret that has been placed in a secret store --- End diff -- I added a note about this. The secret store has to be provided by the user. You can configure Mesos to integrate with a secret store via a custom Mesos module. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
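The two ways of specifying a secret's contents correspond to the two shapes of the Mesos Secret protobuf. A minimal sketch with plain Scala stand-ins (the real code uses the protobuf builders, which are not imported here):

```scala
// VALUE secrets carry the data inline; REFERENCE secrets name an entry in
// an external secret store, resolved by a custom Mesos SecretResolver module.
sealed trait Secret
case class ValueSecret(data: String) extends Secret      // spark.mesos.*.secret.values
case class ReferenceSecret(name: String) extends Secret  // spark.mesos.*.secret.names

// The "guessme" example above, specified by value:
val byValue = ValueSecret("guessme")
// A secret already placed in a store, specified by reference (name is made up):
val byReference = ReferenceSecret("password")
```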
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147165029 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -173,6 +177,88 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo } + private def getSecrets(conf: SparkConf, secretConfig: MesosSecretConfig): Seq[Secret] = { +def createValueSecret(data: String): Secret = { + Secret.newBuilder() +.setType(Secret.Type.VALUE) + .setValue(Secret.Value.newBuilder().setData(ByteString.copyFrom(data.getBytes))) +.build() +} + +def createReferenceSecret(name: String): Secret = { + Secret.newBuilder() +.setReference(Secret.Reference.newBuilder().setName(name)) +.setType(Secret.Type.REFERENCE) +.build() +} + +val referenceSecrets: Seq[Secret] = + conf.get(secretConfig.SECRET_NAMES).getOrElse(Nil).map(s => createReferenceSecret(s)) + +val valueSecrets: Seq[Secret] = { + conf.get(secretConfig.SECRET_VALUES).getOrElse(Nil).map(s => createValueSecret(s)) +} + +if (valueSecrets.nonEmpty && referenceSecrets.nonEmpty) { + throw new SparkException("Cannot specify VALUE type secrets and REFERENCE types ones") --- End diff -- Fixed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147165077 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -173,6 +177,88 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo } + private def getSecrets(conf: SparkConf, secretConfig: MesosSecretConfig): Seq[Secret] = { +def createValueSecret(data: String): Secret = { + Secret.newBuilder() +.setType(Secret.Type.VALUE) + .setValue(Secret.Value.newBuilder().setData(ByteString.copyFrom(data.getBytes))) +.build() +} + +def createReferenceSecret(name: String): Secret = { + Secret.newBuilder() +.setReference(Secret.Reference.newBuilder().setName(name)) +.setType(Secret.Type.REFERENCE) +.build() +} + +val referenceSecrets: Seq[Secret] = + conf.get(secretConfig.SECRET_NAMES).getOrElse(Nil).map(s => createReferenceSecret(s)) --- End diff -- Fixed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147161356 --- Diff: docs/running-on-mesos.md --- @@ -485,39 +485,87 @@ See the [configuration page](configuration.html) for information on Spark config - spark.mesos.driver.secret.envkeys - (none) -A comma-separated list that, if set, the contents of the secret referenced -by spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be -set to the provided environment variable in the driver's process. +spark.mesos.driver.secret.values, +spark.mesos.driver.secret.names, +spark.mesos.executor.secret.values, +spark.mesos.executor.secret.names, - - -spark.mesos.driver.secret.filenames (none) -A comma-separated list that, if set, the contents of the secret referenced by -spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be -written to the provided file. Paths are relative to the container's work -directory. Absolute paths must already exist. Consult the Mesos Secret -protobuf for more information. +A secret is specified by its contents and destination. These properties +specify a secret's contents. To specify a secret's destination, see the cell below. + +You can specify a secret's contents either (1) by value or (2) by reference. --- End diff -- Good catch. I added `` paragraph tags to separate paragraphs in this table cell. (Jekyll doesn't need them in normal text body, only inside a table cell.) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147159788 --- Diff: docs/running-on-mesos.md --- @@ -485,39 +485,87 @@ See the [configuration page](configuration.html) for information on Spark config - spark.mesos.driver.secret.envkeys - (none) -A comma-separated list that, if set, the contents of the secret referenced -by spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be -set to the provided environment variable in the driver's process. +spark.mesos.driver.secret.values, --- End diff -- It looks okay in a browser. This column takes up about 25% of the page width. Here's a screenshot: ![documentation screenshot](https://user-images.githubusercontent.com/22622418/32058932-76e5e456-ba1f-11e7-999e-4302d49e636c.png) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146702191 --- Diff: docs/running-on-mesos.md --- @@ -501,23 +503,78 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be written to the provided file. Paths are relative to the container's work directory. Absolute paths must already exist. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Note: File-based secrets require a custom +http://mesos.apache.org/documentation/latest/secrets/";>SecretResolver +module. Example: + +filename1,filename2 spark.mesos.driver.secret.names (none) A comma-separated list of secret references. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: + +secretname1,secretname2 spark.mesos.driver.secret.values (none) A comma-separated list of secret values. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: + +secretvalue1,secretvalue2 + + + + + spark.mesos.executor.secret.envkeys + (none) + +A comma-separated list that, if set, the contents of the secret referenced +by spark.mesos.executor.secret.names or spark.mesos.executor.secret.values will be +set to the provided environment variable in the executor's process. Example: + +ENVKEY1,ENVKEY2 + + --- End diff -- Fixed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146712534 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -122,7 +126,8 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { .toList } - def containerInfo(conf: SparkConf): ContainerInfo.Builder = { + def buildContainerInfo(conf: SparkConf): +ContainerInfo.Builder = { --- End diff -- ok
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146712337 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -440,6 +420,23 @@ private[spark] class MesosClusterScheduler( CommandInfo.URI.newBuilder().setValue(uri.trim()).setCache(useFetchCache).build()) } + private def getContainerInfo(desc: MesosDriverDescription): ContainerInfo.Builder = { +val containerInfo = MesosSchedulerBackendUtil.buildContainerInfo(desc.conf) + +MesosSchedulerBackendUtil.getSecretVolume(desc.conf, config.driverSecretConfig) + .foreach { volume => + if (volume.getSource.getSecret.getReference.isInitialized) { --- End diff -- Fixed.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146702357 --- Diff: docs/running-on-mesos.md --- @@ -501,23 +503,78 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be written to the provided file. Paths are relative to the container's work directory. Absolute paths must already exist. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Note: File-based secrets require a custom +http://mesos.apache.org/documentation/latest/secrets/";>SecretResolver +module. Example: + +filename1,filename2 spark.mesos.driver.secret.names (none) A comma-separated list of secret references. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: + +secretname1,secretname2 spark.mesos.driver.secret.values (none) A comma-separated list of secret values. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: + +secretvalue1,secretvalue2 + + + + + spark.mesos.executor.secret.envkeys + (none) + +A comma-separated list that, if set, the contents of the secret referenced +by spark.mesos.executor.secret.names or spark.mesos.executor.secret.values will be +set to the provided environment variable in the executor's process. Example: + +ENVKEY1,ENVKEY2 + + + +spark.mesos.executor.secret.filenames --- End diff -- Fixed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146712170 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -394,9 +393,10 @@ private[spark] class MesosClusterScheduler( } // add secret environment variables -getSecretEnvVar(desc).foreach { variable => +MesosSchedulerBackendUtil.getSecretEnvVar(desc.conf, config.driverSecretConfig) + .foreach { variable => if (variable.getSecret.getReference.isInitialized) { --- End diff -- Fixed.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146704074 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala --- @@ -21,6 +21,39 @@ import java.util.concurrent.TimeUnit import org.apache.spark.internal.config.ConfigBuilder +private[spark] class MesosSecretConfig(taskType: String) { --- End diff -- Done. I've moved this `MesosSecretConfig` class inside the `config` object, and made the constructor private to that object.
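The refactor described — nesting the class inside the `config` object with a non-public constructor — might look roughly like this (a simplified sketch; the real Spark code builds `ConfigBuilder` entries rather than raw key strings):

```scala
object config {
  // The constructor is private, so instances can only be created here —
  // callers use the two shared driver/executor instances below.
  final class MesosSecretConfig private (taskType: String) {
    val SECRET_NAMES  = s"spark.mesos.$taskType.secret.names"
    val SECRET_VALUES = s"spark.mesos.$taskType.secret.values"
  }
  object MesosSecretConfig {
    val driver   = new MesosSecretConfig("driver")   // companion can call the
    val executor = new MesosSecretConfig("executor") // private constructor
  }
}
```

Usage: `config.MesosSecretConfig.driver.SECRET_NAMES` yields `"spark.mesos.driver.secret.names"`, while `new config.MesosSecretConfig("foo")` fails to compile outside the companion.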
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146712778 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -173,6 +178,90 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo } + private def getSecrets(conf: SparkConf, secretConfig: MesosSecretConfig): Seq[Secret] = { +def createValueSecret(data: String): Secret = { + Secret.newBuilder() +.setType(Secret.Type.VALUE) + .setValue(Secret.Value.newBuilder().setData(ByteString.copyFrom(data.getBytes))) +.build() +} + +def createReferenceSecret(name: String): Secret = { + Secret.newBuilder() +.setReference(Secret.Reference.newBuilder().setName(name)) +.setType(Secret.Type.REFERENCE) +.build() +} + +val referenceSecrets: Seq[Secret] = + conf.get(secretConfig.SECRET_NAMES).getOrElse(Nil).map(s => createReferenceSecret(s)) + +val valueSecrets: Seq[Secret] = { + conf.get(secretConfig.SECRET_VALUES).getOrElse(Nil).map(s => createValueSecret(s)) +} + +if (valueSecrets.nonEmpty && referenceSecrets.nonEmpty) { + throw new SparkException("Cannot specify VALUE type secrets and REFERENCE types ones") +} + +if (referenceSecrets.nonEmpty) referenceSecrets else valueSecrets + } + + private def illegalSecretInput(dest: Seq[String], secrets: Seq[Secret]): Boolean = { +if (dest.nonEmpty) { + // make sure there is a one-to-one correspondence between destinations and secrets + if (dest.length != secrets.length) { +return true + } +} +false + } + + def getSecretVolume(conf: SparkConf, secretConfig: MesosSecretConfig): List[Volume] = { +val secrets = getSecrets(conf, secretConfig) +val secretPaths: Seq[String] = + conf.get(secretConfig.SECRET_FILENAMES).getOrElse(Nil) + +if (illegalSecretInput(secretPaths, secrets)) { + throw new SparkException( +s"Need to give equal numbers of secrets and file paths for file-based " + + s"reference secrets got secrets $secrets, and paths $secretPaths") +} + +secrets.zip(secretPaths).map { + case (s, p) => --- End diff -- ok
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146701899 --- Diff: docs/running-on-mesos.md --- @@ -501,23 +503,78 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be written to the provided file. Paths are relative to the container's work directory. Absolute paths must already exist. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Note: File-based secrets require a custom +http://mesos.apache.org/documentation/latest/secrets/";>SecretResolver +module. Example: + +filename1,filename2 spark.mesos.driver.secret.names (none) A comma-separated list of secret references. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: + +secretname1,secretname2 spark.mesos.driver.secret.values (none) A comma-separated list of secret values. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: --- End diff -- I have grouped some of these properties together to make the description clearer. Please let me know what you think. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146702275 --- Diff: docs/running-on-mesos.md --- @@ -501,23 +503,78 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be written to the provided file. Paths are relative to the container's work directory. Absolute paths must already exist. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Note: File-based secrets require a custom +http://mesos.apache.org/documentation/latest/secrets/";>SecretResolver +module. Example: + +filename1,filename2 spark.mesos.driver.secret.names (none) A comma-separated list of secret references. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: + +secretname1,secretname2 spark.mesos.driver.secret.values (none) A comma-separated list of secret values. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: + +secretvalue1,secretvalue2 + + + + + spark.mesos.executor.secret.envkeys + (none) + +A comma-separated list that, if set, the contents of the secret referenced --- End diff -- Let me try to improve the descriptions of all these secret configs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r146712893 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -173,6 +178,90 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo } + private def getSecrets(conf: SparkConf, secretConfig: MesosSecretConfig): Seq[Secret] = { +def createValueSecret(data: String): Secret = { + Secret.newBuilder() +.setType(Secret.Type.VALUE) + .setValue(Secret.Value.newBuilder().setData(ByteString.copyFrom(data.getBytes))) +.build() +} + +def createReferenceSecret(name: String): Secret = { + Secret.newBuilder() +.setReference(Secret.Reference.newBuilder().setName(name)) +.setType(Secret.Type.REFERENCE) +.build() +} + +val referenceSecrets: Seq[Secret] = + conf.get(secretConfig.SECRET_NAMES).getOrElse(Nil).map(s => createReferenceSecret(s)) + +val valueSecrets: Seq[Secret] = { + conf.get(secretConfig.SECRET_VALUES).getOrElse(Nil).map(s => createValueSecret(s)) +} + +if (valueSecrets.nonEmpty && referenceSecrets.nonEmpty) { + throw new SparkException("Cannot specify VALUE type secrets and REFERENCE types ones") +} + +if (referenceSecrets.nonEmpty) referenceSecrets else valueSecrets + } + + private def illegalSecretInput(dest: Seq[String], secrets: Seq[Secret]): Boolean = { +if (dest.nonEmpty) { + // make sure there is a one-to-one correspondence between destinations and secrets + if (dest.length != secrets.length) { +return true + } +} +false + } + + def getSecretVolume(conf: SparkConf, secretConfig: MesosSecretConfig): List[Volume] = { +val secrets = getSecrets(conf, secretConfig) +val secretPaths: Seq[String] = + conf.get(secretConfig.SECRET_FILENAMES).getOrElse(Nil) + +if (illegalSecretInput(secretPaths, secrets)) { + throw new SparkException( +s"Need to give equal numbers of secrets and file paths for file-based " + + s"reference secrets got secrets $secrets, and paths $secretPaths") +} + +secrets.zip(secretPaths).map { + case (s, p) => +val source = Volume.Source.newBuilder() + .setType(Volume.Source.Type.SECRET) + .setSecret(s) +Volume.newBuilder() + .setContainerPath(p) + .setSource(source) + .setMode(Volume.Mode.RO) + .build +}.toList + } + + def getSecretEnvVar(conf: SparkConf, secretConfig: MesosSecretConfig): +List[Variable] = { +val secrets = getSecrets(conf, secretConfig) +val secretEnvKeys = conf.get(secretConfig.SECRET_ENVKEYS).getOrElse(Nil) +if (illegalSecretInput(secretEnvKeys, secrets)) { + throw new SparkException( +s"Need to give equal numbers of secrets and environment keys " + + s"for environment-based reference secrets got secrets $secrets, " + + s"and keys $secretEnvKeys") +} + +secrets.zip(secretEnvKeys).map { + case (s, k) => --- End diff -- ok
[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @srowen Ping, would you like to help review?
[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @vanzin Ping, would you mind reviewing this PR?
[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r145530482 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -446,6 +456,9 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( totalGpusAcquired += taskGPUs gpusByTaskId(taskId) = taskGPUs } +} + else { + logDebug(s"Cannot launch a task for offer with id: $offerId on slave with id: $slaveId." + --- End diff -- Add a space at the end of this sentence (before the next sentence).
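The review comment flags a common concatenation bug: the first interpolated string ends without a trailing space, so the log message runs the sentences together. A minimal illustration (the variable values and the continuation string are hypothetical; the diff above truncates the actual second fragment):

```scala
val offerId = "o-1"
val slaveId = "s-1"

// Without a trailing space, adjacent string fragments run together:
val bad = s"Cannot launch a task for offer with id: $offerId on slave with id: $slaveId." +
  s"Requirements were not met for this offer."

// The fix suggested above: end the first fragment with a space.
val good = s"Cannot launch a task for offer with id: $offerId on slave with id: $slaveId. " +
  s"Requirements were not met for this offer."
```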
[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @srowen Would you like to help review? This PR adds Mesos secrets support to the driver for executor tasks.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145437791 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -122,7 +126,8 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { .toList } - def containerInfo(conf: SparkConf): ContainerInfo.Builder = { + def buildContainerInfo(conf: SparkConf): + ContainerInfo.Builder = { --- End diff -- Fixed
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145437403 --- Diff: docs/running-on-mesos.md --- @@ -501,23 +503,74 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.driver.secret.names or spark.mesos.driver.secret.values will be written to the provided file. Paths are relative to the container's work directory. Absolute paths must already exist. Consult the Mesos Secret -protobuf for more information. +protobuf for more information. Example: --- End diff -- Mentioned custom module and added link to Mesos documentation.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145437968 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -173,6 +178,90 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo } + private def getSecrets(conf: SparkConf, secretConfig: MesosSecretConfig): Seq[Secret] = { +def createValueSecret(data: String): Secret = { + Secret.newBuilder() +.setType(Secret.Type.VALUE) + .setValue(Secret.Value.newBuilder().setData(ByteString.copyFrom(data.getBytes))) +.build() +} + +def createReferenceSecret(name: String): Secret = { + Secret.newBuilder() +.setReference(Secret.Reference.newBuilder().setName(name)) +.setType(Secret.Type.REFERENCE) +.build() +} + +val referenceSecrets: Seq[Secret] = + conf.get(secretConfig.SECRET_NAMES).getOrElse(Nil).map(s => createReferenceSecret(s)) + +val valueSecrets: Seq[Secret] = { + conf.get(secretConfig.SECRET_VALUES).getOrElse(Nil).map(s => createValueSecret(s)) +} + +if (valueSecrets.nonEmpty && referenceSecrets.nonEmpty) { + throw new SparkException("Cannot specify VALUE type secrets and REFERENCE types ones") +} + +if (referenceSecrets.nonEmpty) referenceSecrets else valueSecrets + } + + private def illegalSecretInput(dest: Seq[String], secrets: Seq[Secret]): Boolean = { +if (dest.nonEmpty) { + // make sure there is a one-to-one correspondence between destinations and secrets + if (dest.length != secrets.length) { +return true + } +} +false + } + + def getSecretVolume(conf: SparkConf, secretConfig: MesosSecretConfig): List[Volume] = { +val secrets = getSecrets(conf, secretConfig) +val secretPaths: Seq[String] = + conf.get(secretConfig.SECRET_FILENAMES).getOrElse(Nil) + +if (illegalSecretInput(secretPaths, secrets)) { + throw new SparkException( +s"Need to give equal numbers of secrets and file paths for file-based " + + s"reference secrets got secrets $secrets, and paths $secretPaths") +} + +secrets.zip(secretPaths).map { + case (s, p) => +val source = Volume.Source.newBuilder() + .setType(Volume.Source.Type.SECRET) + .setSecret(s) +Volume.newBuilder() + .setContainerPath(p) + .setSource(source) + .setMode(Volume.Mode.RO) + .build +}.toList + } + + def getSecretEnvVar(conf: SparkConf, secretConfig: MesosSecretConfig): + List[Variable] = { --- End diff -- Fixed
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145287767 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -170,9 +175,119 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo.addNetworkInfos(info) } +getSecretVolume(conf, secretConfig).foreach { volume => + if (volume.getSource.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${volume.getSource.getSecret.getReference.getName}" + + s"on file ${volume.getContainerPath}") + } else { +logInfo(s"Setting secret on file name=${volume.getContainerPath}") + } + containerInfo.addVolumes(volume) +} + containerInfo } + def addSecretEnvVar( --- End diff -- I've removed this method. To be more consistent, I've moved this code back into MesosClusterScheduler. There's a little duplication, because MesosCoarseGrainedSchedulerBackend now has a similar code snippet, but it does avoid the mutation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145290625 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -170,9 +175,119 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo.addNetworkInfos(info) } +getSecretVolume(conf, secretConfig).foreach { volume => + if (volume.getSource.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${volume.getSource.getSecret.getReference.getName}" + + s"on file ${volume.getContainerPath}") + } else { +logInfo(s"Setting secret on file name=${volume.getContainerPath}") + } + containerInfo.addVolumes(volume) +} + containerInfo } + def addSecretEnvVar( + envBuilder: Environment.Builder, + conf: SparkConf, + secretConfig: MesosSecretConfig): Unit = { +getSecretEnvVar(conf, secretConfig).foreach { variable => + if (variable.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${variable.getSecret.getReference.getName}" + --- End diff -- Fixed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145290603 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -170,9 +175,119 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo.addNetworkInfos(info) } +getSecretVolume(conf, secretConfig).foreach { volume => + if (volume.getSource.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${volume.getSource.getSecret.getReference.getName}" + + s"on file ${volume.getContainerPath}") + } else { +logInfo(s"Setting secret on file name=${volume.getContainerPath}") + } + containerInfo.addVolumes(volume) +} + containerInfo } + def addSecretEnvVar( + envBuilder: Environment.Builder, + conf: SparkConf, + secretConfig: MesosSecretConfig): Unit = { +getSecretEnvVar(conf, secretConfig).foreach { variable => + if (variable.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${variable.getSecret.getReference.getName}" + + s"on file ${variable.getName}") + } else { +logInfo(s"Setting secret on environment variable name=${variable.getName}") + } + envBuilder.addVariables(variable) +} + } + + private def getSecrets(conf: SparkConf, secretConfig: MesosSecretConfig): + Seq[Secret] = { --- End diff -- Fixed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145282997 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -122,7 +126,8 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { .toList } - def containerInfo(conf: SparkConf): ContainerInfo.Builder = { + def buildContainerInfo(conf: SparkConf, secretConfig: MesosSecretConfig): --- End diff -- I've removed that second parameter and moved the secret handling out of this method, since it's called by Fine-Grained mode, and we don't want to touch Fine-Grained mode.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145286357 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -170,9 +175,119 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo.addNetworkInfos(info) } +getSecretVolume(conf, secretConfig).foreach { volume => + if (volume.getSource.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${volume.getSource.getSecret.getReference.getName}" + --- End diff -- Fixed.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145268821 --- Diff: docs/running-on-mesos.md --- @@ -522,6 +522,43 @@ See the [configuration page](configuration.html) for information on Spark config + spark.mesos.executor.secret.envkeys + (none) + +A comma-separated list that, if set, the contents of the secret referenced --- End diff -- I'll add some examples. I didn't find a good general example; I added a separate example to each config option.
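As a rough illustration of the kind of example being added to the docs — the config key names here follow this PR's running-on-mesos.md additions (`spark.mesos.{driver,executor}.secret.{names,values,envkeys,filenames}`), and all the values are made up:

```python
# Hedged illustration of the four combinations ([env or file destination] x
# [reference or value secret]). The keys follow this PR's docs; values are
# invented for the example. These options are parsed as comma-separated,
# parallel lists: the Nth secret goes to the Nth destination.

def parse_csv_conf(value):
    """Parse a comma-separated Spark config value into a list."""
    return [v.strip() for v in value.split(",") if v.strip()]

conf = {
    # reference secrets delivered to the driver as environment variables
    "spark.mesos.driver.secret.names": "/path/to/secret,/anothersecret",
    "spark.mesos.driver.secret.envkeys": "SECRET_ENV_KEY,PASSWORD",
    # value secrets delivered to executors as files (mixing value and
    # reference secrets for the same task type is rejected by the PR)
    "spark.mesos.executor.secret.values": "user,password",
    "spark.mesos.executor.secret.filenames": "user.txt,password.txt",
}

names = parse_csv_conf(conf["spark.mesos.driver.secret.names"])
envkeys = parse_csv_conf(conf["spark.mesos.driver.secret.envkeys"])
assert len(names) == len(envkeys)  # each secret needs a destination
```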
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r145282300 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala --- @@ -159,7 +160,8 @@ private[spark] class MesosFineGrainedSchedulerBackend( .setCommand(command) .setData(ByteString.copyFrom(createExecArg())) - executorInfo.setContainer(MesosSchedulerBackendUtil.containerInfo(sc.conf)) +executorInfo.setContainer( --- End diff -- Okay, I've reverted this change in fine-grained mode.
[GitHub] spark issue #19374: [SPARK-22145][MESOS] fix supervise with checkpointing on...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19374 LGTM
[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @vanzin Would you mind reviewing this PR? It's a follow-up to @ArtRand's secrets PR.
[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @skonto I haven't tested with TLS; I'll work on that in the next couple of days.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143571471 --- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/Utils.scala --- @@ -105,4 +108,108 @@ object Utils { def createTaskId(taskId: String): TaskID = { TaskID.newBuilder().setValue(taskId).build() } + + def configEnvBasedRefSecrets(secretConfig: MesosSecretConfig): Map[String, String] = { +val secretName = "/path/to/secret,/anothersecret" +val envKey = "SECRET_ENV_KEY,PASSWORD" +Map( + secretConfig.SECRET_NAME.key -> secretName, + secretConfig.SECRET_ENVKEY.key -> envKey +) + } + + def verifyEnvBasedRefSecrets(launchedTasks: List[TaskInfo]): Unit = { +val envVars = launchedTasks.head + .getCommand + .getEnvironment + .getVariablesList + .asScala +assert(envVars + .filter(!_.getName.startsWith("SPARK_")).length == 2) // user-defined secret env vars +val variableOne = envVars.filter(_.getName == "SECRET_ENV_KEY").head +assert(variableOne.getSecret.isInitialized) +assert(variableOne.getSecret.getType == Secret.Type.REFERENCE) +assert(variableOne.getSecret.getReference.getName == "/path/to/secret") +assert(variableOne.getType == Environment.Variable.Type.SECRET) +val variableTwo = envVars.filter(_.getName == "PASSWORD").head +assert(variableTwo.getSecret.isInitialized) +assert(variableTwo.getSecret.getType == Secret.Type.REFERENCE) +assert(variableTwo.getSecret.getReference.getName == "/anothersecret") +assert(variableTwo.getType == Environment.Variable.Type.SECRET) + } + + def configEnvBasedValueSecrets(secretConfig: MesosSecretConfig): Map[String, String] = { +val secretValues = "user,password" +val envKeys = "USER,PASSWORD" +Map( + secretConfig.SECRET_VALUE.key -> secretValues, + secretConfig.SECRET_ENVKEY.key -> envKeys +) + } + + def verifyEnvBasedValueSecrets(launchedTasks: List[TaskInfo]): Unit = { +val envVars = launchedTasks.head + .getCommand + .getEnvironment + .getVariablesList + .asScala 
+assert(envVars + .filter(!_.getName.startsWith("SPARK_")).length == 2) // user-defined secret env vars --- End diff -- Done
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143565549 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -170,9 +174,122 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo.addNetworkInfos(info) } +getSecretVolume(conf, secretConfig).foreach { volume => + if (volume.getSource.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${volume.getSource.getSecret.getReference.getName}" + + s"on file ${volume.getContainerPath}") + } else { +logInfo(s"Setting secret on file name=${volume.getContainerPath}") + } + containerInfo.addVolumes(volume) +} + containerInfo } + def addSecretEnvVar( + envBuilder: Environment.Builder, + conf: SparkConf, + secretConfig: MesosSecretConfig): Unit = { +getSecretEnvVar(conf, secretConfig).foreach { variable => + if (variable.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${variable.getSecret.getReference.getName}" + + s"on file ${variable.getName}") + } else { +logInfo(s"Setting secret on environment variable name=${variable.getName}") + } + envBuilder.addVariables(variable) +} + } + + private def getSecrets(conf: SparkConf, secretConfig: MesosSecretConfig): + Seq[Secret] = { +def createValueSecret(data: String): Secret = { + Secret.newBuilder() +.setType(Secret.Type.VALUE) + .setValue(Secret.Value.newBuilder().setData(ByteString.copyFrom(data.getBytes))) +.build() +} + +def createReferenceSecret(name: String): Secret = { + Secret.newBuilder() +.setReference(Secret.Reference.newBuilder().setName(name)) +.setType(Secret.Type.REFERENCE) +.build() +} + +val referenceSecrets: Seq[Secret] = + conf.get(secretConfig.SECRET_NAME).getOrElse(Nil).map(s => createReferenceSecret(s)) + +val valueSecrets: Seq[Secret] = { + conf.get(secretConfig.SECRET_VALUE).getOrElse(Nil).map(s => createValueSecret(s)) +} + +if 
(valueSecrets.nonEmpty && referenceSecrets.nonEmpty) { + throw new SparkException("Cannot specify VALUE type secrets and REFERENCE types ones") +} + +if (referenceSecrets.nonEmpty) referenceSecrets else valueSecrets + } + + private def illegalSecretInput(dest: Seq[String], s: Seq[Secret]): Boolean = { +if (dest.isEmpty) { // no destination set (ie not using secrets of this type --- End diff -- Good point. If they specify paths but no secrets, it should throw an exception.
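The validation rules settled on in this thread — VALUE and REFERENCE secrets are mutually exclusive, and destinations (env keys or file paths) set without matching secrets are an error — can be sketched in Python. The real implementation is the Scala code in MesosSchedulerBackendUtil; the exact mismatch check here is my assumption:

```python
# Hedged Python sketch of the secret-input validation under discussion.
# Names and the count-mismatch rule are assumptions, not the PR's exact code.

def select_secrets(reference_secrets, value_secrets):
    """VALUE-type and REFERENCE-type secrets are mutually exclusive."""
    if reference_secrets and value_secrets:
        raise ValueError("Cannot specify VALUE type secrets and REFERENCE type ones")
    return reference_secrets if reference_secrets else value_secrets

def illegal_secret_input(destinations, secrets):
    """Destinations configured without matching secrets - including paths
    with no secrets at all, per the comment above - are illegal."""
    if not destinations:
        return False  # not using secrets of this type
    return len(destinations) != len(secrets)
```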
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143543546 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala --- @@ -21,6 +21,39 @@ import java.util.concurrent.TimeUnit import org.apache.spark.internal.config.ConfigBuilder +private[spark] class MesosSecretConfig(taskType: String) { + private[spark] val SECRET_NAME = --- End diff -- I see what you mean. I'll change them.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143540778 --- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/Utils.scala --- @@ -105,4 +108,108 @@ object Utils { def createTaskId(taskId: String): TaskID = { TaskID.newBuilder().setValue(taskId).build() } + + def configEnvBasedRefSecrets(secretConfig: MesosSecretConfig): Map[String, String] = { +val secretName = "/path/to/secret,/anothersecret" +val envKey = "SECRET_ENV_KEY,PASSWORD" +Map( + secretConfig.SECRET_NAME.key -> secretName, + secretConfig.SECRET_ENVKEY.key -> envKey +) + } + + def verifyEnvBasedRefSecrets(launchedTasks: List[TaskInfo]): Unit = { +val envVars = launchedTasks.head + .getCommand + .getEnvironment + .getVariablesList + .asScala +assert(envVars + .filter(!_.getName.startsWith("SPARK_")).length == 2) // user-defined secret env vars --- End diff -- Thanks, I'll fix it.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143548722 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -122,7 +126,7 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { .toList } - def containerInfo(conf: SparkConf): ContainerInfo.Builder = { + def containerInfo(conf: SparkConf, secretConfig: MesosSecretConfig): ContainerInfo.Builder = { --- End diff -- OK, I'll use `buildContainerInfo`.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143549493 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -170,9 +174,122 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging { containerInfo.addNetworkInfos(info) } +getSecretVolume(conf, secretConfig).foreach { volume => + if (volume.getSource.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${volume.getSource.getSecret.getReference.getName}" + + s"on file ${volume.getContainerPath}") + } else { +logInfo(s"Setting secret on file name=${volume.getContainerPath}") + } + containerInfo.addVolumes(volume) +} + containerInfo } + def addSecretEnvVar( + envBuilder: Environment.Builder, + conf: SparkConf, + secretConfig: MesosSecretConfig): Unit = { +getSecretEnvVar(conf, secretConfig).foreach { variable => + if (variable.getSecret.getReference.isInitialized) { +logInfo(s"Setting reference secret ${variable.getSecret.getReference.getName}" + + s"on file ${variable.getName}") + } else { +logInfo(s"Setting secret on environment variable name=${variable.getName}") + } + envBuilder.addVariables(variable) +} + } + + private def getSecrets(conf: SparkConf, secretConfig: MesosSecretConfig): + Seq[Secret] = { +def createValueSecret(data: String): Secret = { + Secret.newBuilder() +.setType(Secret.Type.VALUE) + .setValue(Secret.Value.newBuilder().setData(ByteString.copyFrom(data.getBytes))) +.build() +} + +def createReferenceSecret(name: String): Secret = { + Secret.newBuilder() +.setReference(Secret.Reference.newBuilder().setName(name)) +.setType(Secret.Type.REFERENCE) +.build() +} + +val referenceSecrets: Seq[Secret] = + conf.get(secretConfig.SECRET_NAME).getOrElse(Nil).map(s => createReferenceSecret(s)) + +val valueSecrets: Seq[Secret] = { + conf.get(secretConfig.SECRET_VALUE).getOrElse(Nil).map(s => createValueSecret(s)) +} + +if 
(valueSecrets.nonEmpty && referenceSecrets.nonEmpty) { + throw new SparkException("Cannot specify VALUE type secrets and REFERENCE types ones") +} + +if (referenceSecrets.nonEmpty) referenceSecrets else valueSecrets + } + + private def illegalSecretInput(dest: Seq[String], s: Seq[Secret]): Boolean = { --- End diff -- Done
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143546897 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -17,10 +17,14 @@ package org.apache.spark.scheduler.cluster.mesos -import org.apache.mesos.Protos.{ContainerInfo, Image, NetworkInfo, Parameter, Volume} +import org.apache.mesos.Protos._ --- End diff -- Hmm, IntelliJ added this automatically - I'll fix it.
[GitHub] spark issue #19374: [SPARK-22145][MESOS] fix supervise with checkpointing on...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19374 @skonto One more question: in your screenshot of the History Server, I noticed the "Completed" time is 1969-12-31 for all the drivers (the original one, retry-1, and retry-2). Is that to be expected?
[GitHub] spark pull request #19374: [SPARK-22145][MESOS] fix supervise with checkpoin...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19374#discussion_r143344688 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -374,6 +375,15 @@ private[spark] class MesosClusterScheduler( s"${frameworkId}-${desc.submissionId}${retries}" } + private def getDriverTaskId(desc: MesosDriverDescription): String = { +val sId = desc.submissionId +desc.retryState.map(state => sId + s"-retry-${state.retries.toString}").getOrElse(sId) + } + + private def getSumbmissionIdFromTaskId(taskId: String): String = { --- End diff -- typo: "Submission"
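The task-ID scheme under discussion can be sketched as follows — a hedged Python rendering of the Scala helpers in the diff (with the reviewer's "Submission" typo fix applied), where retried drivers get a `-retry-N` suffix on their Mesos task ID:

```python
# Hedged Python sketch of MesosClusterScheduler's submission-ID <-> task-ID
# mapping; the real code is Scala, and the function names are adapted.

def get_driver_task_id(submission_id, retries=None):
    """A relaunched driver's task ID carries a '-retry-N' suffix."""
    if retries is None:
        return submission_id
    return f"{submission_id}-retry-{retries}"

def get_submission_id_from_task_id(task_id):
    """Recover the submission ID by stripping any '-retry-N' suffix."""
    return task_id.split("-retry-")[0]
```

This is what lets the status-update handler key `launchedDrivers` by submission ID while Mesos reports status by task ID.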
[GitHub] spark pull request #19374: [SPARK-22145][MESOS] fix supervise with checkpoin...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19374#discussion_r143484031 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -804,45 +814,52 @@ private[spark] class MesosClusterScheduler( logInfo(s"Received status update: taskId=${taskId}" + s" state=${status.getState}" + s" message=${status.getMessage}" + - s" reason=${status.getReason}"); + s" reason=${status.getReason}") stateLock.synchronized { - if (launchedDrivers.contains(taskId)) { + val subId = getSumbmissionIdFromTaskId(taskId) + if (launchedDrivers.contains(subId)) { if (status.getReason == Reason.REASON_RECONCILIATION && !pendingRecover.contains(taskId)) { // Task has already received update and no longer requires reconciliation. return } -val state = launchedDrivers(taskId) +val state = launchedDrivers(subId) // Check if the driver is supervise enabled and can be relaunched. if (state.driverDescription.supervise && shouldRelaunch(status.getState)) { - removeFromLaunchedDrivers(taskId) + removeFromLaunchedDrivers(subId) state.finishDate = Some(new Date()) val retryState: Option[MesosClusterRetryState] = state.driverDescription.retryState val (retries, waitTimeSec) = retryState .map { rs => (rs.retries + 1, Math.min(maxRetryWaitTime, rs.waitTime * 2)) } .getOrElse{ (1, 1) } val nextRetry = new Date(new Date().getTime + waitTimeSec * 1000L) - val newDriverDescription = state.driverDescription.copy( retryState = Some(new MesosClusterRetryState(status, retries, nextRetry, waitTimeSec))) - addDriverToPending(newDriverDescription, taskId); + addDriverToPending(newDriverDescription, newDriverDescription.submissionId) } else if (TaskState.isFinished(mesosToTaskState(status.getState))) { - removeFromLaunchedDrivers(taskId) - state.finishDate = Some(new Date()) - if (finishedDrivers.size >= retainedDrivers) { -val toRemove = math.max(retainedDrivers / 10, 1) 
-finishedDrivers.trimStart(toRemove) - } - finishedDrivers += state + retireDriver(subId, state, status) } state.mesosTaskStatus = Option(status) } else { -logError(s"Unable to find driver $taskId in status update") +logError(s"Unable to find driver with $taskId in status update") } } } + private def retireDriver( + submissionId: String, + state: MesosClusterSubmissionState, + status: TaskStatus) = { --- End diff -- This parameter, `status`, is not used.
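For context, the relaunch bookkeeping in the quoted diff — increment the retry count and double the wait time, capped at `maxRetryWaitTime`, then compute the next retry date — can be sketched in Python (names are illustrative; the real code is Scala):

```python
# Hedged Python sketch of the supervise/relaunch backoff in
# MesosClusterScheduler: (retries + 1, min(maxRetryWaitTime, waitTime * 2)),
# falling back to (1, 1) when there is no previous retry state.
import time

def next_retry_state(retry_state, max_retry_wait_sec=60):
    """retry_state is None on the first failure, otherwise a
    (retries, wait_sec) tuple carried over from the previous relaunch."""
    if retry_state is None:
        retries, wait_sec = 1, 1
    else:
        prev_retries, prev_wait = retry_state
        retries = prev_retries + 1
        wait_sec = min(max_retry_wait_sec, prev_wait * 2)
    next_retry_at = time.time() + wait_sec  # the 'nextRetry' Date in the diff
    return retries, wait_sec, next_retry_at
```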
[GitHub] spark pull request #19374: [SPARK-22145][MESOS] fix supervise with checkpoin...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19374#discussion_r143487275 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -276,8 +276,8 @@ private[spark] class MesosClusterScheduler( private def recoverState(): Unit = { stateLock.synchronized { launchedDriversState.fetchAll[MesosClusterSubmissionState]().foreach { state => -launchedDrivers(state.taskId.getValue) = state -pendingRecover(state.taskId.getValue) = state.slaveId +launchedDrivers(state.driverDescription.submissionId) = state --- End diff -- Maybe modify the comments up in L.138-150 ^^ to clarify which data structures are keyed by submission ID vs. task ID: - Keyed by submission ID: `launchedDrivers`, `queuedDrivers`, `pendingRetryDrivers`, `finishedDrivers` - Keyed by task ID: `pendingRecover`
[GitHub] spark pull request #19374: [SPARK-22145][MESOS] fix supervise with checkpoin...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19374#discussion_r143361887 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -276,8 +276,8 @@ private[spark] class MesosClusterScheduler( private def recoverState(): Unit = { stateLock.synchronized { launchedDriversState.fetchAll[MesosClusterSubmissionState]().foreach { state => -launchedDrivers(state.taskId.getValue) = state -pendingRecover(state.taskId.getValue) = state.slaveId +launchedDrivers(state.driverDescription.submissionId) = state +pendingRecover(state.taskId.toString) = state.slaveId --- End diff -- Why did this change? I think the original `getValue` is the standard way to get the value of this TaskID proto. http://mesos.apache.org/api/latest/java/org/apache/mesos/Protos.TaskID.html
[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @ArtRand @skonto Please review. Tests passed.
[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets
GitHub user susanxhuynh opened a pull request: https://github.com/apache/spark/pull/19437 [SPARK-22131][MESOS] Mesos driver secrets ## Background In #18837, @ArtRand added Mesos secrets support to the dispatcher. **This PR is to add the same secrets support to the drivers.** This means if the secret configs are set, the driver will launch executors that have access to either env or file-based secrets. One use case for this is to support TLS in the driver <=> executor communication. ## What changes were proposed in this pull request? Most of the changes are a refactor of the dispatcher secrets support (#18837) - moving it to a common place that can be used by both the dispatcher and drivers. The same goes for the unit tests. ## How was this patch tested? There are four config combinations: [env or file-based] x [value or reference secret]. For each combination: - Added a unit test. - Tested in DC/OS. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mesosphere/spark sh-mesos-driver-secret Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19437.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19437 commit b289bcc95f0b67cda94ddf416fc9a15e5d1855b4 Author: Susan X. Huynh Date: 2017-10-04T11:30:31Z [SPARK-22131] Mesos driver secrets. The driver launches executors that have access to env or file-based secrets. commit 6f062c00f6382d266619b4a56a753ec27d1db10b Author: Susan X. Huynh Date: 2017-10-05T12:07:20Z [SPARK-22131] Updated docs
[GitHub] spark pull request #19428: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh closed the pull request at: https://github.com/apache/spark/pull/19428
[GitHub] spark pull request #19428: [SPARK-22131][MESOS] Mesos driver secrets
GitHub user susanxhuynh opened a pull request: https://github.com/apache/spark/pull/19428 [SPARK-22131][MESOS] Mesos driver secrets ## What changes were proposed in this pull request? The driver launches executors that have access to env or file-based secrets. Most of the changes are a refactor of the `dispatcher` secrets support - placing it in a common place that can be used by both the dispatcher and drivers. The same goes for the unit tests. ## How was this patch tested? Unit tests. Tested in DC/OS. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mesosphere/spark sh-mesos-driver-secret Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19428.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19428 commit 4b7ae0a8e802f8ef6a159f3ce95b3203352b548f Author: Susan X. Huynh Date: 2017-10-04T11:30:31Z [SPARK-22131] Mesos driver secrets. The driver launches executors that have access to env or file-based secrets.
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r140117253 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCredentialRenewer.scala --- @@ -63,7 +63,8 @@ class MesosCredentialRenewer( def scheduleTokenRenewal(): Unit = { def scheduleRenewal(runnable: Runnable): Unit = { - val remainingTime = timeOfNextRenewal - System.currentTimeMillis() + // val remainingTime = timeOfNextRenewal - System.currentTimeMillis() + val remainingTime = 5000 --- End diff -- Why 5000?
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r140117055 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -198,16 +198,19 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( sc.conf.getOption("spark.mesos.driver.frameworkId").map(_ + suffix) ) -if (principal != null) { +// check that the credentials are defined, even though it's likely that auth would have failed +// already if you've made it this far +if (principal != null && hadoopDelegationCreds.isDefined) { logDebug(s"Principal found ($principal) starting token renewer") val credentialRenewerThread = new Thread { setName("MesosCredentialRenewer") override def run(): Unit = { + val dummy: Option[Array[Byte]] = None --- End diff -- What is this for?
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r139772286 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCredentialRenewer.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.scheduler.cluster.mesos + +import java.security.PrivilegedExceptionAction +import java.util.concurrent.{Executors, TimeUnit} + +import scala.collection.JavaConverters._ +import scala.util.Try + +import org.apache.hadoop.security.UserGroupInformation + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.security.HadoopDelegationTokenManager +import org.apache.spark.internal.Logging +import org.apache.spark.rpc.RpcEndpointRef +import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.UpdateDelegationTokens +import org.apache.spark.util.ThreadUtils + + +class MesosCredentialRenewer( +conf: SparkConf, +tokenManager: HadoopDelegationTokenManager, +nextRenewal: Long, +de: RpcEndpointRef) extends Logging { + private val credentialRenewerThread = +Executors.newSingleThreadScheduledExecutor( + ThreadUtils.namedThreadFactory("Credential Refresh Thread")) + + @volatile private var timeOfNextRenewal = nextRenewal + + private val principal = conf.get("spark.yarn.principal") + + private val (secretFile, mode) = getSecretFile(conf) + + private def getSecretFile(conf: SparkConf): (String, String) = { +val keytab64 = conf.get("spark.yarn.keytab", null) +val tgt64 = System.getenv("KRB5CCNAME") +require(keytab64 != null || tgt64 != null, "keytab or tgt required") +require(keytab64 == null || tgt64 == null, "keytab and tgt cannot be used at the same time") +val mode = if (keytab64 != null) "keytab" else "tgt" +val secretFile = if (keytab64 != null) keytab64 else tgt64 +logInfo(s"Logging in as $principal with mode $mode to retrieve HDFS delegation tokens") +logDebug(s"secretFile is $secretFile") +(secretFile, mode) + } + + def scheduleTokenRenewal(): Unit = { +def scheduleRenewal(runnable: Runnable): Unit = { + val remainingTime = timeOfNextRenewal - System.currentTimeMillis() + if (remainingTime <= 0) { +logInfo("Credentials have expired, creating new ones now.") 
+runnable.run() + } else { +logInfo(s"Scheduling login from keytab in $remainingTime millis.") +credentialRenewerThread.schedule(runnable, remainingTime, TimeUnit.MILLISECONDS) + } +} + +val credentialRenewerRunnable = + new Runnable { +override def run(): Unit = { + try { +val creds = getRenewedDelegationTokens(conf) +broadcastDelegationTokens(creds) + } catch { +case e: Exception => + // Log the error and try to write new tokens back in an hour --- End diff -- Comment says "an hour" but code has 20 seconds.
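The `scheduleRenewal` logic quoted above — run the renewal immediately if the renewal time has already passed, otherwise schedule it for the remaining interval — can be sketched in Python (a simplification; the real code is Scala and runs on a single-threaded ScheduledExecutorService):

```python
# Hedged Python sketch of MesosCredentialRenewer.scheduleRenewal.
import threading
import time

def schedule_renewal(time_of_next_renewal_ms, renew):
    """Renew now if the deadline has passed, else schedule a timer."""
    remaining_ms = time_of_next_renewal_ms - time.time() * 1000
    if remaining_ms <= 0:
        # "Credentials have expired, creating new ones now."
        renew()
    else:
        # "Scheduling login from keytab in <remaining> millis."
        threading.Timer(remaining_ms / 1000.0, renew).start()
```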
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r139726573

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCredentialRenewer.scala ---
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster.mesos
+
+import java.security.PrivilegedExceptionAction
+import java.util.concurrent.{Executors, TimeUnit}
+
+import scala.collection.JavaConverters._
+import scala.util.Try
+
+import org.apache.hadoop.security.UserGroupInformation
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.RpcEndpointRef
+import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.UpdateDelegationTokens
+import org.apache.spark.util.ThreadUtils
+
+
+class MesosCredentialRenewer(
+    conf: SparkConf,
+    tokenManager: HadoopDelegationTokenManager,
+    nextRenewal: Long,
+    de: RpcEndpointRef) extends Logging {
+  private val credentialRenewerThread =
+    Executors.newSingleThreadScheduledExecutor(
+      ThreadUtils.namedThreadFactory("Credential Refresh Thread"))
+
+  @volatile private var timeOfNextRenewal = nextRenewal
+
+  private val principal = conf.get("spark.yarn.principal")
+
+  private val (secretFile, mode) = getSecretFile(conf)
+
+  private def getSecretFile(conf: SparkConf): (String, String) = {
+    val keytab64 = conf.get("spark.yarn.keytab", null)
+    val tgt64 = System.getenv("KRB5CCNAME")
+    require(keytab64 != null || tgt64 != null, "keytab or tgt required")
+    require(keytab64 == null || tgt64 == null, "keytab and tgt cannot be used at the same time")
+    val mode = if (keytab64 != null) "keytab" else "tgt"
+    val secretFile = if (keytab64 != null) keytab64 else tgt64
+    logInfo(s"Logging in as $principal with mode $mode to retrieve HDFS delegation tokens")
+    logDebug(s"secretFile is $secretFile")
+    (secretFile, mode)
+  }
+
+  def scheduleTokenRenewal(): Unit = {
+    def scheduleRenewal(runnable: Runnable): Unit = {
+      val remainingTime = timeOfNextRenewal - System.currentTimeMillis()
+      if (remainingTime <= 0) {
+        logInfo("Credentials have expired, creating new ones now.")
+        runnable.run()
+      } else {
+        logInfo(s"Scheduling login from keytab in $remainingTime millis.")
+        credentialRenewerThread.schedule(runnable, remainingTime, TimeUnit.MILLISECONDS)
+      }
+    }
+
+    val credentialRenewerRunnable =
+      new Runnable {
+        override def run(): Unit = {
+          try {
+            val creds = getRenewedDelegationTokens(conf)
+            broadcastDelegationTokens(creds)
+          } catch {
+            case e: Exception =>
+              // Log the error and try to write new tokens back in an hour
+              logWarning("Couldn't broadcast tokens, trying agin in 20 seconds", e)
--- End diff --

(sp) "ag**a**in"
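The renewal scheduling in the diff above — run the task immediately if the renewal time has already passed, otherwise schedule it for the remaining delay — can be sketched with the JDK scheduler alone. The names below are hypothetical stand-ins, not Spark's actual API:

```scala
import java.util.concurrent.{Executors, TimeUnit}

object RenewalSketch {
  // Single-threaded scheduler, as in the PR's credentialRenewerThread.
  private val pool = Executors.newSingleThreadScheduledExecutor()

  // Run `task` now if `timeOfNextRenewal` is in the past (credentials
  // have expired), otherwise schedule it for the remaining delay.
  def scheduleRenewal(timeOfNextRenewal: Long, task: Runnable): Unit = {
    val remainingTime = timeOfNextRenewal - System.currentTimeMillis()
    if (remainingTime <= 0) {
      task.run()
    } else {
      pool.schedule(task, remainingTime, TimeUnit.MILLISECONDS)
    }
  }

  def shutdown(): Unit = pool.shutdown()
}
```

A renewal time in the past triggers the task synchronously on the caller's thread; a future time goes through the scheduler, which is why the real code logs two different messages.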
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r139779444

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -159,6 +159,13 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
         scheduler.getExecutorsAliveOnHost(host).foreach { exec =>
           killExecutors(exec.toSeq, replace = true, force = true)
         }
+
+      case UpdateDelegationTokens(tokens) =>
+        logDebug("Asking each executor to update HDFS delegation tokens")
+        for ((x, executorData) <- executorDataMap) {
--- End diff --

`(_, executorData)` would be more Scala-like.
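The suggested form drops the unused key with `_`. A minimal standalone illustration (toy map, not Spark's actual `executorDataMap`):

```scala
val executorDataMap = Map("exec-1" -> "host-a", "exec-2" -> "host-b")

// `(_, executorData)` unpacks each (key, value) pair and discards the
// key, signalling that only the value is used in the loop body.
val hosts = for ((_, executorData) <- executorDataMap) yield executorData

// Equivalent, and also common when keys are unused: executorDataMap.values
```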
[GitHub] spark issue #18910: [SPARK-21694][MESOS] Support Mesos CNI network labels
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/18910 @srowen Thanks for reviewing. I've removed those spaces; please take a look. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #18910: [SPARK-21694][MESOS] Support Mesos CNI network labels
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/18910 @srowen Sean, would you like to review this PR? Thanks.
[GitHub] spark issue #18910: [SPARK-21694][MESOS] Support Mesos CNI network labels
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/18910 @vanzin Would you mind helping review this PR? Thanks.
[GitHub] spark issue #18910: [SPARK-21694][MESOS] Support Mesos CNI network labels
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/18910 @skonto @ArtRand Thanks for the feedback. I have fixed the documentation and added NETWORK_NAME to the config object. Please let me know what you think. @skonto I have not tested this particular change on a real CNI network. I think the only difference is that a Spark job runs on a different network, but there's no change in the Spark functionality.
[GitHub] spark pull request #18910: [SPARK-21694][MESOS] Support Mesos CNI network la...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/18910#discussion_r132766509

--- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala ---
@@ -582,6 +583,10 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite
     val networkInfos = launchedTasks.head.getContainer.getNetworkInfosList
     assert(networkInfos.size == 1)
     assert(networkInfos.get(0).getName == "test-network-name")
+    assert(networkInfos.get(0).getLabels.getLabels(0).getKey == "key1")
+    assert(networkInfos.get(0).getLabels.getLabels(0).getValue == "val1")
+    assert(networkInfos.get(0).getLabels.getLabels(1).getKey == "key2")
+    assert(networkInfos.get(0).getLabels.getLabels(1).getValue == "val2")
--- End diff --

@ArtRand I believe this _is_ testing the worker tasks: (L.572) the `setBackend()` method creates a MesosCoarseGrainedSchedulerBackend, (L.577-579) the requirements for an Executor are created, and (L.580) backend.resourceOffers() looks for an offer that matches the requirements and launches an Executor task. There is a separate unit test above, in MesosClusterSchedulerSuite.scala, which checks for the labels in the driver itself.
[GitHub] spark pull request #18910: [SPARK-21694][MESOS] Support Mesos CNI network la...
GitHub user susanxhuynh opened a pull request: https://github.com/apache/spark/pull/18910

[SPARK-21694][MESOS] Support Mesos CNI network labels

JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21694

## What changes were proposed in this pull request?

Spark already supports launching containers attached to a given CNI network by specifying it via the config `spark.mesos.network.name`. This PR adds support to pass in network labels to CNI plugins via a new config option `spark.mesos.network.labels`. These network labels are key-value pairs that are set in the `NetworkInfo` of both the driver and executor tasks. More details in the related Mesos documentation: http://mesos.apache.org/documentation/latest/cni/#mesos-meta-data-to-cni-plugins

## How was this patch tested?

Unit tests, for both driver and executor tasks. Manual integration test to submit a job with the `spark.mesos.network.labels` option, hit the mesos/state.json endpoint, and check that the labels are set in the driver and executor tasks.

@ArtRand @skonto

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mesosphere/spark sh-mesos-cni-labels

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18910.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18910

commit 255e5195a414f0f9e609cb579a2e8fdfea200be1
Author: Susan X. Huynh
Date: 2017-08-10T18:19:51Z
[SPARK-21694][MESOS] Allow the user to pass network labels to CNI plugins via
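The `spark.mesos.network.labels` value described above is a string of key-value pairs. A sketch of the parsing step, using stdlib only — the `key:value` and comma delimiters are an assumption here, and the real code builds `org.apache.mesos.Protos.Labels` (via `MesosProtoUtils.mesosLabels`) from pairs like these, which needs the Mesos jar:

```scala
// Parse a spec such as "key1:val1,key2:val2" into (key, value) pairs.
def parseNetworkLabels(spec: String): Seq[(String, String)] =
  spec.split(",").toSeq.filter(_.nonEmpty).map { kv =>
    kv.split(":") match {
      case Array(k, v) => (k, v)
      case _ => throw new IllegalArgumentException(s"Malformed label: $kv")
    }
  }
```

An empty spec yields no labels, and a malformed entry fails fast rather than silently producing a half-formed `NetworkInfo`.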
[GitHub] spark pull request #18837: [Spark-20812][Mesos] Add secrets support to the d...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/18837#discussion_r132008642

--- Diff: docs/running-on-mesos.md ---
@@ -479,6 +479,35 @@ See the [configuration page](configuration.html) for information on Spark config
+  spark.mesos.driver.secret.envkey
+  (none)
+    If set, the contents of the secret referenced by
+    spark.mesos.driver.secret.name will be written to the provided
+    environment variable in the driver's process.
+
+  spark.mesos.driver.secret.filename
+  (none)
+    If set, the contents of the secret referenced by
+    spark.mesos.driver.secret.name will be written to the provided
+    file. Relative paths are relative to the container's work
+    directory. Absolute paths must already exist. Consult the Mesos Secret
+    protobuf for more information.
+
+  spark.mesos.driver.secret.name
+  (none)
+    Set the secret's reference name. Consult the Mesos Secret
+    protobuf for more information.
--- End diff --

Document `spark.mesos.driver.secret.value` here?
[GitHub] spark pull request #18837: [Spark-20812][Mesos] Add secrets support to the d...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/18837#discussion_r132009351

--- Diff: resource-managers/mesos/pom.xml ---
@@ -29,7 +29,7 @@
   Spark Project Mesos
   mesos
-  1.0.0
+  1.3.0
   shaded-protobuf
--- End diff --

https://spark.apache.org/docs/latest/running-on-mesos.html#installing-mesos mentions that Spark is designed for use with Mesos 1.0.0+. Is that still true? Do these changes affect users who have clusters that are < 1.3.0? Also, how much of this change requires 1.4?
[GitHub] spark pull request #18837: [Spark-20812][Mesos] Add secrets support to the d...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/18837#discussion_r132040995

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -529,18 +570,120 @@ private[spark] class MesosClusterScheduler(
     val appName = desc.conf.get("spark.app.name")

+    val driverLabels = MesosProtoUtils.mesosLabels(desc.conf.get(config.DRIVER_LABELS)
+      .getOrElse(""))
+
     TaskInfo.newBuilder()
       .setTaskId(taskId)
       .setName(s"Driver for ${appName}")
       .setSlaveId(offer.offer.getSlaveId)
       .setCommand(buildDriverCommand(desc))
+      .setContainer(getContainerInfo(desc))
       .addAllResources(cpuResourcesToUse.asJava)
       .addAllResources(memResourcesToUse.asJava)
-      .setLabels(MesosProtoUtils.mesosLabels(desc.conf.get(config.DRIVER_LABELS).getOrElse("")))
-      .setContainer(MesosSchedulerBackendUtil.containerInfo(desc.conf))
+      .setLabels(driverLabels)
       .build
   }

+  private def getContainerInfo(desc: MesosDriverDescription): ContainerInfo.Builder = {
+    val containerInfo = MesosSchedulerBackendUtil.containerInfo(desc.conf)
+
+    getSecretVolume(desc).foreach { volume =>
+      logInfo(s"Setting secret name=${volume.getSource.getSecret.getReference.getName} " +
--- End diff --

Is this assuming the type is `Reference` when it could be either `Reference` or `Value`? Same question for the logInfo() for env vars above.
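The concern above is that the log line calls `getReference.getName` without checking which flavor of secret it has. A toy model of the two flavors — the real types live in `org.apache.mesos.Protos.Secret`; these case classes are illustrative stand-ins — shows logging that handles both, without ever printing a value-based secret's contents:

```scala
sealed trait Secret
final case class ReferenceSecret(name: String) extends Secret
final case class ValueSecret(data: Array[Byte]) extends Secret

// Describe a secret for logging without assuming it is a Reference;
// for a Value secret, only the length is safe to log.
def describeSecret(s: Secret): String = s match {
  case ReferenceSecret(name) => s"secret reference name=$name"
  case ValueSecret(data)     => s"inline secret value (${data.length} bytes, contents withheld)"
}
```

Pattern matching makes the Reference-only assumption impossible to write accidentally: the compiler warns if a case is missing from a sealed hierarchy.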
[GitHub] spark pull request #18837: [Spark-20812][Mesos] Add secrets support to the d...
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/18837#discussion_r132009978

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala ---
@@ -58,12 +58,39 @@ package object config {
   private [spark] val DRIVER_LABELS =
     ConfigBuilder("spark.mesos.driver.labels")
-      .doc("Mesos labels to add to the driver. Labels are free-form key-value pairs. Key-value " +
+      .doc("Mesos labels to add to the driver. Labels are free-form key-value pairs. Key-value" +
--- End diff --

Missing a space after "Key-value"
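The bug flagged above is easy to see once the concatenated pieces are joined. A two-line demonstration (the continuation text is illustrative, not the doc string's actual wording):

```scala
// Multi-line string literals joined with `+` need an explicit space at
// the seam; dropping the trailing space glues the last and first words
// together.
val withSpace    = "Labels are free-form key-value pairs. Key-value " + "pairs use a colon separator."
val withoutSpace = "Labels are free-form key-value pairs. Key-value" + "pairs use a colon separator."
```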