[GitHub] spark issue #23254: [SPARK-26304][SS] Add default value to spark.kafka.sasl....
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23254 Merging to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23252 (Also, another nit, Spark authentication is not necessarily SASL-based.)
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23221 Merging to master.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r239903144

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c))
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" +
+      s" from users K8S config file")
+
+    // Start from an auto-configured config with the desired context
+    // Fabric 8 uses null to indicate that the users current context should be used so if no
+    // explicit setting pass null
+    val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))

--- End diff --

> I think this enhancement does not apply to client mode

If you mean "client mode inside a k8s-managed docker container", then yes, you may need to do extra stuff, like mount the appropriate credentials. But in the "client mode with driver inside k8s pod" case, Spark does not create that pod for you. So I'm not sure how Spark can help with anything there; the `serviceName` configuration seems targeted at propagating the credentials of the submitter to the driver pod, and in that case Spark is not creating the driver pod at all.
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239898432

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -380,6 +400,12 @@ private[spark] class SecurityManager(
     }
   }

+  private def readSecretFile(secretFilePath: String): String = {
+    val secretFile = new File(secretFilePath)
+    require(secretFile.isFile, s"No file found containing the secret key at $secretFilePath.")
+    Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath))

--- End diff --

It would be good to validate that this is at least not empty; optionally, that it contains at least the expected number of bytes (there's a config for that).
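The validation suggested here could look like the following pure-Scala sketch; the minimum-length check uses a hypothetical `minLenBytes` parameter standing in for Spark's actual key-length config lookup:

```scala
import java.io.File
import java.nio.file.Files
import java.util.Base64

// Sketch of readSecretFile with the suggested validation added: reject a
// missing file, an empty file, and a file shorter than the configured
// minimum key length (here a plain parameter, `minLenBytes`).
def readSecretFile(secretFilePath: String, minLenBytes: Int): String = {
  val secretFile = new File(secretFilePath)
  require(secretFile.isFile, s"No file found containing the secret key at $secretFilePath.")
  val bytes = Files.readAllBytes(secretFile.toPath)
  require(bytes.nonEmpty, s"Secret key file at $secretFilePath is empty.")
  require(bytes.length >= minLenBytes,
    s"Secret key file at $secretFilePath has ${bytes.length} bytes; expected at least $minLenBytes.")
  Base64.getEncoder.encodeToString(bytes)
}
```

The `require` calls surface a clear error at startup instead of letting a truncated or empty secret silently weaken authentication.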
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239899856

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
@@ -112,12 +112,14 @@ private[spark] class BasicExecutorFeatureStep(
           .build())
         .build())
     } ++ {
-      Option(secMgr.getSecretKey()).map { authSecret =>
-        new EnvVarBuilder()
-          .withName(SecurityManager.ENV_AUTH_SECRET)
-          .withValue(authSecret)
-          .build()
-      }
+      Option(secMgr.getSecretKey())

--- End diff --

```
if (!kubernetesConf.contains(...)) {
  // existing code
} else {
  None
}
```

I really don't like trying to mix different concerns in the same call chain...
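The preference expressed here — branch on the config up front rather than filtering mid-chain — can be illustrated with a stripped-down, runnable sketch; the names are stand-ins for the real Kubernetes step types, not the actual Spark API:

```scala
// Stand-in for the real step logic: only contribute the auth secret as an
// env var when no secret file is configured; otherwise contribute nothing.
// Branching first keeps the "is a file configured?" concern out of the
// env-var construction chain.
def authSecretEnv(hasSecretFileConf: Boolean, secretKey: Option[String]): Option[(String, String)] =
  if (!hasSecretFileConf) {
    // existing code path: propagate the generated secret
    secretKey.map(k => "SPARK_AUTH_SECRET" -> k)
  } else {
    None
  }
```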
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r23991

--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala ---
@@ -440,12 +473,27 @@ class SecurityManagerSuite extends SparkFunSuite with ResetSystemProperties {
         intercept[IllegalArgumentException] {
           mgr.getSecretKey()
         }
+      case FILE =>

--- End diff --

add empty line before
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239900712

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
@@ -24,7 +24,7 @@
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
 import org.apache.spark.deploy.k8s._
 import org.apache.spark.deploy.k8s.Config._
 import org.apache.spark.deploy.k8s.Constants._
-import org.apache.spark.internal.config.{EXECUTOR_CLASS_PATH, EXECUTOR_JAVA_OPTIONS, EXECUTOR_MEMORY, EXECUTOR_MEMORY_OVERHEAD, PYSPARK_EXECUTOR_MEMORY}
+import org.apache.spark.internal.config.{AUTH_SECRET_FILE_EXECUTOR, EXECUTOR_CLASS_PATH, EXECUTOR_JAVA_OPTIONS, EXECUTOR_MEMORY, EXECUTOR_MEMORY_OVERHEAD, PYSPARK_EXECUTOR_MEMORY}

--- End diff --

Seems like a good time to just use a wildcard here.
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239897958

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -367,11 +372,26 @@ private[spark] class SecurityManager(
       case _ =>
         require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
-          s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+          s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
         return
     }

-    secretKey = Utils.createSecret(sparkConf)
+    if (sparkConf.get(AUTH_SECRET_FILE_DRIVER).isDefined
+      && sparkConf.get(AUTH_SECRET_FILE_EXECUTOR).isEmpty) {

--- End diff --

Indent the second line one more level. Also you could do `sparkConf.contains(...DRIVER) == sparkConf.contains(...EXECUTOR)`.
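The suggested equality check can be sketched over a plain config map; the key names mirror the PR's driver/executor secret-file settings and are assumptions here, not the verified Spark property names:

```scala
// Both settings must be present together or absent together; comparing the
// two contains() results expresses that symmetry in a single expression,
// replacing the isDefined/isEmpty pair in the diff above.
val DriverFileKey = "spark.authenticate.secret.driver.file"
val ExecutorFileKey = "spark.authenticate.secret.executor.file"

def secretFilesConsistent(conf: Map[String, String]): Boolean =
  conf.contains(DriverFileKey) == conf.contains(ExecutorFileKey)
```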
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239900460

--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala ---
@@ -158,6 +161,25 @@ class BasicExecutorFeatureStepSuite extends SparkFunSuite with BeforeAndAfter {
     checkEnv(executor, conf, Map(SecurityManager.ENV_AUTH_SECRET -> secMgr.getSecretKey()))
   }

+  test("Auth secret shouldn't propagate if files are loaded.") {
+    val secretDir = Utils.createTempDir("temp-secret")
+    val secretFile = new File(secretDir, "secret-file.txt")
+    Files.write(secretFile.toPath, "some-secret".getBytes(StandardCharsets.UTF_8))
+    val conf = baseConf.clone()
+      .set(NETWORK_AUTH_ENABLED, true)
+      .set(AUTH_SECRET_FILE, secretFile.getAbsolutePath)
+      .set("spark.master", "k8s://127.0.0.1")
+    val secMgr = new SecurityManager(conf)
+    secMgr.initializeAuth()
+
+    val step = new BasicExecutorFeatureStep(KubernetesTestConf.createExecutorConf(sparkConf = conf),
+      secMgr)
+
+    val executor = step.configurePod(SparkPod.initialPod())
+    assert(!executor.container.getEnv.asScala.map(_.getName).contains(

--- End diff --

`!KubernetesFeaturesTestUtils.containerHasEnvVar(...)`
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239900826

--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala ---
@@ -18,8 +18,11 @@ package org.apache.spark

 import java.io.File
+import java.nio.charset.StandardCharsets

--- End diff --

Unnecessary (see next line).
[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23174#discussion_r239895212 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala --- @@ -87,44 +88,61 @@ private[spark] class BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor val executorCpuQuantity = new QuantityBuilder(false) .withAmount(executorCoresRequest) .build() -val executorExtraClasspathEnv = executorExtraClasspath.map { cp => - new EnvVarBuilder() -.withName(ENV_CLASSPATH) -.withValue(cp) -.build() -} -val executorExtraJavaOptionsEnv = kubernetesConf - .get(EXECUTOR_JAVA_OPTIONS) - .map { opts => -val subsOpts = Utils.substituteAppNExecIds(opts, kubernetesConf.appId, - kubernetesConf.executorId) -val delimitedOpts = Utils.splitCommandString(subsOpts) -delimitedOpts.zipWithIndex.map { - case (opt, index) => -new EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build() + +val executorEnv: Seq[EnvVar] = { +(Seq( + (ENV_DRIVER_URL, driverUrl), + (ENV_EXECUTOR_CORES, executorCores.toString), + (ENV_EXECUTOR_MEMORY, executorMemoryString), + (ENV_APPLICATION_ID, kubernetesConf.appId), + // This is to set the SPARK_CONF_DIR to be /opt/spark/conf + (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL), + (ENV_EXECUTOR_ID, kubernetesConf.executorId) +) ++ kubernetesConf.environment).map { case (k, v) => + new EnvVarBuilder() +.withName(k) +.withValue(v) +.build() } - }.getOrElse(Seq.empty[EnvVar]) -val executorEnv = (Seq( - (ENV_DRIVER_URL, driverUrl), - (ENV_EXECUTOR_CORES, executorCores.toString), - (ENV_EXECUTOR_MEMORY, executorMemoryString), - (ENV_APPLICATION_ID, kubernetesConf.appId), - // This is to set the SPARK_CONF_DIR to be /opt/spark/conf - (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL), - (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++ - kubernetesConf.environment) - .map(env => new EnvVarBuilder() -.withName(env._1) -.withValue(env._2) -.build() - ) ++ Seq( 
- new EnvVarBuilder() -.withName(ENV_EXECUTOR_POD_IP) -.withValueFrom(new EnvVarSourceBuilder() - .withNewFieldRef("v1", "status.podIP") + } ++ { +Seq(new EnvVarBuilder() + .withName(ENV_EXECUTOR_POD_IP) + .withValueFrom(new EnvVarSourceBuilder() +.withNewFieldRef("v1", "status.podIP") +.build()) .build()) -.build() -) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq + } ++ { +Option(secMgr.getSecretKey()).map { authSecret => + new EnvVarBuilder() +.withName(SecurityManager.ENV_AUTH_SECRET) +.withValue(authSecret)

--- End diff --

> the more common use case,

Which is? There's a lot to think about when you give permissions like "users can view, create and delete pods". If you do that, for example, you can delete other people's pods. That is also considered a security issue, since you can DoS other users.

Anyway, my point is that we should give people the choice of how they deploy things, and set up security according to their own constraints. This was just one way of doing it, and was not meant to be the only way.
[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23174#discussion_r239892984 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala --- @@ -87,44 +88,61 @@ private[spark] class BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor val executorCpuQuantity = new QuantityBuilder(false) .withAmount(executorCoresRequest) .build() -val executorExtraClasspathEnv = executorExtraClasspath.map { cp => - new EnvVarBuilder() -.withName(ENV_CLASSPATH) -.withValue(cp) -.build() -} -val executorExtraJavaOptionsEnv = kubernetesConf - .get(EXECUTOR_JAVA_OPTIONS) - .map { opts => -val subsOpts = Utils.substituteAppNExecIds(opts, kubernetesConf.appId, - kubernetesConf.executorId) -val delimitedOpts = Utils.splitCommandString(subsOpts) -delimitedOpts.zipWithIndex.map { - case (opt, index) => -new EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build() + +val executorEnv: Seq[EnvVar] = { +(Seq( + (ENV_DRIVER_URL, driverUrl), + (ENV_EXECUTOR_CORES, executorCores.toString), + (ENV_EXECUTOR_MEMORY, executorMemoryString), + (ENV_APPLICATION_ID, kubernetesConf.appId), + // This is to set the SPARK_CONF_DIR to be /opt/spark/conf + (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL), + (ENV_EXECUTOR_ID, kubernetesConf.executorId) +) ++ kubernetesConf.environment).map { case (k, v) => + new EnvVarBuilder() +.withName(k) +.withValue(v) +.build() } - }.getOrElse(Seq.empty[EnvVar]) -val executorEnv = (Seq( - (ENV_DRIVER_URL, driverUrl), - (ENV_EXECUTOR_CORES, executorCores.toString), - (ENV_EXECUTOR_MEMORY, executorMemoryString), - (ENV_APPLICATION_ID, kubernetesConf.appId), - // This is to set the SPARK_CONF_DIR to be /opt/spark/conf - (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL), - (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++ - kubernetesConf.environment) - .map(env => new EnvVarBuilder() -.withName(env._1) -.withValue(env._2) -.build() - ) ++ Seq( 
- new EnvVarBuilder() -.withName(ENV_EXECUTOR_POD_IP) -.withValueFrom(new EnvVarSourceBuilder() - .withNewFieldRef("v1", "status.podIP") + } ++ { +Seq(new EnvVarBuilder() + .withName(ENV_EXECUTOR_POD_IP) + .withValueFrom(new EnvVarSourceBuilder() +.withNewFieldRef("v1", "status.podIP") +.build()) .build()) -.build() -) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq + } ++ { +Option(secMgr.getSecretKey()).map { authSecret => + new EnvVarBuilder() +.withName(SecurityManager.ENV_AUTH_SECRET) +.withValue(authSecret)

--- End diff --

> If the secret is put directly in the environment variable field itself, then anyone who has permission to get the pod metadata from the Kubernetes API server can now read the secret generated by this application.

Yes, and it's extremely annoying that k8s allows anybody with access to the pods to read env variables, instead of just the pod owner. In fact, it doesn't even seem to have the concept of who owns the pod.

Anyway, this isn't different from someone else being able to read secrets in the same namespace as the pod. As I said before, it all depends on how you configure your cluster for security, and in k8s there seems to be a lot of different options.
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23221 retest this please
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239609592

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)

--- End diff --

> Is it actually desirable to not fail on a partial frame?

If you're reading a shuffle file compressed with zstd, and the shuffle file is corrupted somehow, this change may be allowing Spark to read incomplete shuffle data...
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23195 Merging to master.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r239272037

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context"}" +

--- End diff --

I think using `kubeContext.map("context " + _).getOrElse("current context")` would make this cleaner.
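A minimal, runnable sketch of the suggested cleanup:

```scala
// Option.map handles the defined case and getOrElse supplies the fallback,
// avoiding both the if/else and the awkward getOrElse("?") placeholder.
def contextDescription(kubeContext: Option[String]): String =
  kubeContext.map("context " + _).getOrElse("current context")
```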
[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23241 Please use the PR title and summary to describe the solution, not the problem.
[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22904 ok to test
[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22904 add to whitelist
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r239177994

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,199 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**
+- **JAAS login configuration**
+
+### Delegation token
+
+This way the application can be configured via Spark parameters and may not need JAAS login
+configuration (Spark can use Kafka's dynamic JAAS configuration feature). For further information
+about delegation tokens, see [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers` is set,
+Spark considers the following log in options, in order of preference:
+- **JAAS login configuration**
+- **Keytab file**, such as,
+
+      ./bin/spark-submit \
+          --keytab \
+          --principal \
+          --conf spark.kafka.bootstrap.servers= \
+          ...
+
+- **Kerberos credential cache**, such as,
+
+      ./bin/spark-submit \
+          --conf spark.kafka.bootstrap.servers= \
+          ...
+
+The Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`).
+
+Spark can be configured to use the following authentication protocols to obtain token (it must match with
+Kafka broker configuration):
+- **SASL SSL (default)**
+- **SSL**
+- **SASL PLAINTEXT (for testing)**
+
+After obtaining delegation token successfully, Spark distributes it across nodes and renews it accordingly.
+Delegation token uses `SCRAM` login module for authentication and because of that the appropriate
+`sasl.mechanism` has to be configured on source/sink:
+
+{% highlight scala %}
+
+// Setting on Kafka Source for Streaming Queries

--- End diff --

I think having just one example should be enough. Is `SCRAM-SHA-512` the only possible value? I think you mentioned different values before. If this needs to match the broker's configuration, that needs to be mentioned. Separately, it would be nice to think about having an external config for this so people don't need to hardcode this kind of thing in their code...
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23221 I applied my own feedback to the original PR and will merge pending tests (since it was already reviewed), unless someone comments first.
[GitHub] spark pull request #23221: [SPARK-24243][CORE] Expose exceptions from InProc...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/23221

[SPARK-24243][CORE] Expose exceptions from InProcessAppHandle

## What changes were proposed in this pull request?

Adds a new method to SparkAppHandle called getError which returns the exception (if present) that caused the underlying Spark app to fail.

## How was this patch tested?

New tests added to SparkLauncherSuite for the new method.

Closes #21849

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-24243

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23221.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23221

commit 9240b77078936dceaaa4a68f6a54c5c0c16aab73
Author: Sahil Takiar
Date: 2018-07-23T17:31:24Z

    [SPARK-24243][CORE] Expose exceptions from InProcessAppHandle

    Adds a new method to `SparkAppHandle` called `getError` which returns the exception (if present) that caused the underlying Spark app to fail. New tests added to `SparkLauncherSuite` for the new method.

commit 29f1436e14b453b41b055be6b4e124c5eae7d8ff
Author: Marcelo Vanzin
Date: 2018-12-05T00:37:58Z

    Merge branch 'master' into SPARK-24243

commit e58fc919355c48d2d3b1cacb4d0ee18036cacbc6
Author: Marcelo Vanzin
Date: 2018-12-05T00:41:44Z

    Feedback.
[GitHub] spark pull request #23220: [SPARK-25877][k8s] Move all feature logic to feat...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/23220

[SPARK-25877][k8s] Move all feature logic to feature classes.

This change makes the driver and executor builders a lot simpler by encapsulating almost all feature logic into the respective feature classes. The only logic that remains is the creation of the initial pod, which needs to happen before anything else so is better to be left in the builder class.

Most feature classes already behave fine when the config has nothing they should handle, but a few minor tweaks had to be added. Unit tests were also updated or added to account for these.

The builder suites were simplified a lot and just test the remaining pod-related code in the builders themselves.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-25877

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23220.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23220

commit a13bafd8e48d8a03fa35c6ff6817f03908f17e2d
Author: Marcelo Vanzin
Date: 2018-12-04T19:42:31Z

    [SPARK-25877][k8s] Move all feature logic to feature classes.

    This change makes the driver and executor builders a lot simpler by encapsulating almost all feature logic into the respective feature classes. The only logic that remains is the creation of the initial pod, which needs to happen before anything else so is better to be left in the builder class.

    Most feature classes already behave fine when the config has nothing they should handle, but a few minor tweaks had to be added. Unit tests were also updated or added to account for these.

    The builder suites were simplified a lot and just test the remaining pod-related code in the builders themselves.
[GitHub] spark issue #23092: [SPARK-26094][CORE][STREAMING] createNonEcFile creates p...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23092 Merging to master.
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238798656

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+            --keytab \
+            --principal \
+            --conf spark.kafka.bootstrap.servers= \
+            ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+            --conf spark.kafka.bootstrap.servers= \
+            ...
+
+  Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`).

--- End diff --

"The Kafka delegation..."
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238798169

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order and choose the first available to log in:

--- End diff --

"...when `blah` is set, Spark considers the following log in options, in order of preference:"
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238799671 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers` + set Spark looks for authentication information in the following order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`). 
+ + Spark can be configured to use the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. + - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticates +clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers. + - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but +because there is no encryption it's only for testing purposes. + + After obtaining delegation token successfully, Spark distributes it across nodes and renews it accordingly. + Delegation token uses `SCRAM` login module for authentication and because of that the appropriate + `sasl.mechanism` has to be configured on source/sink. --- End diff -- Still not clear to me. Does this mean the user has to change their code so that when they, e.g., run a Kafka query, they have to say `.option("sasl.mechanism", "SCRAM")`? This needs to tell the user exactly what to do to get this to work. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
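The review asks the docs to spell out exactly what a user must configure. A hedged sketch of the kind of option map this implies for a Kafka source/sink — the `kafka.`-prefixed keys mirror Kafka client configs, but the specific mechanism value and topic name here are assumptions for illustration:

```python
def kafka_source_options(bootstrap_servers, use_delegation_token=True):
    """Build an illustrative option map for a Kafka source/sink."""
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": "my-topic",  # hypothetical topic name
    }
    if use_delegation_token:
        # Delegation tokens authenticate via the SCRAM login module, so a
        # matching SASL mechanism has to be set on the source/sink.
        # "SCRAM-SHA-512" is an assumed value; it must match the brokers.
        options["kafka.sasl.mechanism"] = "SCRAM-SHA-512"
        options["kafka.security.protocol"] = "SASL_SSL"
    return options

opts = kafka_source_options("broker1:9092")
```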
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23088 Merging to master / 2.4.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238780494 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -20,20 +20,22 @@ import java.io.File import com.google.common.base.Charsets import com.google.common.io.Files +import io.fabric8.kubernetes.client.Config.autoConfigure import io.fabric8.kubernetes.client.{ConfigBuilder, DefaultKubernetesClient, KubernetesClient} import io.fabric8.kubernetes.client.utils.HttpClientUtils import okhttp3.Dispatcher - +import org.apache.commons.lang3.StringUtils --- End diff -- Now unused.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238780744 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty) +logInfo(s"Auto-configuring K8S client using " + + s"${if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context"}" + --- End diff -- I saw your commit about the NPE, but I'm not sure how you could get one here? `sparkConf.get` will never return `Some(null)` as far as I know (it would return `None` instead). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238776692 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... 
+ + Spark supports the following authentication protocols to obtain token: --- End diff -- Keeping the list of supported protocols without the explanation, saying it must match the Kafka broker config, sounds better to me.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238502012 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c)) +logInfo(s"Auto-configuring K8S client using " + + s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" + + s" from users K8S config file") + +// Start from an auto-configured config with the desired context +// Fabric 8 uses null to indicate that the users current context should be used so if no +// explicit setting pass null +val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null))) --- End diff -- (Just to clarify I'm referring to the Spark config. Too many config files involved here...) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238484698 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c)) +logInfo(s"Auto-configuring K8S client using " + + s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" + + s" from users K8S config file") + +// Start from an auto-configured config with the desired context +// Fabric 8 uses null to indicate that the users current context should be used so if no +// explicit setting pass null +val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null))) --- End diff -- > we don't propagate the submission clients config file We don't propagate the config *file* itself, but we do propagate all its contents, as far as I remember. But given your explanation it should work fine, unless there's a different config context with the same name inside the container... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23037 I thought there was already one for that Hive suite failing... SPARK-23622?
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174 retest this please
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174 I looked at the test failure, but the logs weren't super useful. This passed locally, but let me retrigger here.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174

> It matters because we're discussing direction

I'm not, you guys are. I'm adding a missing feature with one particular implementation. If you want to add other implementations that enable different use cases, great.

> we're effectively communicating that Spark is locked in to the authentication backed by K8s secrets

We're not locking into anything, and that's basically where I strongly disagree with you. You're free to add new ways, and when that's done, you're not "locked in" anymore. Locked in would mean that pushing this PR means you cannot make changes to it later, and that's just not true. Right now you're "locked in" to no auth at all, but somehow that's ok?

> check that work on SPARK-26239 would work nicely with it

Anything needed to implement that feature is just code changes. Whether it "works nicely" is just a matter of not breaking this when that feature is implemented.
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23037 ok, I give up on flaky tests. Merging to master.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174

> with the caveat that we merge the subsequent optionality soon

Again, and sorry for pounding on that key, but why does that matter? It has zero effect on the feature being added here. If the code added here is not good enough for your use case, you're in the exact same situation as if this change did not go in. But for those that can leverage the auth feature as added in this change, they're in a much, much better place.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174 I don't understand what you want. Without this change, auth does not work, period. With this, users at least have one choice. If you want to add another choice, you're free to. But I don't see why the lack of another choice has any effect on this PR.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174 As I suggested before, any alternative method can be added later. I don't see why it needs to block this PR.
[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23191 Merging to 2.4. Please close the PR manually.
[GitHub] spark issue #23209: [SPARK-26256][K8s] Fix labels for pod deletion
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23209 Actually I forgot 2.4... there's also a conflict. Seems trivial, so I'll do it manually and fix the conflict (and run some local tests).
[GitHub] spark issue #23209: [SPARK-26256][K8s] Fix labels for pod deletion
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23209 Merging to master / 2.4.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174 So, can we move forward with this and let any future new feature be handled in SPARK-26239?
[GitHub] spark issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keyt...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22911 On a non-testing note, any further feedback here?
[GitHub] spark issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keyt...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22911 retest this please
[GitHub] spark issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keyt...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22911 there was a seemingly corrupt xml file in the jenkins worker, I removed it and will retest.
[GitHub] spark pull request #23172: [SPARK-25957][followup] Build python docker image...
Github user vanzin closed the pull request at: https://github.com/apache/spark/pull/23172
[GitHub] spark issue #23172: [SPARK-25957][followup] Build python docker image in sbt...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23172 Merging to master.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238440694 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c)) +logInfo(s"Auto-configuring K8S client using " + + s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" + + s" from users K8S config file") + +// Start from an auto-configured config with the desired context +// Fabric 8 uses null to indicate that the users current context should be used so if no +// explicit setting pass null +val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null))) --- End diff -- What happens here when the context does not exist? Does it fall back to the default? e.g. in cluster mode, the config you're adding will be propagated to the driver, and then this code will be called with the same context as the submission node. What if that context does not exist inside the driver container? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238439850 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c)) --- End diff -- Either `.filter { c => ... }` or `.filter(StringUtils.isNotBlank)`. But really you can skip the extra dependency (`.filter(_.nonEmpty)`). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
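The suggestion in this review is to collapse the blank-string handling into a plain `.filter(_.nonEmpty)`. An illustrative pure-Python counterpart of that normalisation — unset, empty, and blank context values should all mean "use the current context" (the function name is made up for the sketch):

```python
def effective_context(raw):
    """Return the configured context name, or None to mean 'use current context'."""
    # None, "", and whitespace-only strings all collapse to None, matching
    # the Option-filtering idiom being suggested in the review.
    if raw is None or not raw.strip():
        return None
    return raw

assert effective_context(None) is None
assert effective_context("") is None
assert effective_context("   ") is None
assert effective_context("minikube") == "minikube"
```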
[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Skips Python resource limit on Win...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23055 (Belated +1.) Doc update looks fine. The previous one was misleading for reasons that Ryan explains above; it has nothing to do with whether it's Windows or not.
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23037 retest this please
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23088 Also it would be good to open a separate bug to address the fix for SHS / disk store.
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r238406928 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala --- @@ -148,11 +148,20 @@ private[spark] class AppStatusStore( // cheaper for disk stores (avoids deserialization). val count = { Utils.tryWithResource( -store.view(classOf[TaskDataWrapper]) - .parent(stageKey) - .index(TaskIndexNames.EXEC_RUN_TIME) - .first(0L) - .closeableIterator() +if (store.isInstanceOf[LevelDB]) { --- End diff -- Could you invert the check so you don't need to import `LevelDB`? Just to avoid importing more implementation details of the kvstore module into this class... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
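The review asks to invert the type check so the class doesn't have to import `LevelDB`. A small illustrative Python analogue of that design choice — branch on the in-memory type, so only the type already known to this module is referenced (all class names here are stand-ins, not Spark's real kvstore API):

```python
class InMemoryStore:
    """Stand-in for the in-memory kvstore implementation."""

class DiskStore:
    """Stand-in for the disk-backed (LevelDB) kvstore implementation."""

def count_strategy(store):
    # Inverted check: only InMemoryStore is named here, so nothing from the
    # disk-backed implementation needs to be imported by this module.
    if isinstance(store, InMemoryStore):
        return "filter-in-memory"
    return "indexed-scan"

assert count_strategy(InMemoryStore()) == "filter-in-memory"
assert count_strategy(DiskStore()) == "indexed-scan"
```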
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r238407621 --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala --- @@ -77,6 +77,34 @@ class AppStatusStoreSuite extends SparkFunSuite { assert(store.count(classOf[CachedQuantile]) === 2) } + test("only successfull task have taskSummary") { +val store = new InMemoryStore() +(0 until 5).foreach { i => store.write(newTaskData(i, "FAILED")) } --- End diff -- nit: `status = "FAILED"` when param has a default value
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238399808 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. 
+ - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticates +clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers. + - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but +because there is no encryption it's only for testing purposes. + + After obtaining delegation token successfully, Spark spreads it across nodes and renews it accordingly. + Delegation token uses `SCRAM` login module for authentication. --- End diff -- Do users need to do anything about this? If not, that's just an implementation detail and doesn't need to be in the docs.
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238399719 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. 
+ - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticates +clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers. + - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but +because there is no encryption it's only for testing purposes. + + After obtaining delegation token successfully, Spark spreads it across nodes and renews it accordingly. --- End diff -- s/spreads/distributes
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238398453 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default --- End diff -- I think this needs to be reworded, since "enabled by default" is misleading. You're required to set `spark.kafka.bootstrap.servers` before this actually works. The `enabled` setting is only really useful to disable DTs when the other config has been set (e.g. if it's set in a shared config file and someone wants to disable it). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
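The reviewer's point is that the provider is only meaningfully active when `spark.kafka.bootstrap.servers` is set, and the `enabled` flag mostly serves as an opt-out. A minimal Python sketch of that gating (the config keys follow the doc; the function itself is illustrative, not Spark code):

```python
def token_provider_active(conf):
    """Would the Kafka delegation token provider actually do anything?"""
    enabled = conf.get("spark.security.credentials.kafka.enabled", "true") == "true"
    servers = conf.get("spark.kafka.bootstrap.servers")
    return enabled and bool(servers)

# "Enabled by default" alone does nothing without bootstrap servers.
assert not token_provider_active({})
assert token_provider_active({"spark.kafka.bootstrap.servers": "broker:9092"})
# The enabled flag is the opt-out when servers are set in a shared config.
assert not token_provider_active({
    "spark.kafka.bootstrap.servers": "broker:9092",
    "spark.security.credentials.kafka.enabled": "false",
})
```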
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238399535 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: --- End diff -- This must match the Kafka broker's config, right? 
So it's not really "Spark supports", but that Spark's configuration must match the Kafka config, right? If that's the case, then explaining each option here is not really that helpful, since the Kafka admin is the one who should care.
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23037 Actually I lied. Could you update the `create_dev_build_context` function in `docker-image-tool.sh` to copy this new directory? You can run the script from your build directory to test it. You should also revert the change to the python it.
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23037 Merging to master.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23181 This didn't merge cleanly to 2.4, please open a PR against that branch if you want it there.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23181 Merging to master / 2.4.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174 I filed SPARK-26239.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174

> A proposed scheme is to have spark.authenticate.k8s.secret.provider=autok8ssecret

If you're going to add a different way to get the auth secret later, then you can introduce that option with a default value. It does not mean it needs to be done *in this change*, which is my point.

The only real change being introduced here from the Spark status quo is that you don't need to provide your own auth secret in the configuration (i.e. it would work like YARN), which doesn't even work in k8s today because that is not propagated to the executors in any way. And if you think that is a problem I can gate the generation of the secret based on whether one is already defined in the config.

> if it's not introduced in this patch, then at least we should file JIRA tickets

That is fine with me.
[GitHub] spark issue #23189: [SPARK-26235][Core] Change log level for ClassNotFoundEx...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23189 Sounds ok.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174

> The way it's written now

Code can change after it's written...

> If this change is merged into 3.x without any other changes, users will be forced to use the K8s secret based

If this change is not merged, users have no way to use authentication at all. So I don't see your point. It will prevent you from using the Vault approach in the sense that support for that won't be implemented yet. But this change, again, does not put a block on adding support for what you're saying later. If you think that's important and want to do that in 3.x, then you're free to. But I don't see why this approach needs to be blocked because it doesn't cover the Vault use case.
[GitHub] spark issue #23189: [SPARK-26235][Core] Change log level for ClassNotFoundEx...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23189

> The logWarning call as the other handler below is also not overridden:

It is. I even copied & pasted the code. I made the change locally and this is what happens:

```
$ ./bin/spark-submit --class foo bar.jar
Warning: Failed to load foo: foo
```

With your change, if you are using a file appender instead of a console appender, you'll still not see the error on the terminal. For `spark-submit`, since it's a user-run script, it's good to print something to the terminal in these cases, regardless of the logging configuration. If you want to also write the full exception to the logs, that's fine, but the change I suggested is more correct here considering the use.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174

> while leaving it an exercise for the reader to understand how to properly run spark such that the secrets are actually secured.

I don't think that's an exercise for the user, but for the admin. If the admin has configured things properly, the user will be able to deploy secure applications without issue. This is not just the case in kubernetes. It's the case in any deployment. Even your example of using Vault requires that. There's work needed to use it properly and securely.

> avoid coupling the API for how spark containers read secrets

There doesn't need to be a single solution. This patch going in does not preclude adding more features later, one of which might be reading this from a pre-defined secret.

And in your example I don't see how Vault would actually solve the client-mode driver case. There's no kubernetes involved in the driver, which may be running anywhere, and it needs to know the secret.
[GitHub] spark pull request #23181: [SPARK-26219][CORE] Executor summary should get u...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23181#discussion_r237952522 --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala --- @@ -1274,47 +1274,69 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter { } test("SPARK-25451: total tasks in the executor summary should match total stage tasks") { -val testConf = conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue) -val listener = new AppStatusListener(store, testConf, true) +val isLiveSeq = Seq(true, false) -val stage = new StageInfo(1, 0, "stage", 4, Nil, Nil, "details") -listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage), null)) -listener.onStageSubmitted(SparkListenerStageSubmitted(stage, new Properties())) +isLiveSeq.foreach { live: Boolean => + val testConf = if (live) { +conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue) --- End diff -- nit: `clone()` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23181: [SPARK-26219][CORE] Executor summary should get u...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23181#discussion_r237952479 --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala --- @@ -1274,47 +1274,69 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter { } test("SPARK-25451: total tasks in the executor summary should match total stage tasks") { -val testConf = conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue) -val listener = new AppStatusListener(store, testConf, true) +val isLiveSeq = Seq(true, false) -val stage = new StageInfo(1, 0, "stage", 4, Nil, Nil, "details") -listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage), null)) -listener.onStageSubmitted(SparkListenerStageSubmitted(stage, new Properties())) +isLiveSeq.foreach { live: Boolean => --- End diff -- When doing things like this I prefer to invert the logic. ``` Seq(true, false).foreach { live => test(s"blah blah blah (live = $live)") { } } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23189: [SPARK-26235][Core] Change log level for ClassNotFoundEx...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23189

Actually the problem is here:

```
case e: ClassNotFoundException =>
  logWarning(s"Failed to load $childMainClass.", e)
```

That particular `logWarning` is not overridden in the SparkSubmit object:

```
override def main(args: Array[String]): Unit = {
  val submit = new SparkSubmit() {
    ...
    override protected def logWarning(msg: => String): Unit = printMessage(s"Warning: $msg")
```

Probably should instead use the same `logWarning` call as the other handler below:

```
case e: NoClassDefFoundError =>
  logWarning(s"Failed to load $childMainClass: ${e.getMessage()}")
```

Which should then always print the error regardless of your log4j configuration.
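The technique discussed above — subclassing and overriding the warning hook so that user-facing errors always reach the terminal regardless of logging configuration — can be sketched in Python. This is a hypothetical analog of the Scala pattern, not Spark's code; names like `Submit` and `log_warning` are illustrative.

```python
import io
import sys
import logging
from contextlib import redirect_stderr

class Submit:
    """Base class: warnings go through the logging framework, so a file
    appender would hide them from the terminal."""
    def log_warning(self, msg):
        logging.getLogger("submit").warning(msg)

    def run(self, main_class):
        try:
            raise ImportError(main_class)  # stand-in for ClassNotFoundException
        except ImportError as e:
            # Log only the message, like the NoClassDefFoundError handler above
            self.log_warning("Failed to load %s: %s" % (main_class, e))

class TerminalSubmit(Submit):
    """Override so the warning is always printed to the terminal,
    mirroring the override inside SparkSubmit.main."""
    def log_warning(self, msg):
        print("Warning: %s" % msg, file=sys.stderr)

# Demonstrate: the overridden hook writes to stderr no matter how the
# logging framework is configured.
buf = io.StringIO()
with redirect_stderr(buf):
    TerminalSubmit().run("foo")
captured = buf.getvalue()
```

The key point matches the review comment: the override only takes effect for calls that go through the overridden method, so a handler that bypasses it never reaches the terminal.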
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Skips Python resource limit...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237941472 --- Diff: docs/configuration.md --- @@ -190,6 +190,8 @@ of the most common options to set are: and it is up to the application to avoid exceeding the overhead memory space shared with other non-JVM processes. When PySpark is run in YARN or Kubernetes, this memory is added to executor resource requests. + +NOTE: This configuration is not supported on Windows. --- End diff -- This is a little misleading since on Windows the extra memory will still be added to the resource requests. It's just the process-level memory limit that is not implemented. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22598 Merging to master.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237723350

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

I think it's way better to have the JVM not care and have the python side handle the setting according to its capabilities. It's a smaller change, it's more localized, and it reduces the amount of changes if this needs to be tweaked again in the future.

I don't see the consistency argument you're making. You picked one example that is very different from this one, since it actually affects the behavior of the JVM code. Could it have been developed differently? Of course. But we're not discussing that feature here. And there are so few of these cases that you really can't draw a pattern from them.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237716326

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

I really don't understand what you said above. If the check is done on the Python side, what is the exact reason why you need it on the JVM side?
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237714454

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

What is the behavior you want? What do you mean "python can't fail"?

If the memory limit is set by the user, you have two options when the python side cannot do anything with it:
- ignore it (the current patch)
- raise an error

Both can be done from the python side and do not require any checks in the JVM.

Checking in the JVM is the wrong thing. You're hardcoding the logic that this feature can never work on Windows. And if someone finds out that it can, then you'll have to change it, and that check was basically useless, since an equivalent check exists on the python side.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237715129

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

> Let me just follow majority

A -1 is a -1. It's not majority based (this is not a release).
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237715021

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

> For instance, forking in Python is disallowed in Python on Windows

That is *completely* different. Because that affects how the Java side talks to the Python side. The feature in this PR has no bearing on what happens on the JVM side at all.
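The pattern under discussion — guard the Unix-only `resource` import on the Python side and silently skip the limit where it is unsupported, so the JVM never needs a platform check — can be sketched as follows. `apply_memory_limit` is an illustrative helper, not PySpark's actual worker code.

```python
# Guard the Unix-only module on the Python side, as in the diff above.
try:
    import resource            # not available on Windows
    has_resource_module = True
except ImportError:
    has_resource_module = False

def apply_memory_limit(limit_bytes):
    """Apply a per-process address-space limit where supported.

    Returns True if the platform supports limits (the 'ignore it'
    option from the comment above: unsupported platforms just skip)."""
    if not has_resource_module:
        return False           # nothing to do; JVM side stays unaware
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Only lower the limit; never try to raise it past the current soft cap.
    if soft == resource.RLIM_INFINITY or limit_bytes < soft:
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    return True
```

This keeps all OS capability knowledge in one place; if the feature later becomes possible on Windows, only this module changes.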
[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r237659299 --- Diff: core/src/main/scala/org/apache/spark/internal/config/Kafka.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.internal.config + +private[spark] object Kafka { + + private[spark] val BOOTSTRAP_SERVERS = --- End diff -- modifiers are now redundant since this is an object (and not a package object). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23037 When I tried to write automated tests for pyspark in the past it was kind of a pain. It doesn't work the way you expect unless you have a pseudo-terminal, apparently. Maybe try to write a test script around it using `pty.spawn` or `pty.fork`. But to be frank, that sounds more complicated than it's worth...
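The pseudo-terminal approach mentioned above can be sketched with the stdlib `pty` module (Unix only): attach the child's stdio to the slave end of a pty so it believes it is interactive, which is what an interactive shell like pyspark checks for. This is a minimal sketch, not the test harness from the PR.

```python
import os
import pty
import subprocess
import sys

def run_with_pty(argv):
    """Run argv with its stdio attached to a pseudo-terminal; return its output."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave,
                            close_fds=True)
    os.close(slave)            # keep only the master end in this process
    chunks = []
    try:
        while True:
            data = os.read(master, 1024)
            if not data:
                break
            chunks.append(data)
    except OSError:
        pass                   # on Linux, reading the master raises EIO once the child exits
    proc.wait()
    os.close(master)
    return b"".join(chunks).decode()

# The child sees a tty even though no real terminal is attached:
out = run_with_pty([sys.executable, "-c",
                    "import sys; print(sys.stdout.isatty())"])
```

With a plain pipe instead of a pty, the same child would print `False`, which is why REPL-style programs behave differently under test.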
[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r237620523 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/KafkaTokenUtil.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.security + +import java.{ util => ju } +import java.text.SimpleDateFormat + +import scala.util.control.NonFatal + +import org.apache.hadoop.io.Text +import org.apache.hadoop.security.token.{Token, TokenIdentifier} +import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier +import org.apache.kafka.clients.CommonClientConfigs +import org.apache.kafka.clients.admin.{AdminClient, CreateDelegationTokenOptions} +import org.apache.kafka.common.config.SaslConfigs +import org.apache.kafka.common.security.JaasContext +import org.apache.kafka.common.security.auth.SecurityProtocol.{SASL_PLAINTEXT, SASL_SSL, SSL} +import org.apache.kafka.common.security.token.delegation.DelegationToken + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ + +private[spark] object KafkaTokenUtil extends Logging { + val TOKEN_KIND = new Text("KAFKA_DELEGATION_TOKEN") + val TOKEN_SERVICE = new Text("kafka.server.delegation.token") + + private[spark] class KafkaDelegationTokenIdentifier extends AbstractDelegationTokenIdentifier { +override def getKind: Text = TOKEN_KIND + } + + private[security] def obtainToken(sparkConf: SparkConf): (Token[_ <: TokenIdentifier], Long) = { +val adminClient = AdminClient.create(createAdminClientProperties(sparkConf)) +val createDelegationTokenOptions = new CreateDelegationTokenOptions() +val createResult = adminClient.createDelegationToken(createDelegationTokenOptions) +val token = createResult.delegationToken().get() +printToken(token) + +(new Token[KafkaDelegationTokenIdentifier]( + token.tokenInfo.tokenId.getBytes, + token.hmacAsBase64String.getBytes, + TOKEN_KIND, + TOKEN_SERVICE +), token.tokenInfo.expiryTimestamp) + } + + private[security] def createAdminClientProperties(sparkConf: SparkConf): ju.Properties = { +val adminClientProperties = new ju.Properties + +val bootstrapServers = sparkConf.get(KAFKA_BOOTSTRAP_SERVERS) 
+require(bootstrapServers.nonEmpty, s"Tried to obtain kafka delegation token but bootstrap " + + "servers not configured.") + adminClientProperties.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers.get) + +val protocol = sparkConf.get(KAFKA_SECURITY_PROTOCOL) + adminClientProperties.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, protocol) +protocol match { + case SASL_SSL.name => +setTrustStoreProperties(sparkConf, adminClientProperties) + + case SSL.name => +setTrustStoreProperties(sparkConf, adminClientProperties) +setKeyStoreProperties(sparkConf, adminClientProperties) +logWarning("Obtaining kafka delegation token with SSL protocol. Please " + + "configure 2-way authentication on the broker side.") + + case SASL_PLAINTEXT.name => +logWarning("Obtaining kafka delegation token through plain communication channel. Please " + + "consider the security impact.") +} + +// There are multiple possibilities to log in and applied in the following order: --- End diff -- Mention the Kafka bug here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r237620333 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -688,4 +688,65 @@ package object config { .stringConf .toSequence .createWithDefault(Nil) + + private[spark] val KAFKA_BOOTSTRAP_SERVERS = --- End diff -- This pattern was added after you PR, but since it's there... might be better to put these into a new file (e.g. `Kafka.scala`) in this package, instead of adding more stuff to this file. (e.g. see `History.scala`) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r237622443 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelperSuite.scala --- @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.kafka010 + +import java.util.UUID + +import org.apache.hadoop.security.{Credentials, UserGroupInformation} +import org.apache.hadoop.security.token.Token +import org.scalatest.BeforeAndAfterEach + +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.deploy.security.KafkaTokenUtil +import org.apache.spark.deploy.security.KafkaTokenUtil.KafkaDelegationTokenIdentifier +import org.apache.spark.internal.config.{KAFKA_KERBEROS_SERVICE_NAME} + +class KafkaSecurityHelperSuite extends SparkFunSuite with BeforeAndAfterEach { + private val keytab = "/path/to/keytab" + private val kerberosServiceName = "kafka" + private val principal = "u...@domain.com" + private val tokenId = "tokenId" + UUID.randomUUID().toString + private val tokenPassword = "tokenPassword" + UUID.randomUUID().toString + + private var sparkConf: SparkConf = null + + override def beforeEach(): Unit = { +super.beforeEach() +sparkConf = new SparkConf() + } + + override def afterEach(): Unit = { +try { + resetUGI +} finally { + super.afterEach() +} + } + + private def addTokenToUGI(): Unit = { +val token = new Token[KafkaDelegationTokenIdentifier]( + tokenId.getBytes, + tokenPassword.getBytes, + KafkaTokenUtil.TOKEN_KIND, + KafkaTokenUtil.TOKEN_SERVICE +) +val creds = new Credentials() +creds.addToken(KafkaTokenUtil.TOKEN_SERVICE, token) +UserGroupInformation.getCurrentUser.addCredentials(creds) + } + + private def resetUGI: Unit = { --- End diff -- add `()` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23151#discussion_r237617706

--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -494,13 +494,12 @@ class SparkSubmitSuite
   }

   test("launch simple application with spark-submit with redaction") {
-    val testDir = Utils.createTempDir()
-    testDir.deleteOnExit()
-    val testDirPath = new Path(testDir.getAbsolutePath())
     val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
     val fileSystem = Utils.getHadoopFileSystem("/", SparkHadoopUtil.get.newConfiguration(new SparkConf()))
-    try {
+    withTempDir { testDir =>
+      testDir.deleteOnExit()
--- End diff --

`deleteOnExit` is redundant for directories created by `Utils.createTempDir`. That method already tracks directories to cleanup on exit. (Also `deleteOnExit` does not work for non-empty directories.)
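The cleanup-tracking idea in the comment above — register created temp directories once and remove them recursively at exit, rather than relying on per-file delete-on-exit, which cannot remove a non-empty directory — can be sketched in Python. The helper names are illustrative, loosely mirroring what `Utils.createTempDir` does.

```python
import atexit
import os
import shutil
import tempfile

_tracked_dirs = []

def create_temp_dir(prefix="spark-"):
    """Create a temp dir and track it for recursive cleanup at exit."""
    d = tempfile.mkdtemp(prefix=prefix)
    _tracked_dirs.append(d)
    return d

@atexit.register
def _cleanup():
    for d in _tracked_dirs:
        # rmtree removes the directory even when it is non-empty,
        # unlike a per-entry delete-on-exit hook.
        shutil.rmtree(d, ignore_errors=True)

# Demonstrate the recursive removal (normally this runs at interpreter exit):
demo = create_temp_dir()
with open(os.path.join(demo, "data.txt"), "w") as f:
    f.write("payload")        # directory is now non-empty
_cleanup()
```

The design choice is the same as in the review comment: one registration point owns cleanup, so callers never need their own `deleteOnExit`.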
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174 (In fact, env variables don't even show up in the UI or event logs, as far as I can see. Other configs - Spark config, system properties, e.g. - do show up, and are redacted to mask secrets.)
[GitHub] spark issue #23017: [SPARK-26015][K8S] Set a default UID for Spark on K8S Im...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23017 Merging to master.
[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23158 Merging to master / 2.4.
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174

> if the secret would be listed under the environment variables in the Spark UI

Secrets are redacted in the UI and event logs. We already use env variables in other contexts (e.g. standalone with auth enabled).

Environment variables don't leak unless you leak them. If you do, it's a security problem in your code, since the env is generally considered "sensitive information". They're not written to disk, unlike files, which some people have problems with (really paranoid orgs don't want sensitive information in unencrypted files on disk).

This could be stashed in a k8s secret, but then how does the client mode driver get it? More user configuration? That's exactly what this is trying to avoid.
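The flow being debated — an auto-generated auth secret handed to child processes through an environment variable, so nothing sensitive is written to disk — can be sketched as below. This is purely illustrative; the variable name and helpers are hypothetical, not Spark's implementation.

```python
import secrets

AUTH_SECRET_ENV = "_EXAMPLE_AUTH_SECRET"   # hypothetical variable name

def generate_auth_secret(num_bytes=32):
    """Generate a fresh, cryptographically random secret per application."""
    return secrets.token_hex(num_bytes)

def launch_env(base_env, secret):
    """Build the child process environment with the secret injected.
    The secret lives only in process memory, never on disk."""
    env = dict(base_env)
    env[AUTH_SECRET_ENV] = secret
    return env

def read_auth_secret(env):
    """What the executor side would do: read the secret back from its env."""
    return env[AUTH_SECRET_ENV]
```

Generating a fresh secret per application is what makes this preferable to a pre-defined mounted file: no secret is ever reused across apps.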
[GitHub] spark pull request #22959: [SPARK-25876][k8s] Simplify kubernetes configurat...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22959#discussion_r237587099 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -112,125 +72,139 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( def getOption(key: String): Option[String] = sparkConf.getOption(key) } +private[spark] class KubernetesDriverConf( +sparkConf: SparkConf, +val appId: String, +val mainAppResource: MainAppResource, +val mainClass: String, +val appArgs: Array[String], +val pyFiles: Seq[String]) + extends KubernetesConf(sparkConf) { + + override val resourceNamePrefix: String = { +val custom = if (Utils.isTesting) get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None --- End diff -- I'm trying to avoid creating custom test classes here (that's what I understand by "inject", since there's no way to "inject" this otherwise). There's really a single test that needs this functionality, IIRC, and this pattern is way more common in Spark than what you're suggesting. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23174 > via a mounted file > Also the user should be able to specify their own mounted file The point is that the user shouldn't need to set this at all. You enable auth, Spark takes care of it. There's no point in specifying a pre-defined file with the secret - in fact that makes things *less* secure, because you'd be reusing the secret in different apps. > Some users would prefer to avoid propagating sensitive information via environment variables Why? And how are mounted files better? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23158#discussion_r237293492 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala --- @@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc assert(!log2.exists()) } + test("should not clean inprogress application with lastUpdated time less the maxTime") { +val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1) +val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6) +val maxAge = TimeUnit.DAYS.toMillis(7) +val clock = new ManualClock(0) +val provider = new FsHistoryProvider( + createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock) +val log = newLogFile("inProgressApp1", None, inProgress = true) +writeFile(log, true, None, + SparkListenerApplicationStart( +"inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1")) +) +clock.setTime(firstFileModifiedTime) +provider.checkForLogs() --- End diff -- Perhaps. Better to be consistent with other tests. Also because you're using a manual clock, and otherwise your mod times will be way higher than the clock's time. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23058: [SPARK-25905][CORE] When getting a remote block, ...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23058#discussion_r237284719

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -718,13 +718,9 @@ private[spark] class BlockManager(
       }

       /**
    -   * Get block from remote block managers as serialized bytes.
    +   * Get block from remote block managers as a ManagedBuffer.
        */
    -  def getRemoteBytes(blockId: BlockId): Option[ChunkedByteBuffer] = {
    -    // TODO SPARK-25905 if we change this method to return the ManagedBuffer, then getRemoteValues
    -    // could just use the inputStream on the temp file, rather than reading the file into memory.
    -    // Until then, replication can cause the process to use too much memory and get killed
    -    // even though we've read the data to disk.
    +  def getRemoteManagedBuffer(blockId: BlockId): Option[ManagedBuffer] = {
    --- End diff --

    Can this be private?
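For context on why returning a ManagedBuffer helps here: the buffer is a handle whose bytes are only pulled when an InputStream is opened, so a disk-backed block can be streamed instead of being materialized as a ChunkedByteBuffer. A simplified stand-in (the real interface is `org.apache.spark.network.buffer.ManagedBuffer` and has more methods; this sketch is not that API):

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Simplified stand-in for a ManagedBuffer-style handle; the real Spark
// interface also has nioByteBuffer, retain/release, etc.
trait BufferHandle {
  def size: Long
  def createInputStream(): InputStream // no copy happens until this is read
}

final class ArrayHandle(data: Array[Byte]) extends BufferHandle {
  override def size: Long = data.length.toLong
  override def createInputStream(): InputStream = new ByteArrayInputStream(data)
}
```

A caller like `getRemoteValues` can then deserialize straight off `createInputStream()` rather than holding the whole block in memory, which is the replication memory blow-up the removed TODO described.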
[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23158#discussion_r237280881

    --- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
    @@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc
         assert(!log2.exists())
       }

    +  test("should not clean inprogress application with lastUpdated time less the maxTime") {
    +    val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
    +    val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
    +    val maxAge = TimeUnit.DAYS.toMillis(7)
    +    val clock = new ManualClock(0)
    +    val provider = new FsHistoryProvider(
    +      createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)
    +    val log = newLogFile("inProgressApp1", None, inProgress = true)
    +    writeFile(log, true, None,
    +      SparkListenerApplicationStart(
    +        "inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1"))
    +    )
    +    clock.setTime(firstFileModifiedTime)
    +    provider.checkForLogs()
    --- End diff --

    You need to set the log file's modified time before calling this, otherwise the cleaner won't be checking what you expect.
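To illustrate the point above (a sketch using a plain temp file, not the actual FsHistoryProviderSuite helpers): a ManualClock only controls what the provider thinks "now" is; the log file's on-disk mtime is whatever the filesystem assigned at creation, so it has to be aligned explicitly before the cleaner runs.

```scala
import java.nio.file.Files
import java.util.concurrent.TimeUnit

// Sketch only: a temp file stands in for newLogFile(...); the real test
// uses the suite's helpers.
val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
val log = Files.createTempFile("inProgressApp1", ".inprogress").toFile
log.deleteOnExit()
// Align the on-disk mtime with the manual clock *before* provider.checkForLogs(),
// otherwise maxAge is compared against the filesystem's creation timestamp.
log.setLastModified(firstFileModifiedTime)
```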
[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23158#discussion_r237280935

    --- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
    @@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc
         assert(!log2.exists())
       }

    +  test("should not clean inprogress application with lastUpdated time less the maxTime") {
    +    val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
    +    val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
    +    val maxAge = TimeUnit.DAYS.toMillis(7)
    +    val clock = new ManualClock(0)
    +    val provider = new FsHistoryProvider(
    +      createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)
    --- End diff --

    Use the config constant.
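The suggestion is to reference the key through a shared constant rather than a string literal. A minimal sketch of the pattern (`HistoryConf` is a hypothetical holder; Spark's real constants are `ConfigEntry` values in `org.apache.spark.internal.config`):

```scala
import java.util.concurrent.TimeUnit

// Hypothetical constant holder for illustration; the real entry is a
// ConfigEntry in Spark's internal config package.
object HistoryConf {
  val MAX_LOG_AGE_KEY: String = "spark.history.fs.cleaner.maxAge"
}

val maxAge = TimeUnit.DAYS.toMillis(7)
// The key is now defined in exactly one place; a typo at the use site becomes
// a compile error instead of a silently ignored setting.
val conf = Map(HistoryConf.MAX_LOG_AGE_KEY -> s"${maxAge}ms")
```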
[GitHub] spark pull request #23172: [SPARK-25957][followup] Build python docker image...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23172#discussion_r237276972

    --- Diff: project/SparkBuild.scala ---
    @@ -494,7 +494,12 @@ object KubernetesIntegrationTests {
         dockerBuild := {
           if (shouldBuildImage) {
             val dockerTool = s"$sparkHome/bin/docker-image-tool.sh"
    -        val cmd = Seq(dockerTool, "-m", "-t", imageTag.value, "build")
    +        val bindingsDir = s"$sparkHome/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings"
    +        val cmd = Seq(dockerTool, "-m",
    +          "-t", imageTag.value,
    +          "-p", s"$bindingsDir/python/Dockerfile",
    --- End diff --

    Not yet (ITs for R are not run). But it shouldn't hurt either.
[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...
GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/23174

    [SPARK-26194][k8s] Auto generate auth secret for k8s apps.

    This change modifies the logic in the SecurityManager to do two things:

    - generate unique app secrets also when k8s is being used
    - only store the secret in the user's UGI on YARN

    The latter is needed so that k8s won't unnecessarily create k8s secrets
    for the UGI credentials when only the auth token is stored there.

    On the k8s side, the secret is propagated to executors using an
    environment variable instead. This ensures it works in both client and
    cluster mode.

    The security doc was updated to mention the feature and to clarify that
    proper access control in k8s should be enabled for it to be secure.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-26194

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23174.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23174

commit 0e36a4bb4a5a1ad9abee7e003b7d5f3588cba126
Author: Marcelo Vanzin
Date: 2018-11-16T23:21:00Z

    [SPARK-26194][k8s] Auto generate auth secret for k8s apps.
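The "generate unique app secrets" part of the description can be sketched as follows (`generateAppSecret` and `SPARK_AUTH_SECRET` are illustrative names, not the PR's actual identifiers):

```scala
import java.security.SecureRandom

// Illustrative sketch of per-app secret generation; the function and env-var
// names are assumptions, not Spark's actual code.
def generateAppSecret(bits: Int = 256): String = {
  val bytes = new Array[Byte](bits / 8)
  new SecureRandom().nextBytes(bytes)
  bytes.map(b => f"${b & 0xff}%02x").mkString // hex-encode for transport
}

// The driver would then export the value into each executor's environment
// (e.g. an env var on the executor pod spec), instead of creating a k8s
// Secret object from the UGI credentials.
val secret = generateAppSecret()
```

Because each application generates its own random value at submission time, no two apps share a secret, which is the property the mounted-file approach discussed in the review thread would lose.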