[GitHub] spark issue #23254: [SPARK-26304][SS] Add default value to spark.kafka.sasl....

2018-12-07 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23254
  
Merging to master.


---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-07 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23252
  
(Also, another nit, Spark authentication is not necessarily SASL-based.)


---




[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...

2018-12-07 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23221
  
Merging to master.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r239903144
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects 
configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the 
users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => 
StringUtils.isNotBlank(c))
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else 
"current context"}" +
+  s" from users K8S config file")
+
+// Start from an auto-configured config with the desired context
+// Fabric 8 uses null to indicate that the users current context 
should be used so if no
+// explicit setting pass null
+val config = new 
ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
--- End diff --

> I think this enhancement does not apply to client mode

If you mean "client mode inside a k8s-managed docker container", then yes, 
you may need to do extra stuff, like mount the appropriate credentials. But in 
the "client mode with driver inside k8s pod" case, Spark does not create that 
pod for you. So I'm not sure how Spark can help with anything there; the 
`serviceName` configuration seems targeted at propagating the credentials of 
the submitter to the driver pod, and in that case Spark is not creating the 
driver pod at all.


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239898432
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -380,6 +400,12 @@ private[spark] class SecurityManager(
 }
   }
 
+  private def readSecretFile(secretFilePath: String): String = {
+val secretFile = new File(secretFilePath)
+require(secretFile.isFile, s"No file found containing the secret key 
at $secretFilePath.")
+Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath))
--- End diff --

It would be good to validate that this is at least not empty; optionally, also that it contains at least the expected number of bytes (there's a config for that).
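For illustration, a minimal sketch of that validation, assuming the existing `AUTH_SECRET_BIT_LENGTH` entry is the config referred to above (a sketch, not the PR's actual code):

```
  private def readSecretFile(secretFilePath: String): String = {
    val secretFile = new File(secretFilePath)
    require(secretFile.isFile, s"No file found containing the secret key at $secretFilePath.")
    val secretBytes = Files.readAllBytes(secretFile.toPath)
    // Reject an empty file outright...
    require(secretBytes.nonEmpty, s"Secret key file at $secretFilePath is empty.")
    // ...and optionally enforce the configured minimum key length.
    require(secretBytes.length * 8 >= sparkConf.get(AUTH_SECRET_BIT_LENGTH),
      s"Secret key in $secretFilePath is shorter than the configured secret bit length.")
    Base64.getEncoder.encodeToString(secretBytes)
  }
```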


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239899856
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -112,12 +112,14 @@ private[spark] class BasicExecutorFeatureStep(
 .build())
   .build())
   } ++ {
-Option(secMgr.getSecretKey()).map { authSecret =>
-  new EnvVarBuilder()
-.withName(SecurityManager.ENV_AUTH_SECRET)
-.withValue(authSecret)
-.build()
-}
+Option(secMgr.getSecretKey())
--- End diff --

```
if (!kubernetesConf.contains(...)) {
   // existing code
} else {
  None
}
```

I really don't like trying to mix different concerns in the same call 
chain...
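Spelled out a bit more, the shape being suggested would be roughly the following (a sketch only; it assumes `kubernetesConf.contains` delegates to the underlying `SparkConf` and that `AUTH_SECRET_FILE_EXECUTOR` is the relevant entry):

```
  } ++ {
    if (!kubernetesConf.contains(AUTH_SECRET_FILE_EXECUTOR)) {
      // existing code: propagate the generated secret through an env var
      Option(secMgr.getSecretKey()).map { authSecret =>
        new EnvVarBuilder()
          .withName(SecurityManager.ENV_AUTH_SECRET)
          .withValue(authSecret)
          .build()
      }
    } else {
      // a secret file is configured for executors, so nothing to inject here
      None
    }
  }
```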


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r23991
  
--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala 
---
@@ -440,12 +473,27 @@ class SecurityManagerSuite extends SparkFunSuite with 
ResetSystemProperties {
 intercept[IllegalArgumentException] {
   mgr.getSecretKey()
 }
+  case FILE =>
--- End diff --

Add an empty line before this case.


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239900712
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -24,7 +24,7 @@ import org.apache.spark.{SecurityManager, SparkConf, 
SparkException}
 import org.apache.spark.deploy.k8s._
 import org.apache.spark.deploy.k8s.Config._
 import org.apache.spark.deploy.k8s.Constants._
-import org.apache.spark.internal.config.{EXECUTOR_CLASS_PATH, 
EXECUTOR_JAVA_OPTIONS, EXECUTOR_MEMORY, EXECUTOR_MEMORY_OVERHEAD, 
PYSPARK_EXECUTOR_MEMORY}
+import org.apache.spark.internal.config.{AUTH_SECRET_FILE_EXECUTOR, 
EXECUTOR_CLASS_PATH, EXECUTOR_JAVA_OPTIONS, EXECUTOR_MEMORY, 
EXECUTOR_MEMORY_OVERHEAD, PYSPARK_EXECUTOR_MEMORY}
--- End diff --

Seems like a good time to just use a wildcard import here.
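For example:

```
import org.apache.spark.internal.config._
```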


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239897958
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -367,11 +372,26 @@ private[spark] class SecurityManager(
 
   case _ =>
 require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
-  s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF 
config.")
+  s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF 
config")
 return
 }
 
-secretKey = Utils.createSecret(sparkConf)
+if (sparkConf.get(AUTH_SECRET_FILE_DRIVER).isDefined
+  && sparkConf.get(AUTH_SECRET_FILE_EXECUTOR).isEmpty) {
--- End diff --

Indent the second line one more level.

Also you could do `sparkConf.contains(...DRIVER) == 
sparkConf.contains(...EXECUTOR)`.
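i.e. something along these lines (just a sketch of the suggested comparison; the exact message and behavior are up to the PR):

```
    require(
      sparkConf.contains(AUTH_SECRET_FILE_DRIVER) == sparkConf.contains(AUTH_SECRET_FILE_EXECUTOR),
      "Specify both the driver and executor secret file configs, or neither of them.")
```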


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239900460
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
 ---
@@ -158,6 +161,25 @@ class BasicExecutorFeatureStepSuite extends 
SparkFunSuite with BeforeAndAfter {
 checkEnv(executor, conf, Map(SecurityManager.ENV_AUTH_SECRET -> 
secMgr.getSecretKey()))
   }
 
+  test("Auth secret shouldn't propagate if files are loaded.") {
+val secretDir = Utils.createTempDir("temp-secret")
+val secretFile = new File(secretDir, "secret-file.txt")
+Files.write(secretFile.toPath, 
"some-secret".getBytes(StandardCharsets.UTF_8))
+val conf = baseConf.clone()
+  .set(NETWORK_AUTH_ENABLED, true)
+  .set(AUTH_SECRET_FILE, secretFile.getAbsolutePath)
+  .set("spark.master", "k8s://127.0.0.1")
+val secMgr = new SecurityManager(conf)
+secMgr.initializeAuth()
+
+val step = new 
BasicExecutorFeatureStep(KubernetesTestConf.createExecutorConf(sparkConf = 
conf),
+  secMgr)
+
+val executor = step.configurePod(SparkPod.initialPod())
+assert(!executor.container.getEnv.asScala.map(_.getName).contains(
--- End diff --

`!KubernetesFeaturesTestUtils.containerHasEnvVar(...)`
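i.e. roughly this, assuming the helper's existing `(container, envVarName)` signature:

```
    assert(!KubernetesFeaturesTestUtils.containerHasEnvVar(
      executor.container, SecurityManager.ENV_AUTH_SECRET))
```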


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239900826
  
--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala 
---
@@ -18,8 +18,11 @@
 package org.apache.spark
 
 import java.io.File
+import java.nio.charset.StandardCharsets
--- End diff --

Unnecessary (see next line).


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239895212
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -87,44 +88,61 @@ private[spark] class 
BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
 val executorCpuQuantity = new QuantityBuilder(false)
   .withAmount(executorCoresRequest)
   .build()
-val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-  new EnvVarBuilder()
-.withName(ENV_CLASSPATH)
-.withValue(cp)
-.build()
-}
-val executorExtraJavaOptionsEnv = kubernetesConf
-  .get(EXECUTOR_JAVA_OPTIONS)
-  .map { opts =>
-val subsOpts = Utils.substituteAppNExecIds(opts, 
kubernetesConf.appId,
-  kubernetesConf.executorId)
-val delimitedOpts = Utils.splitCommandString(subsOpts)
-delimitedOpts.zipWithIndex.map {
-  case (opt, index) =>
-new 
EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+val executorEnv: Seq[EnvVar] = {
+(Seq(
+  (ENV_DRIVER_URL, driverUrl),
+  (ENV_EXECUTOR_CORES, executorCores.toString),
+  (ENV_EXECUTOR_MEMORY, executorMemoryString),
+  (ENV_APPLICATION_ID, kubernetesConf.appId),
+  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+  (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+) ++ kubernetesConf.environment).map { case (k, v) =>
+  new EnvVarBuilder()
+.withName(k)
+.withValue(v)
+.build()
 }
-  }.getOrElse(Seq.empty[EnvVar])
-val executorEnv = (Seq(
-  (ENV_DRIVER_URL, driverUrl),
-  (ENV_EXECUTOR_CORES, executorCores.toString),
-  (ENV_EXECUTOR_MEMORY, executorMemoryString),
-  (ENV_APPLICATION_ID, kubernetesConf.appId),
-  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-  (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-  kubernetesConf.environment)
-  .map(env => new EnvVarBuilder()
-.withName(env._1)
-.withValue(env._2)
-.build()
-  ) ++ Seq(
-  new EnvVarBuilder()
-.withName(ENV_EXECUTOR_POD_IP)
-.withValueFrom(new EnvVarSourceBuilder()
-  .withNewFieldRef("v1", "status.podIP")
+  } ++ {
+Seq(new EnvVarBuilder()
+  .withName(ENV_EXECUTOR_POD_IP)
+  .withValueFrom(new EnvVarSourceBuilder()
+.withNewFieldRef("v1", "status.podIP")
+.build())
   .build())
-.build()
-) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+  } ++ {
+Option(secMgr.getSecretKey()).map { authSecret =>
+  new EnvVarBuilder()
+.withName(SecurityManager.ENV_AUTH_SECRET)
+.withValue(authSecret)
--- End diff --

>  the more common use case,

Which is?

There's a lot to think about when you give permissions like "users can 
view, create and delete pods". If you do that, for example, you can delete 
other people's pods. That is also considered a security issue, since you can 
DoS other users.

Anyway, my point is that we should give people the choice of how they 
deploy things, and set up security according to their own constraints. This was 
just one way of doing it, and was not meant to be the only way.


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239892984
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -87,44 +88,61 @@ private[spark] class 
BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
 val executorCpuQuantity = new QuantityBuilder(false)
   .withAmount(executorCoresRequest)
   .build()
-val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-  new EnvVarBuilder()
-.withName(ENV_CLASSPATH)
-.withValue(cp)
-.build()
-}
-val executorExtraJavaOptionsEnv = kubernetesConf
-  .get(EXECUTOR_JAVA_OPTIONS)
-  .map { opts =>
-val subsOpts = Utils.substituteAppNExecIds(opts, 
kubernetesConf.appId,
-  kubernetesConf.executorId)
-val delimitedOpts = Utils.splitCommandString(subsOpts)
-delimitedOpts.zipWithIndex.map {
-  case (opt, index) =>
-new 
EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+val executorEnv: Seq[EnvVar] = {
+(Seq(
+  (ENV_DRIVER_URL, driverUrl),
+  (ENV_EXECUTOR_CORES, executorCores.toString),
+  (ENV_EXECUTOR_MEMORY, executorMemoryString),
+  (ENV_APPLICATION_ID, kubernetesConf.appId),
+  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+  (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+) ++ kubernetesConf.environment).map { case (k, v) =>
+  new EnvVarBuilder()
+.withName(k)
+.withValue(v)
+.build()
 }
-  }.getOrElse(Seq.empty[EnvVar])
-val executorEnv = (Seq(
-  (ENV_DRIVER_URL, driverUrl),
-  (ENV_EXECUTOR_CORES, executorCores.toString),
-  (ENV_EXECUTOR_MEMORY, executorMemoryString),
-  (ENV_APPLICATION_ID, kubernetesConf.appId),
-  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-  (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-  kubernetesConf.environment)
-  .map(env => new EnvVarBuilder()
-.withName(env._1)
-.withValue(env._2)
-.build()
-  ) ++ Seq(
-  new EnvVarBuilder()
-.withName(ENV_EXECUTOR_POD_IP)
-.withValueFrom(new EnvVarSourceBuilder()
-  .withNewFieldRef("v1", "status.podIP")
+  } ++ {
+Seq(new EnvVarBuilder()
+  .withName(ENV_EXECUTOR_POD_IP)
+  .withValueFrom(new EnvVarSourceBuilder()
+.withNewFieldRef("v1", "status.podIP")
+.build())
   .build())
-.build()
-) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+  } ++ {
+Option(secMgr.getSecretKey()).map { authSecret =>
+  new EnvVarBuilder()
+.withName(SecurityManager.ENV_AUTH_SECRET)
+.withValue(authSecret)
--- End diff --

> If the secret is put directly in the environment variable field itself, 
then anyone who has permission to get the pod metadata from the Kubernetes API 
server can now read the secret generated by this application.

Yes, and it's extremely annoying that k8s allows anybody with access to the 
pods to read env variables, instead of just the pod owner. In fact, it doesn't 
even seem to have the concept of who owns the pod.

Anyway, this isn't different from someone else being able to read secrets 
in the same namespace as the pod.

As I said before, it all depends on how you configure your cluster for 
security, and in k8s there seems to be a lot of different options.


---




[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...

2018-12-06 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23221
  
retest this please


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239609592
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends 
CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress 
small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): 
InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), 
bufferSize)
--- End diff --

> Is it actually desirable to not fail on a partial frame?

If you're reading a shuffle file compressed with zstd and the shuffle file is somehow corrupted, this change may allow Spark to read incomplete shuffle data...


---




[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...

2018-12-06 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23195
  
Merging to master.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r239272037
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects 
configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the 
users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isDefined) s"context 
${kubeContext.getOrElse("?")}" else "current context"}" +
--- End diff --

I think using `kubeContext.map("context " + _).getOrElse("current 
context")` would make this cleaner.
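That is, something like:

```
    logInfo("Auto-configuring K8S client using " +
      kubeContext.map("context " + _).getOrElse("current context") +
      " from users K8S config file")
```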


---




[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23241
  
Please use the PR title and summary to describe the solution, not the 
problem.


---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-12-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22904
  
ok to test


---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-12-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22904
  
add to whitelist


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r239177994
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,199 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**
+- **JAAS login configuration**
+
+### Delegation token
+
+This way the application can be configured via Spark parameters and may 
not need JAAS login
+configuration (Spark can use Kafka's dynamic JAAS configuration feature). 
For further information
+about delegation tokens, see [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+The process is initiated by Spark's Kafka delegation token provider. When 
`spark.kafka.bootstrap.servers`,
+Spark considers the following log in options, in order of preference:
+- **JAAS login configuration**
+- **Keytab file**, such as,
+
+  ./bin/spark-submit \
+  --keytab  \
+  --principal  \
+  --conf spark.kafka.bootstrap.servers= \
+  ...
+
+- **Kerberos credential cache**, such as,
+
+  ./bin/spark-submit \
+  --conf spark.kafka.bootstrap.servers= \
+  ...
+
+The Kafka delegation token provider can be turned off by setting 
`spark.security.credentials.kafka.enabled` to `false` (default: `true`).
+
+Spark can be configured to use the following authentication protocols to 
obtain token (it must match with
+Kafka broker configuration):
+- **SASL SSL (default)**
+- **SSL**
+- **SASL PLAINTEXT (for testing)**
+
+After obtaining delegation token successfully, Spark distributes it across 
nodes and renews it accordingly.
+Delegation token uses `SCRAM` login module for authentication and because 
of that the appropriate
+`sasl.mechanism` has to be configured on source/sink:
+
+
+
+{% highlight scala %}
+
+// Setting on Kafka Source for Streaming Queries
--- End diff --

I think having just one example should be enough.

Is `SCRAM-SHA-512` the only possible value? I think you mentioned different 
values before. If this needs to match the broker's configuration, that needs to 
be mentioned.

Separately, it would be nice to think about having an external config for 
this so people don't need to hardcode this kind of thing in their code...
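For reference, the kind of setting being discussed (illustrative only, assuming a `SparkSession` named `spark`; with the Kafka source, consumer properties are passed via `kafka.`-prefixed options, and the mechanism must match the broker):

```
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("kafka.sasl.mechanism", "SCRAM-SHA-512")  // must match the broker configuration
  .option("subscribe", "topic1")
  .load()
```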


---




[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...

2018-12-04 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23221
  
I applied my own feedback to the original PR and will merge pending tests 
(since it was already reviewed), unless someone comments first.


---




[GitHub] spark pull request #23221: [SPARK-24243][CORE] Expose exceptions from InProc...

2018-12-04 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/23221

[SPARK-24243][CORE] Expose exceptions from InProcessAppHandle

## What changes were proposed in this pull request?

Adds a new method to SparkAppHandle called getError which returns
the exception (if present) that caused the underlying Spark app to
fail.

## How was this patch tested?

New tests added to SparkLauncherSuite for the new method.

Closes #21849

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-24243

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23221.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23221


commit 9240b77078936dceaaa4a68f6a54c5c0c16aab73
Author: Sahil Takiar 
Date:   2018-07-23T17:31:24Z

[SPARK-24243][CORE] Expose exceptions from InProcessAppHandle

Adds a new method to `SparkAppHandle` called `getError` which returns
the exception (if present) that caused the underlying Spark app to
fail.

New tests added to `SparkLauncherSuite` for the new method.

commit 29f1436e14b453b41b055be6b4e124c5eae7d8ff
Author: Marcelo Vanzin 
Date:   2018-12-05T00:37:58Z

Merge branch 'master' into SPARK-24243

commit e58fc919355c48d2d3b1cacb4d0ee18036cacbc6
Author: Marcelo Vanzin 
Date:   2018-12-05T00:41:44Z

Feedback.




---




[GitHub] spark pull request #23220: [SPARK-25877][k8s] Move all feature logic to feat...

2018-12-04 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/23220

[SPARK-25877][k8s] Move all feature logic to feature classes.

This change makes the driver and executor builders a lot simpler
by encapsulating almost all feature logic into the respective
feature classes. The only logic that remains is the creation of
the initial pod, which needs to happen before anything else so
is better to be left in the builder class.

Most feature classes already behave fine when the config has nothing
they should handle, but a few minor tweaks had to be added. Unit
tests were also updated or added to account for these.

The builder suites were simplified a lot and just test the remaining
pod-related code in the builders themselves.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-25877

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23220.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23220


commit a13bafd8e48d8a03fa35c6ff6817f03908f17e2d
Author: Marcelo Vanzin 
Date:   2018-12-04T19:42:31Z

[SPARK-25877][k8s] Move all feature logic to feature classes.

This change makes the driver and executor builders a lot simpler
by encapsulating almost all feature logic into the respective
feature classes. The only logic that remains is the creation of
the initial pod, which needs to happen before anything else so
is better to be left in the builder class.

Most feature classes already behave fine when the config has nothing
they should handle, but a few minor tweaks had to be added. Unit
tests were also updated or added to account for these.

The builder suites were simplified a lot and just test the remaining
pod-related code in the builders themselves.




---




[GitHub] spark issue #23092: [SPARK-26094][CORE][STREAMING] createNonEcFile creates p...

2018-12-04 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23092
  
Merging to master.


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238798656
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the 
application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark 
can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, 
see
+  [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. 
When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order 
and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+./bin/spark-submit \
+--keytab  \
+--principal  \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  - **Kerberos credential cache**, such as,
+
+./bin/spark-submit \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  Kafka delegation token provider can be turned off by setting 
`spark.security.credentials.kafka.enabled` to `false` (default: `true`).
--- End diff --

"The Kafka delegation..."


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238798169
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the 
application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark 
can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, 
see
+  [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. 
When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order 
and choose the first available to log in:
--- End diff --

"...when `blah` is set, Spark considers the following log in options, in 
order of preference:"


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238799671
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the 
application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark 
can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, 
see
+  [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. 
When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order 
and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+./bin/spark-submit \
+--keytab  \
+--principal  \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  - **Kerberos credential cache**, such as,
+
+./bin/spark-submit \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  Kafka delegation token provider can be turned off by setting 
`spark.security.credentials.kafka.enabled` to `false` (default: `true`).
+
+  Spark can be configured to use the following authentication protocols to 
obtain token:
+  - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for 
authentication and SSL for encryption.
+  - **SSL**: It's leveraging a capability from SSL called 2-way 
authentication. The server authenticates
+clients through certificates. Please note 2-way authentication must be 
enabled on Kafka brokers.
+  - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos 
used for authentication but
+because there is no encryption it's only for testing purposes.
+
+  After obtaining delegation token successfully, Spark distributes it 
across nodes and renews it accordingly.
+  Delegation token uses `SCRAM` login module for authentication and 
because of that the appropriate
+  `sasl.mechanism` has to be configured on source/sink.
--- End diff --

Still not clear to me. Does this mean the user has to change their code so 
that when they, e.g., run a Kafka query, they have to say 
`.option("sasl.mechanism", "SCRAM")`?

This needs to tell the user exactly what to do to get this to work.


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-04 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merging to master / 2.4.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238780494
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -20,20 +20,22 @@ import java.io.File
 
 import com.google.common.base.Charsets
 import com.google.common.io.Files
+import io.fabric8.kubernetes.client.Config.autoConfigure
 import io.fabric8.kubernetes.client.{ConfigBuilder, 
DefaultKubernetesClient, KubernetesClient}
 import io.fabric8.kubernetes.client.utils.HttpClientUtils
 import okhttp3.Dispatcher
-
+import org.apache.commons.lang3.StringUtils
--- End diff --

Now unused.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238780744
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects 
configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the 
users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isDefined) s"context 
${kubeContext.getOrElse("?")}" else "current context"}" +
--- End diff --

I saw your commit about the NPE, but I'm not sure how you could get one 
here?

`sparkConf.get` will never return `Some(null)` as far as I know (it would 
return `None` instead).


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238776692
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the 
application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark 
can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, 
see
+  [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. 
This is enabled by default
+  but can be turned off with `spark.security.credentials.kafka.enabled`. 
When
+  `spark.kafka.bootstrap.servers` set Spark looks for authentication 
information in the following
+  order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+./bin/spark-submit \
+--keytab  \
+--principal  \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  - **Kerberos credential cache**, such as,
+
+./bin/spark-submit \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  Spark supports the following authentication protocols to obtain token:
--- End diff --

Keeping the list of supported protocols without the explanations, and just saying it must match the Kafka broker config, sounds better to me.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238502012
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects 
configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the 
users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => 
StringUtils.isNotBlank(c))
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else 
"current context"}" +
+  s" from users K8S config file")
+
+// Start from an auto-configured config with the desired context
+// Fabric 8 uses null to indicate that the users current context 
should be used so if no
+// explicit setting pass null
+val config = new 
ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
--- End diff --

(Just to clarify I'm referring to the Spark config. Too many config files 
involved here...)


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238484698
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects 
configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the 
users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => 
StringUtils.isNotBlank(c))
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else 
"current context"}" +
+  s" from users K8S config file")
+
+// Start from an auto-configured config with the desired context
+// Fabric 8 uses null to indicate that the users current context 
should be used so if no
+// explicit setting pass null
+val config = new 
ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
--- End diff --

> we don't propagate the submission clients config file

We don't propagate the config *file* itself, but we do propagate all its 
contents, as far as I remember. But given your explanation it should work fine, 
unless there's a different config context with the same name inside the 
container...


---




[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23037
  
I thought there was already one for that Hive suite failing... SPARK-23622?


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
retest this please


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
I looked at the test failure, but the logs weren't super useful. This 
passed locally, but let me retrigger here.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
> It matters because we're discussing direction

I'm not, you guys are. I'm adding a missing feature with one particular 
implementation. If you want to add other implementations that enable different 
use cases, great.

> we're effectively communicating that Spark is locked in to the 
authentication backed by K8s secrets

We're not locking into anything, and that's basically where I strongly 
disagree with you. You're free to add new ways, and when that's done, you're 
not "locked in" anymore.

Locked in would mean that pushing this PR means you cannot make changes to 
it later, and that's just not true.

Right now you're "locked in" to no auth at all, but somehow that's ok?

> check that work on SPARK-26239 would work nicely with it

Anything needed to implement that feature is just code changes. Whether it 
"works nicely" is just a matter of not breaking this when that feature is 
implemented.


---




[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23037
  
ok, I give up on flaky tests.

Merging to master.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
> with the caveat that we merge the subsequent optionality soon

Again, and sorry for pounding on that key, but why does that matter? It has 
zero effect on the feature being added here. If the code added here is not good 
enough for your use case, you're in the exact same situation as if this change 
did not go in. But for those that can leverage the auth feature as added in 
this change, they're in a much, much better place.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
I don't understand what you want.

Without this change, auth does not work, period.

With this, users at least have one choice.

If you want to add another choice, you're free to. But I don't see why the 
lack of another choice has any effect on this PR.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
As I suggested before, any alternative method can be added later. I don't 
see why does it need to block this PR.


---




[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23191
  
Merging to 2.4. Please close the PR manually.


---




[GitHub] spark issue #23209: [SPARK-26256][K8s] Fix labels for pod deletion

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23209
  
Actually I forgot 2.4... there's also a conflict. Seems trivial, so I'll do 
it manually and fix the conflict (and run some local tests).


---




[GitHub] spark issue #23209: [SPARK-26256][K8s] Fix labels for pod deletion

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23209
  
Merging to master / 2.4.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
So, can we move forward with this and let any future new feature be handled 
in SPARK-26239?


---




[GitHub] spark issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keyt...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22911
  
On a non-testing note, any further feedback here?


---




[GitHub] spark issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keyt...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22911
  
retest this please


---




[GitHub] spark issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keyt...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22911
  
There was a seemingly corrupt xml file in the jenkins worker; I removed it and will retest.


---




[GitHub] spark pull request #23172: [SPARK-25957][followup] Build python docker image...

2018-12-03 Thread vanzin
Github user vanzin closed the pull request at:

https://github.com/apache/spark/pull/23172


---




[GitHub] spark issue #23172: [SPARK-25957][followup] Build python docker image in sbt...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23172
  
Merging to master.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238440694
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects 
configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the 
users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => 
StringUtils.isNotBlank(c))
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else 
"current context"}" +
+  s" from users K8S config file")
+
+// Start from an auto-configured config with the desired context
+// Fabric 8 uses null to indicate that the users current context 
should be used so if no
+// explicit setting pass null
+val config = new 
ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
--- End diff --

What happens here when the context does not exist? Does it fall back to the 
default?

e.g. in cluster mode, the config you're adding will be propagated to the 
driver, and then this code will be called with the same context as the 
submission node. What if that context does not exist inside the driver 
container?


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238439850
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects 
configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the 
users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => 
StringUtils.isNotBlank(c))
--- End diff --

Either `.filter { c => ... }` or `.filter(StringUtils.isNotBlank)`. But 
really you can skip the extra dependency (`.filter(_.nonEmpty)`).


---




[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Skips Python resource limit on Win...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23055
  
(Belated +1.) The doc update looks fine. The previous one was misleading for the reasons Ryan explains above; it has nothing to do with whether it's Windows or not.


---




[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23037
  
retest this please


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23088
  
Also it would be good to open a separate bug to address the fix for SHS / 
disk store.


---




[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r238406928
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -148,11 +148,20 @@ private[spark] class AppStatusStore(
 // cheaper for disk stores (avoids deserialization).
 val count = {
   Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(TaskIndexNames.EXEC_RUN_TIME)
-  .first(0L)
-  .closeableIterator()
+if (store.isInstanceOf[LevelDB]) {
--- End diff --

Could you invert the check so you don't need to import `LevelDB`? Just to 
avoid importing more implementation details of the kvstore module into this 
class...
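
For illustration, a rough sketch of the inverted condition (assuming the in-memory implementation is `InMemoryStore`, as used in the test suite below; the comments paraphrase the surrounding diff):

```scala
if (store.isInstanceOf[InMemoryStore]) {
  // in-memory (live UI) path: restrict the count to successful tasks
} else {
  // disk-backed store: keep the index-based view (cheaper, avoids deserialization)
}
```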


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r238407621
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -77,6 +77,34 @@ class AppStatusStoreSuite extends SparkFunSuite {
 assert(store.count(classOf[CachedQuantile]) === 2)
   }
 
+  test("only successfull task have taskSummary") {
+val store = new InMemoryStore()
+(0 until 5).foreach { i => store.write(newTaskData(i, "FAILED")) }
--- End diff --

nit: `status = "FAILED"` when param has a default value
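
i.e., a sketch of what the nit amounts to:

```scala
// Naming the argument makes it clear the default status is being overridden.
(0 until 5).foreach { i => store.write(newTaskData(i, status = "FAILED")) }
```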


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238398453
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the 
application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark 
can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, 
see
+  [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. 
This is enabled by default
--- End diff --

I think this needs to be reworded, since "enabled by default" is 
misleading. You're required to set `spark.kafka.bootstrap.servers`  before this 
actually works. The `enabled` setting is only really useful to disable DTs when 
the other config has been set (e.g. if it's set in a shared config file and 
someone wants to disable it).
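
For example, the case it is useful for looks roughly like this (hypothetical invocation, placeholder broker address):

```
./bin/spark-submit \
  --conf spark.kafka.bootstrap.servers=broker1:9092 \
  --conf spark.security.credentials.kafka.enabled=false \
  ...
```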


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238399535
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the 
application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark 
can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, 
see
+  [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. 
This is enabled by default
+  but can be turned off with `spark.security.credentials.kafka.enabled`. 
When
+  `spark.kafka.bootstrap.servers` set Spark looks for authentication 
information in the following
+  order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+./bin/spark-submit \
+--keytab  \
+--principal  \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  - **Kerberos credential cache**, such as,
+
+./bin/spark-submit \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  Spark supports the following authentication protocols to obtain token:
--- End diff --

This must match the Kafka broker's config, right? So it's not really that "Spark supports" these protocols; Spark's configuration simply has to match the Kafka config. If that's the case, then explaining each option here is not really that helpful, since the Kafka admin is the one who should care.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238399808
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the 
application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark 
can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, 
see
+  [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. 
This is enabled by default
+  but can be turned off with `spark.security.credentials.kafka.enabled`. 
When
+  `spark.kafka.bootstrap.servers` set Spark looks for authentication 
information in the following
+  order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+./bin/spark-submit \
+--keytab  \
+--principal  \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  - **Kerberos credential cache**, such as,
+
+./bin/spark-submit \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  Spark supports the following authentication protocols to obtain token:
+  - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for 
authentication and SSL for encryption.
+  - **SSL**: It's leveraging a capability from SSL called 2-way 
authentication. The server authenticates
+clients through certificates. Please note 2-way authentication must be 
enabled on Kafka brokers.
+  - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos 
used for authentication but
+because there is no encryption it's only for testing purposes.
+
+  After obtaining delegation token successfully, Spark spreads it across 
nodes and renews it accordingly.
+  Delegation token uses `SCRAM` login module for authentication.
--- End diff --

Do users need to do anything about this? If not, that's just an 
implementation detail and doesn't need to be in the docs.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238399719
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the 
application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark 
can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, 
see
+  [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. 
This is enabled by default
+  but can be turned off with `spark.security.credentials.kafka.enabled`. 
When
+  `spark.kafka.bootstrap.servers` set Spark looks for authentication 
information in the following
+  order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+./bin/spark-submit \
+--keytab  \
+--principal  \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  - **Kerberos credential cache**, such as,
+
+./bin/spark-submit \
+--conf spark.kafka.bootstrap.servers= \
+...
+
+  Spark supports the following authentication protocols to obtain token:
+  - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for 
authentication and SSL for encryption.
+  - **SSL**: It's leveraging a capability from SSL called 2-way 
authentication. The server authenticates
+clients through certificates. Please note 2-way authentication must be 
enabled on Kafka brokers.
+  - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos 
used for authentication but
+because there is no encryption it's only for testing purposes.
+
+  After obtaining delegation token successfully, Spark spreads it across 
nodes and renews it accordingly.
--- End diff --

s/spreads/distributes


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23037
  
Actually I lied.

Could you update the `create_dev_build_context` function in 
`docker-image-tool.sh` to copy this new directory? You can run the script from 
your build directory to test it.

You should also revert the change to the Python integration test.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23037
  
Merging to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23181
  
This didn't merge cleanly to 2.4, please open a PR against that branch if 
you want it there.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23181
  
Merging to master / 2.4.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
I filed SPARK-26239.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
> A proposed scheme is to have 
spark.authenticate.k8s.secret.provider=autok8ssecret

If you're going to add a different way to get the auth secret later, then 
you can introduce that option with a default value. It does not mean it needs 
to be done *in this change*, which is my point.

The only real change being introduced here from the Spark status quo is 
that you don't need to provide your own auth secret in the configuration (i.e. 
it would work like YARN), which doesn't even work in k8s today because that is 
not propagated to the executors in any way. And if you think that is a problem 
I can gate the generation of the secret based on whether one is already defined 
in the config.

> if it's not introduced in this patch, then at least we should file JIRA 
tickets 

That is fine with me. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23189: [SPARK-26235][Core] Change log level for ClassNotFoundEx...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23189
  
Sounds ok.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
> The way it's written now

Code can change after it's written...

>  If this change is merged into 3.x without any other changes, users will 
be forced to use the K8s secret based 

If this change is not merged, users have no way to use authentication at 
all. So I don't see your point.

It will prevent you from using the Vault approach in the sense that support 
for that won't be implemented yet. But this change, again, does not put a block 
on adding support for what you're saying later. If you think that's important 
and want to do that in 3.x, then you're free to. But I don't see why this 
approach needs to be blocked because it doesn't cover the Vault use case.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23189: [SPARK-26235][Core] Change log level for ClassNotFoundEx...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23189
  
> The logWarning call as the other handler below is also not overridden:

It is. I even copied & pasted the code. I made the change locally and this 
is what happens:

```
$ ./bin/spark-submit --class foo bar.jar 
Warning: Failed to load foo: foo
```

With your change, if you are using a file appender instead of a console 
appender, you'll still not see the error on the terminal. For `spark-submit`, 
since it's a user-run script, it's good to print something to the terminal in 
these cases, regardless of the logging configuration.

If you want to also write the full exception to the logs, that's fine, but 
the change I suggested is more correct here considering the use.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
> while leaving it an exercise for the reader to understand how to properly 
run spark such that the secrets are actually secured.

I don't think that's an exercise for the user, but for the admin. If the 
admin has configured things properly, the user will be able to deploy secure 
applications without issue. This is not just the case in kubernetes. It's the 
case in any deployment.

Even your example of using Vault requires that. There's work needed to use 
it properly and securely.

> avoid coupling the API for how spark containers read secrets

There doesn't need to be a single solution. This patch going in does not 
preclude adding more features later, one of which might be reading this from a 
pre-defined secret.

And in your example I don't see how Vault would actually solve the 
client-mode driver case. There's no kubernetes involved in the driver, which 
may be running anywhere, and it needs to know the secret.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23181: [SPARK-26219][CORE] Executor summary should get u...

2018-11-30 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23181#discussion_r237952522
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
@@ -1274,47 +1274,69 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
   }
 
   test("SPARK-25451: total tasks in the executor summary should match total stage tasks") {
-val testConf = conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
 
-val listener = new AppStatusListener(store, testConf, true)
+val isLiveSeq = Seq(true, false)
 
-val stage = new StageInfo(1, 0, "stage", 4, Nil, Nil, "details")
-listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage), null))
-listener.onStageSubmitted(SparkListenerStageSubmitted(stage, new Properties()))
+isLiveSeq.foreach { live: Boolean =>
+  val testConf = if (live) {
+conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
--- End diff --

nit: `clone()`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23181: [SPARK-26219][CORE] Executor summary should get u...

2018-11-30 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23181#discussion_r237952479
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
@@ -1274,47 +1274,69 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
   }
 
   test("SPARK-25451: total tasks in the executor summary should match total stage tasks") {
-val testConf = conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
 
-val listener = new AppStatusListener(store, testConf, true)
+val isLiveSeq = Seq(true, false)
 
-val stage = new StageInfo(1, 0, "stage", 4, Nil, Nil, "details")
-listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage), null))
-listener.onStageSubmitted(SparkListenerStageSubmitted(stage, new Properties()))
+isLiveSeq.foreach { live: Boolean =>
--- End diff --

When doing things like this I prefer to invert the logic.

```
Seq(true, false).foreach { live =>
  test(s"blah blah blah (live = $live)") {

  }
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23189: [SPARK-26235][Core] Change log level for ClassNotFoundEx...

2018-11-30 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23189
  
Actually the problem is here:

```
  case e: ClassNotFoundException =>
logWarning(s"Failed to load $childMainClass.", e)
```

That particular `logWarning` is not overridden in the SparkSubmit object:

```
  override def main(args: Array[String]): Unit = {
val submit = new SparkSubmit() {
  ...
  override protected def logWarning(msg: => String): Unit = 
printMessage(s"Warning: $msg")
```

Probably should instead use the same `logWarning` call as the other handler 
below:

```
  case e: NoClassDefFoundError =>
logWarning(s"Failed to load $childMainClass: ${e.getMessage()}")
```

Which should then always print the error regardless of your log4j 
configuration.
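
Concretely, something like this (a sketch, mirroring the handler quoted above):

```scala
case e: ClassNotFoundException =>
  // Message-only overload: SparkSubmit's main() overrides it to print to the
  // terminal, so the warning shows up regardless of the log4j configuration.
  logWarning(s"Failed to load $childMainClass: ${e.getMessage()}")
```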


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Skips Python resource limit...

2018-11-30 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23055#discussion_r237941472
  
--- Diff: docs/configuration.md ---
@@ -190,6 +190,8 @@ of the most common options to set are:
 and it is up to the application to avoid exceeding the overhead memory 
space
 shared with other non-JVM processes. When PySpark is run in YARN or 
Kubernetes, this memory
 is added to executor resource requests.
+
+NOTE: This configuration is not supported on Windows.
--- End diff --

This is a little misleading since on Windows the extra memory will still be 
added to the resource requests. It's just the process-level memory limit that 
is not implemented.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.

2018-11-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22598
  
Merging to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23055#discussion_r237723350
  
--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+import resource
+except ImportError:
+has_resource_module = False
--- End diff --

I think it's way better to have the JVM not care and have the python side 
handle the setting according to its capabilities.

It's a smaller change, it's more localized, and it reduces the amount of 
changes if this needs to be tweaked again in the future.

I don't see the consistency argument you're making. You picked one example 
that is very different from this one, since it actually affects the behavior of 
the JVM code. Could it have been developed differently? Of course. But we're 
not discussing that feature here.

And there are so few of these cases that you really can't draw a pattern 
from them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23055#discussion_r237716326
  
--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+import resource
+except ImportError:
+has_resource_module = False
--- End diff --

I really don't understand what you said above. If the check is done on the 
Python side, what is the exact reason why you need it on the JVM side?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23055#discussion_r237714454
  
--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+import resource
+except ImportError:
+has_resource_module = False
--- End diff --

What is the behavior you want? What do you mean "python can't fail"?

If the memory limit is set by the user, you have two options when the 
python side cannot do anything with it:

- ignore it (the current patch)
- raise an error

Both can be done from the python side and do not require any checks in the 
JVM.

Checking in the JVM is the wrong thing. You're hardcoding the logic that 
this feature can never work on Windows. And if someone finds out that it can, 
then you'll have to change it, and that check was basically useless, since an 
equivalent check exists on the python side.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23055#discussion_r237715129
  
--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+import resource
+except ImportError:
+has_resource_module = False
--- End diff --

> Let me just follow majority

A -1 is a -1. It's not majority based (this is not a release).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23055#discussion_r237715021
  
--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+import resource
+except ImportError:
+has_resource_module = False
--- End diff --

> For instance, forking in Python is disallowed in Python on Windows

That is *completely* different. Because that affects how the Java side 
talks to the Python side.

The feature in this PR has no bearing on what happens on the JVM side at 
all.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22598#discussion_r237659299
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/Kafka.scala 
---
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.internal.config
+
+private[spark] object Kafka {
+
+  private[spark] val BOOTSTRAP_SERVERS =
--- End diff --

modifiers are now redundant since this is an object (and not a package 
object).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...

2018-11-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23037
  
When I tried to write automated tests for pyspark in the past it was kind 
of a pain. It doesn't work the way you expect unless you have a 
pseudo-terminal, apparently.

Maybe try to write a test script around it using `pty.spawn` or `pty.fork`. 
But to be frank, that sounds more complicated than it's worth...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22598#discussion_r237620523
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/security/KafkaTokenUtil.scala ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.security
+
+import java.{ util => ju }
+import java.text.SimpleDateFormat
+
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.token.{Token, TokenIdentifier}
+import 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier
+import org.apache.kafka.clients.CommonClientConfigs
+import org.apache.kafka.clients.admin.{AdminClient, 
CreateDelegationTokenOptions}
+import org.apache.kafka.common.config.SaslConfigs
+import org.apache.kafka.common.security.JaasContext
+import 
org.apache.kafka.common.security.auth.SecurityProtocol.{SASL_PLAINTEXT, 
SASL_SSL, SSL}
+import org.apache.kafka.common.security.token.delegation.DelegationToken
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+
+private[spark] object KafkaTokenUtil extends Logging {
+  val TOKEN_KIND = new Text("KAFKA_DELEGATION_TOKEN")
+  val TOKEN_SERVICE = new Text("kafka.server.delegation.token")
+
+  private[spark] class KafkaDelegationTokenIdentifier extends 
AbstractDelegationTokenIdentifier {
+override def getKind: Text = TOKEN_KIND
+  }
+
+  private[security] def obtainToken(sparkConf: SparkConf): (Token[_ <: 
TokenIdentifier], Long) = {
+val adminClient = 
AdminClient.create(createAdminClientProperties(sparkConf))
+val createDelegationTokenOptions = new CreateDelegationTokenOptions()
+val createResult = 
adminClient.createDelegationToken(createDelegationTokenOptions)
+val token = createResult.delegationToken().get()
+printToken(token)
+
+(new Token[KafkaDelegationTokenIdentifier](
+  token.tokenInfo.tokenId.getBytes,
+  token.hmacAsBase64String.getBytes,
+  TOKEN_KIND,
+  TOKEN_SERVICE
+), token.tokenInfo.expiryTimestamp)
+  }
+
+  private[security] def createAdminClientProperties(sparkConf: SparkConf): 
ju.Properties = {
+val adminClientProperties = new ju.Properties
+
+val bootstrapServers = sparkConf.get(KAFKA_BOOTSTRAP_SERVERS)
+require(bootstrapServers.nonEmpty, s"Tried to obtain kafka delegation 
token but bootstrap " +
+  "servers not configured.")
+
adminClientProperties.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, 
bootstrapServers.get)
+
+val protocol = sparkConf.get(KAFKA_SECURITY_PROTOCOL)
+
adminClientProperties.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, 
protocol)
+protocol match {
+  case SASL_SSL.name =>
+setTrustStoreProperties(sparkConf, adminClientProperties)
+
+  case SSL.name =>
+setTrustStoreProperties(sparkConf, adminClientProperties)
+setKeyStoreProperties(sparkConf, adminClientProperties)
+logWarning("Obtaining kafka delegation token with SSL protocol. 
Please " +
+  "configure 2-way authentication on the broker side.")
+
+  case SASL_PLAINTEXT.name =>
+logWarning("Obtaining kafka delegation token through plain 
communication channel. Please " +
+  "consider the security impact.")
+}
+
+// There are multiple possibilities to log in and applied in the 
following order:
--- End diff --

Mention the Kafka bug here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22598#discussion_r237620333
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -688,4 +688,65 @@ package object config {
   .stringConf
   .toSequence
   .createWithDefault(Nil)
+
+  private[spark] val KAFKA_BOOTSTRAP_SERVERS =
--- End diff --

This pattern was added after you PR, but since it's there... might be 
better to put these into a new file (e.g. `Kafka.scala`) in this package, 
instead of adding more stuff to this file. (e.g. see `History.scala`)
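
A minimal sketch of what such a file could look like (mirroring the shape of `History.scala`; the builder chain is illustrative and only the entry from this PR is shown):

```scala
package org.apache.spark.internal.config

// ConfigBuilder lives in this package, so no extra imports are needed.
private[spark] object Kafka {

  val BOOTSTRAP_SERVERS =
    ConfigBuilder("spark.kafka.bootstrap.servers")
      .stringConf
      .createOptional
}
```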


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22598#discussion_r237622443
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelperSuite.scala
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.util.UUID
+
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.hadoop.security.token.Token
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.security.KafkaTokenUtil
+import org.apache.spark.deploy.security.KafkaTokenUtil.KafkaDelegationTokenIdentifier
+import org.apache.spark.internal.config.{KAFKA_KERBEROS_SERVICE_NAME}
+
+class KafkaSecurityHelperSuite extends SparkFunSuite with BeforeAndAfterEach {
+  private val keytab = "/path/to/keytab"
+  private val kerberosServiceName = "kafka"
+  private val principal = "u...@domain.com"
+  private val tokenId = "tokenId" + UUID.randomUUID().toString
+  private val tokenPassword = "tokenPassword" + UUID.randomUUID().toString
+
+  private var sparkConf: SparkConf = null
+
+  override def beforeEach(): Unit = {
+super.beforeEach()
+sparkConf = new SparkConf()
+  }
+
+  override def afterEach(): Unit = {
+try {
+  resetUGI
+} finally {
+  super.afterEach()
+}
+  }
+
+  private def addTokenToUGI(): Unit = {
+val token = new Token[KafkaDelegationTokenIdentifier](
+  tokenId.getBytes,
+  tokenPassword.getBytes,
+  KafkaTokenUtil.TOKEN_KIND,
+  KafkaTokenUtil.TOKEN_SERVICE
+)
+val creds = new Credentials()
+creds.addToken(KafkaTokenUtil.TOKEN_SERVICE, token)
+UserGroupInformation.getCurrentUser.addCredentials(creds)
+  }
+
+  private def resetUGI: Unit = {
--- End diff --

add `()`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23151#discussion_r237617706
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -494,13 +494,12 @@ class SparkSubmitSuite
   }
 
   test("launch simple application with spark-submit with redaction") {
-val testDir = Utils.createTempDir()
-testDir.deleteOnExit()
-val testDirPath = new Path(testDir.getAbsolutePath())
 val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
 val fileSystem = Utils.getHadoopFileSystem("/",
   SparkHadoopUtil.get.newConfiguration(new SparkConf()))
-try {
+withTempDir { testDir =>
+  testDir.deleteOnExit()
--- End diff --

`deleteOnExit` is redundant for directories created by 
`Utils.createTempDir`. That method already tracks directories to cleanup on 
exit.

(Also `deleteOnExit` does not work for non-empty directories.)
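
i.e. roughly (a sketch of the point, not the full test):

```scala
withTempDir { testDir =>
  // No deleteOnExit: Utils.createTempDir (used by withTempDir) already registers
  // the directory for cleanup on exit, and deleteOnExit would not handle a
  // non-empty directory anyway.
  val testDirPath = new Path(testDir.getAbsolutePath())
  // ... rest of the test unchanged
}
```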


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
(In fact, env variables don't even show up in the UI or event logs, as far 
as I can see. Other configs - Spark config, system properties, e.g. - do show 
up, and are redacted to mask secrets.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23017: [SPARK-26015][K8S] Set a default UID for Spark on K8S Im...

2018-11-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23017
  
Merging to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...

2018-11-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23158
  
Merging to master / 2.4.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
>  if the secret would be listed under the environment variables in the 
Spark UI 

Secrets are redacted in the UI and event logs. We already use env variables 
in other contexts (e.g. standalone with auth enabled).

Environment variables don't leak unless you leak them. If you do, it's a 
security problem in your code, since the env is generally considered "sensitive 
information". They're not written to disk, unlike files, which some people have 
problems with (really paranoid orgs don't want sensitive information in 
unencrypted files on disk).

This could be stashed in a k8s secret, but then how does the client mode 
driver get it? More user configuration? That's exactly what this is trying to 
avoid.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22959: [SPARK-25876][k8s] Simplify kubernetes configurat...

2018-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22959#discussion_r237587099
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -112,125 +72,139 @@ private[spark] case class KubernetesConf[T <: 
KubernetesRoleSpecificConf](
   def getOption(key: String): Option[String] = sparkConf.getOption(key)
 }
 
+private[spark] class KubernetesDriverConf(
+sparkConf: SparkConf,
+val appId: String,
+val mainAppResource: MainAppResource,
+val mainClass: String,
+val appArgs: Array[String],
+val pyFiles: Seq[String])
+  extends KubernetesConf(sparkConf) {
+
+  override val resourceNamePrefix: String = {
+val custom = if (Utils.isTesting) 
get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None
--- End diff --

I'm trying to avoid creating custom test classes here (that's what I 
understand by "inject", since there's no way to "inject" this otherwise). 
There's really a single test that needs this functionality, IIRC, and this 
pattern is way more common in Spark than what you're suggesting.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23174
  
> via a mounted file
> Also the user should be able to specify their own mounted file

The point is that the user shouldn't need to set this at all. You enable 
auth, Spark takes care of it. There's no point in specifying a pre-defined file 
with the secret - in fact that makes things *less* secure, because you'd be 
reusing the secret in different apps.

> Some users would prefer to avoid propagating sensitive information via 
environment variables

Why? And how are mounted files better?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...

2018-11-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23158#discussion_r237293492
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite 
with BeforeAndAfter with Matc
 assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less 
the maxTime") {
+val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+val maxAge = TimeUnit.DAYS.toMillis(7)
+val clock = new ManualClock(0)
+val provider = new FsHistoryProvider(
+  createTestConf().set("spark.history.fs.cleaner.maxAge", 
s"${maxAge}ms"), clock)
+val log = newLogFile("inProgressApp1", None, inProgress = true)
+writeFile(log, true, None,
+  SparkListenerApplicationStart(
+"inProgressApp1", Some("inProgressApp1"), 3L, "test", 
Some("attempt1"))
+)
+clock.setTime(firstFileModifiedTime)
+provider.checkForLogs()
--- End diff --

Perhaps. Better to be consistent with other tests. Also, since you're using a manual clock, the mod times would otherwise end up way higher than the clock's time.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23058: [SPARK-25905][CORE] When getting a remote block, ...

2018-11-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23058#discussion_r237284719
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -718,13 +718,9 @@ private[spark] class BlockManager(
   }
 
   /**
-   * Get block from remote block managers as serialized bytes.
+   * Get block from remote block managers as a ManagedBuffer.
*/
-  def getRemoteBytes(blockId: BlockId): Option[ChunkedByteBuffer] = {
-// TODO SPARK-25905 if we change this method to return the 
ManagedBuffer, then getRemoteValues
-// could just use the inputStream on the temp file, rather than 
reading the file into memory.
-// Until then, replication can cause the process to use too much 
memory and get killed
-// even though we've read the data to disk.
+  def getRemoteManagedBuffer(blockId: BlockId): Option[ManagedBuffer] = {
--- End diff --

Can this be private?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...

2018-11-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23158#discussion_r237280881
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite 
with BeforeAndAfter with Matc
 assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less 
the maxTime") {
+val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+val maxAge = TimeUnit.DAYS.toMillis(7)
+val clock = new ManualClock(0)
+val provider = new FsHistoryProvider(
+  createTestConf().set("spark.history.fs.cleaner.maxAge", 
s"${maxAge}ms"), clock)
+val log = newLogFile("inProgressApp1", None, inProgress = true)
+writeFile(log, true, None,
+  SparkListenerApplicationStart(
+"inProgressApp1", Some("inProgressApp1"), 3L, "test", 
Some("attempt1"))
+)
+clock.setTime(firstFileModifiedTime)
+provider.checkForLogs()
--- End diff --

You need to set the log file's modified time before calling this, otherwise 
the cleaner won't be checking what you expect.
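
Something along these lines (a sketch; `setLastModified` is the plain `java.io.File` API and the values are the ones already defined in the test):

```scala
// Give the in-progress log a mod time the cleaner will actually evaluate.
log.setLastModified(firstFileModifiedTime)
clock.setTime(firstFileModifiedTime)
provider.checkForLogs()
```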


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...

2018-11-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23158#discussion_r237280935
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite 
with BeforeAndAfter with Matc
 assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less 
the maxTime") {
+val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+val maxAge = TimeUnit.DAYS.toMillis(7)
+val clock = new ManualClock(0)
+val provider = new FsHistoryProvider(
+  createTestConf().set("spark.history.fs.cleaner.maxAge", 
s"${maxAge}ms"), clock)
--- End diff --

Use config constant.
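
For instance, assuming the constant for `spark.history.fs.cleaner.maxAge` is `MAX_LOG_AGE_S` (an assumption about the name; use whichever entry the history config object actually defines):

```scala
// Same behavior, just referencing the config entry instead of a raw string.
val provider = new FsHistoryProvider(
  createTestConf().set(MAX_LOG_AGE_S.key, s"${maxAge}ms"), clock)
```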


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23172: [SPARK-25957][followup] Build python docker image...

2018-11-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23172#discussion_r237276972
  
--- Diff: project/SparkBuild.scala ---
@@ -494,7 +494,12 @@ object KubernetesIntegrationTests {
 dockerBuild := {
   if (shouldBuildImage) {
 val dockerTool = s"$sparkHome/bin/docker-image-tool.sh"
-val cmd = Seq(dockerTool, "-m", "-t", imageTag.value, "build")
+val bindingsDir = s"$sparkHome/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings"
+val cmd = Seq(dockerTool, "-m",
+  "-t", imageTag.value,
+  "-p", s"$bindingsDir/python/Dockerfile",
--- End diff --

Not yet (ITs for R are not run). But shouldn't hurt either.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-11-28 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/23174

[SPARK-26194][k8s] Auto generate auth secret for k8s apps.

This change modifies the logic in the SecurityManager to do two
things:

- generate unique app secrets also when k8s is being used
- only store the secret in the user's UGI on YARN

The latter is needed so that k8s won't unnecessarily create
k8s secrets for the UGI credentials when only the auth token
is stored there.

On the k8s side, the secret is propagated to executors using
an environment variable instead. This ensures it works in both
client and cluster mode.

Security doc was updated to mention the feature and clarify that
proper access control in k8s should be enabled for it to be secure.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-26194

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23174


commit 0e36a4bb4a5a1ad9abee7e003b7d5f3588cba126
Author: Marcelo Vanzin 
Date:   2018-11-16T23:21:00Z

[SPARK-26194][k8s] Auto generate auth secret for k8s apps.

This change modifies the logic in the SecurityManager to do two
things:

- generate unique app secrets also when k8s is being used
- only store the secret in the user's UGI on YARN

The latter is needed so that k8s won't unnecessarily create
k8s secrets for the UGI credentials when only the auth token
is stored there.

On the k8s side, the secret is propagated to executors using
an environment variable instead. This ensures it works in both
client and cluster mode.

Security doc was updated to mention the feature and clarify that
proper access control in k8s should be enabled for it to be secure.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


