Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192448946

    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +      case JavaMainAppResource(res) =>
    +        val previousJars = sparkConf
    +          .getOption("spark.jars")
    +          .map(_.split(","))
    +          .getOrElse(Array.empty)
    +        if (!previousJars.contains(res)) {
    +          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +        }
    +      // The function of this outer match is to account for multiple nonJVM
    +      // bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
    +      case nonJVM: NonJVMResource =>
    +        nonJVM match {
    +          case PythonMainAppResource(res) =>
    +            additionalFiles += res
    +            maybePyFiles.foreach { maybePyFiles =>
    +              additionalFiles.appendAll(maybePyFiles.split(",")) }
    +            sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +        }
    +        sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    Yup, you can see my statement about not overriding the explicitly user-provided value in my comment on the 20th ("if the user has specified a different value don't think we should override it").
So this logic, as it stands, is K8s-specific, and I don't think we can change how YARN chooses its memory overhead in a minor release, so I'd expect this to remain K8s-specific until at least 3.0, when we can evaluate whether we want to change this in YARN as well. The memory overhead configuration is documented on the YARN page right now (see `spark.yarn.am.memoryOverhead` on http://spark.apache.org/docs/latest/running-on-yarn.html), so I would document this in http://spark.apache.org/docs/latest/running-on-kubernetes.html#spark-properties (i.e. `./docs/running-on-kubernetes.md`). As for intuitive: I'd argue this actually is more intuitive than what we do in YARN. We know that users who run R & Python need more non-JVM heap space, and many users don't know to think about this until their job fails, so we can take advantage of our knowledge to handle this setting for the user more often. You can see how often this confuses folks on the list, in the docs, and on Stack Overflow by searching for "memory overhead exceeded", "Container killed by YARN for exceeding memory limits", and similar.
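To make the requested behavior concrete — raise the overhead factor for non-JVM apps, but never clobber a value the user set explicitly — here is a minimal, self-contained sketch. It uses a plain mutable map as a stand-in for `SparkConf`, and the key name and helper function are illustrative, not actual Spark API:

```scala
import scala.collection.mutable

object OverheadSketch {
  // Illustrative config key; stands in for the real MEMORY_OVERHEAD_FACTOR entry.
  val MemoryOverheadFactorKey = "spark.kubernetes.memoryOverheadFactor"

  // Raise the overhead factor for non-JVM (Python/R) apps, but only when the
  // user has not already supplied an explicit value of their own.
  def bumpOverheadForNonJvm(conf: mutable.Map[String, String]): Unit = {
    if (!conf.contains(MemoryOverheadFactorKey)) {
      conf(MemoryOverheadFactorKey) = "0.4"
    }
  }

  def main(args: Array[String]): Unit = {
    // No user setting: the non-JVM default of 0.4 is applied.
    val defaulted = mutable.Map.empty[String, String]
    bumpOverheadForNonJvm(defaulted)
    assert(defaulted(MemoryOverheadFactorKey) == "0.4")

    // User-supplied setting: it survives untouched.
    val explicit = mutable.Map(MemoryOverheadFactorKey -> "0.6")
    bumpOverheadForNonJvm(explicit)
    assert(explicit(MemoryOverheadFactorKey) == "0.6")
  }
}
```

The point of the guard is exactly the review feedback above: the 0.4 factor is a default chosen on the user's behalf, not an override of an explicitly configured value.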