Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192457908
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
     -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
     +      appArgs: Array[String],
     +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
     +        // The function of this outer match is to account for multiple nonJVM
     +        // bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
     +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +          }
    +          sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    I think power users would want the ability to override this if they know the
    specific amount of memory overhead they need. Configurations should always be
    configurable, with sane defaults. I agree that we can afford to set a better
    default for Kubernetes, but there should always be a way to override the
    default when the user knows the characteristics of their job. For example, if
    the user profiles their container's memory and sees that it isn't using the
    full amount, they can afford to lower this value and leave more resources for
    other applications.
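    
    To make that concrete, something along these lines would keep 0.4 as the
    non-JVM default while still honoring a user-supplied value. This is just a
    minimal sketch using the plain string key, assuming the entry resolves to
    `spark.kubernetes.memoryOverheadFactor`; the real code would go through the
    MEMORY_OVERHEAD_FACTOR ConfigEntry:
    
        import org.apache.spark.SparkConf
    
        // User explicitly asks for a smaller overhead factor.
        val userConf = new SparkConf(false)
          .set("spark.kubernetes.memoryOverheadFactor", "0.2")
    
        val sparkConfWithMainAppJar = userConf.clone()
        // setIfMissing only fills in the 0.4 default when nothing was set,
        // so an explicit user value always wins.
        sparkConfWithMainAppJar.setIfMissing("spark.kubernetes.memoryOverheadFactor", "0.4")
    
        assert(sparkConfWithMainAppJar.get("spark.kubernetes.memoryOverheadFactor") == "0.2")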


---
