[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499140#comment-16499140 ]
Stavros Kontopoulos edited comment on SPARK-24434 at 6/2/18 6:17 PM:
---------------------------------------------------------------------
I agree with [~felixcheung] and [~liyinan926]. From the design doc for the affinity work it quickly became obvious that things will only get more complex if we try to map yaml to Spark conf, and you also lose some expressive power with that mapping.

In the past I have used json in a production system for passing config options to Spark jobs. It proved to be fine for one good reason: json schema validation, which also validated business properties early enough (for example, is a number in a valid range?). Failing fast like that was pretty cool. The tedious part was developing with json libs (I ended up using Jackson, btw). A minimal sketch of that fail-fast idea is shown right after this comment.

Here, though, I would start with the simplest solution, at least from an architectural perspective: point to the yaml spec. Yaml is the way config is specified for K8s pods, so I would start with that.

Regarding precedence, semantically yaml options are just Spark options, and precedence is defined by the order in which Spark config sees options in general, whether they are specified in the properties file, as Java properties, etc. The format shouldn't violate that precedence.

Yaml and Java properties are not exactly equivalent though, as yaml is more expressive when it comes to complex structures; at least it makes your life easier. For example, on Mesos, in order to specify multiple secrets you need to list them with [commas|https://spark.apache.org/docs/latest/running-on-mesos.html], order matters, and commas can't be part of a name (see the second sketch below).

Of course yaml is not Spark-like, but K8s is a sophisticated deployment environment anyway. So one question that comes up here is whether all these infrastructure configuration properties belong to Spark at all (another way to view the whole problem). There was a long discussion about moving resource managers out of the upstream project, but it was blocked by the changes required for a common API. From that angle the problem is a bit easier to solve: properties don't need to be semantically the same as Spark config options and could be manager-specific.
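As a side note on the fail-fast point: here is a minimal sketch of validating a business property early, before the job is ever submitted, using plain Jackson (the library mentioned above). The {{maxRetries}} field and its allowed range are made up for illustration; in a real system the checks would be driven by the json schema itself.

{code:scala}
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}

object ConfigValidation {
  private val mapper = new ObjectMapper()

  // Parse the job config and reject bad values up front, instead of
  // failing minutes later inside the running application.
  def validate(json: String): Either[String, JsonNode] = {
    val root = mapper.readTree(json)
    val retries = root.path("maxRetries") // hypothetical business property
    if (!retries.isInt)
      Left("maxRetries must be an integer")
    else if (retries.asInt < 0 || retries.asInt > 10)
      Left(s"maxRetries out of range [0, 10]: ${retries.asInt}")
    else
      Right(root)
  }
}

// Example: ConfigValidation.validate("""{"maxRetries": 42}""")
// returns Left("maxRetries out of range [0, 10]: 42") -- the job never launches.
{code}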
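And to make the Mesos comma-list point concrete, this is roughly what two env-based secrets look like with the current flat properties (the config keys are from the Mesos docs linked above; the secret paths and env var names are illustrative). The two lists are correlated purely by position, which is exactly the fragility a nested yaml structure avoids.

{code:scala}
import org.apache.spark.SparkConf

// Two secrets, paired by position across two comma-separated lists:
// the first name goes with the first env key, the second with the second.
// Reordering one list silently re-pairs the secrets, and a secret whose
// name contains a comma cannot be expressed at all.
val conf = new SparkConf()
  .set("spark.mesos.driver.secret.names", "/db/password,/api/key") // illustrative secret paths
  .set("spark.mesos.driver.secret.envkeys", "DB_PASSWORD,API_KEY") // illustrative env var names
{code}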
> Support user-specified driver and executor pod templates
> --------------------------------------------------------
>
>                 Key: SPARK-24434
>                 URL: https://issues.apache.org/jira/browse/SPARK-24434
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>            Reporter: Yinan Li
>            Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the
> current approach of adding new Spark configuration options has some serious
> drawbacks: 1) it means more Kubernetes specific configuration options to
> maintain, and 2) it widens the gap between the declarative model used by
> Kubernetes and the configuration model used by Spark. We should start
> designing a solution that allows users to specify pod templates as central
> places for all customization needs for the driver and executor pods.
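For reference, a sketch of the template-file direction this issue proposes: the nested pod spec stays native yaml and Spark is merely pointed at it. The config key below is hypothetical (no such key exists at the time of this comment), and the affinity block shows the kind of nesting that does not flatten nicely into flat spark.* properties.

{code:scala}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import org.apache.spark.SparkConf

// A driver pod template with a node-affinity rule -- several levels of
// nesting that a flat key=value Spark property cannot express naturally.
val driverTemplate =
  """apiVersion: v1
    |kind: Pod
    |spec:
    |  affinity:
    |    nodeAffinity:
    |      requiredDuringSchedulingIgnoredDuringExecution:
    |        nodeSelectorTerms:
    |        - matchExpressions:
    |          - key: disktype
    |            operator: In
    |            values: ["ssd"]
    |""".stripMargin

Files.write(Paths.get("/tmp/driver-pod-template.yaml"),
  driverTemplate.getBytes(StandardCharsets.UTF_8))

// Hypothetical key: Spark would only need a pointer to the template file,
// leaving all pod-level customization to the K8s-native spec.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.podTemplateFile", "/tmp/driver-pod-template.yaml")
{code}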