[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499140#comment-16499140 ]

Stavros Kontopoulos edited comment on SPARK-24434 at 6/2/18 6:17 PM:
---------------------------------------------------------------------

I agree with [~felixcheung] and [~liyinan926]. From the design doc for the 
affinity work it quickly became obvious that things will only get more complex 
if we try to map YAML to Spark conf, and you also lose some expressive power 
with that mapping.

In the past I have used JSON in a production system for passing config options 
to Spark jobs. It proved to work well for one good reason: JSON schema 
validation, which also validated business properties early enough, e.g. is a 
number within a valid range? Failing fast like that was pretty useful.

The tedious part was developing with JSON libraries (I ended up using Jackson, btw).
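
As a sketch of that fail-fast idea (hypothetical schema and property names, 
nothing from a real job; assumes Jackson plus a JSON Schema validator such as 
networknt's json-schema-validator on the classpath):

{code:scala}
// Hypothetical fail-fast config check: reject a job submission whose JSON
// config violates a business rule (here: executorCores must be in [1, 32]).
import com.fasterxml.jackson.databind.ObjectMapper
import com.networknt.schema.{JsonSchemaFactory, SpecVersion}
import scala.collection.JavaConverters._

object ConfigCheck {
  private val schema = JsonSchemaFactory
    .getInstance(SpecVersion.VersionFlag.V7)
    .getSchema(
      """{
        |  "type": "object",
        |  "required": ["executorCores"],
        |  "properties": {
        |    "executorCores": { "type": "integer", "minimum": 1, "maximum": 32 }
        |  }
        |}""".stripMargin)

  def main(args: Array[String]): Unit = {
    val config = new ObjectMapper().readTree("""{ "executorCores": 64 }""")
    // 64 is outside the allowed range, so we fail before submitting anything.
    val errors = schema.validate(config).asScala
    if (errors.nonEmpty) sys.error(errors.mkString("; "))
  }
}
{code}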

Here, though, I would start with the simplest solution, at least from an 
architectural perspective: point to the YAML pod spec.
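
To make "point to the YAML spec" concrete, here is a minimal, hypothetical 
driver pod template (illustrative only, not a proposed schema). Flattening 
even the nodeAffinity block below into Spark conf keys is exactly where the 
mapping gets painful:

{code:yaml}
# Hypothetical driver pod template, illustrative only.
apiVersion: v1
kind: Pod
metadata:
  labels:
    spark-role: driver
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: spark-kubernetes-driver
    image: spark:2.4.0
{code}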

YAML is the way config is specified for K8s pods, so I would start with that. 
Regarding precedence, semantically the YAML options are just Spark options, 
and precedence is defined by the order in which Spark config sees options in 
general, whether they are specified in the properties file, as Java 
properties, etc. The format shouldn't violate that precedence.

YAML and Java properties are not exactly equivalent, though: YAML is more 
expressive when it comes to complex structures, or at least it makes your 
life easier. For example, on Mesos you have to specify multiple secrets as 
[comma-separated lists|https://spark.apache.org/docs/latest/running-on-mesos.html], 
where order matters and a comma can't be part of a secret name.
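
For reference, the comma-separated form looks roughly like this (property 
names as in the Mesos secrets docs linked above; the values are made up). The 
two lists are parallel, so the i-th secret name pairs with the i-th file path 
purely by position:

{code}
# Two file-based secrets. The lists are positional: /mysecret1 is written to
# /tmp/secret1.txt and /mysecret2 to /tmp/secret2.txt. A comma inside a secret
# name would break the parsing, and reordering one list silently re-pairs them.
spark.mesos.driver.secret.names=/mysecret1,/mysecret2
spark.mesos.driver.secret.filenames=/tmp/secret1.txt,/tmp/secret2.txt
{code}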

Of course YAML is not Spark-like, but K8s is a sophisticated deployment 
environment anyway.

Another way to view the whole problem: all these infrastructure configuration 
properties may not belong in Spark anyway. There was a long discussion about 
moving the resource managers out of the upstream project, but it was blocked 
by the changes required for a common API. From that angle the problem is a 
bit easier to solve: the properties don't need to be semantically the same as 
Spark config options and could be manager-specific.

 



> Support user-specified driver and executor pod templates
> --------------------------------------------------------
>
>                 Key: SPARK-24434
>                 URL: https://issues.apache.org/jira/browse/SPARK-24434
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>            Reporter: Yinan Li
>            Priority: Major
>
> With more requests for customizing the driver and executor pods coming in, 
> the current approach of adding new Spark configuration options has some 
> serious drawbacks: 1) it means more Kubernetes-specific configuration 
> options to maintain, and 2) it widens the gap between the declarative model 
> used by Kubernetes and the configuration model used by Spark. We should 
> start designing a solution that allows users to specify pod templates as 
> the central place for all driver and executor pod customization. 


