Marcelo Vanzin created SPARK-25920:
--------------------------------------

             Summary: Avoid custom processing of CLI options for cluster submission
                 Key: SPARK-25920
                 URL: https://issues.apache.org/jira/browse/SPARK-25920
             Project: Spark
          Issue Type: Improvement
          Components: Spark Submit
    Affects Versions: 3.0.0
            Reporter: Marcelo Vanzin


In {{SparkSubmit}}, when an app is submitted in cluster mode, there is 
currently a lot of code specific to each resource manager (RM) that takes the 
{{SparkSubmit}} internals, packages them up in an RM-specific set of "command 
line options", and parses them back into memory when the RM-specific class is 
invoked.

For example, for YARN:

{code}
    // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
    if (isYarnCluster) {
      childMainClass = YARN_CLUSTER_SUBMIT_CLASS
      if (args.isPython) {
        childArgs += ("--primary-py-file", args.primaryResource)
        childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
  [blah blah blah]
{code}

For Mesos:

{code}
    if (isMesosCluster) {
      assert(args.useRest,
        "Mesos cluster mode is only supported through the REST submission API")
      childMainClass = REST_CLUSTER_SUBMIT_CLASS
      if (args.isPython) {
        // Second argument is main class
        childArgs += (args.primaryResource, "")
        if (args.pyFiles != null) {
          sparkConf.set("spark.submit.pyFiles", args.pyFiles)
        }
  [blah blah blah]
{code}


For k8s:

{code}
    if (isKubernetesCluster) {
      childMainClass = KUBERNETES_CLUSTER_SUBMIT_CLASS
      if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
        if (args.isPython) {
          childArgs ++= Array("--primary-py-file", args.primaryResource)
          childArgs ++= Array("--main-class", "org.apache.spark.deploy.PythonRunner")
  [blah blah blah]
{code}

These parts of the code are all very similar, and there is no good reason for 
each RM to need its own processing here. We should simplify this and pass 
pre-parsed command line options to the cluster submission classes.
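
As a rough illustration of what "pre-parsed options" could look like, here is a minimal sketch. Everything in it is hypothetical: {{ClusterSubmitArgs}}, {{ClusterSubmitter}}, {{SubmitSketch}} and {{LoggingSubmitter}} are made-up names that do not exist in Spark, and the real change may look quite different. The idea is only that {{SparkSubmit}} would build one typed value and hand it to the RM-specific class, instead of encoding it as flags that get parsed again.

```scala
// Hypothetical sketch: none of these types exist in Spark.

// Options parsed once by SparkSubmit and shared by every resource manager.
final case class ClusterSubmitArgs(
    primaryResource: String,
    mainClass: Option[String] = None,
    isPython: Boolean = false,
    pyFiles: Seq[String] = Nil,
    appArgs: Seq[String] = Nil)

// What each RM-specific submission class could implement, in place of
// parsing its own "--primary-py-file" / "--class" / "--main-class" flags.
trait ClusterSubmitter {
  def submit(args: ClusterSubmitArgs): String
}

object SubmitSketch {
  val PythonRunner = "org.apache.spark.deploy.PythonRunner"

  // Shared decision that the YARN and k8s branches above each make
  // separately: Python apps are launched through PythonRunner.
  def resolveMainClass(args: ClusterSubmitArgs): String =
    if (args.isPython) PythonRunner
    else args.mainClass.getOrElse(
      throw new IllegalArgumentException("a main class is required for non-Python apps"))

  // Toy submitter standing in for an RM client; it only reports what it
  // would launch and does not talk to any cluster.
  final class LoggingSubmitter(rmName: String) extends ClusterSubmitter {
    def submit(args: ClusterSubmitArgs): String =
      s"[$rmName] ${resolveMainClass(args)} ${args.primaryResource}"
  }

  def main(cliArgs: Array[String]): Unit = {
    val parsed = ClusterSubmitArgs(primaryResource = "app.py", isPython = true)
    // The same pre-parsed value is handed to every RM, with no re-parsing.
    Seq("yarn", "k8s").foreach { rm =>
      println(new LoggingSubmitter(rm).submit(parsed))
    }
  }
}
```

With something along these lines, the per-RM branches in {{SparkSubmit}} would shrink to choosing which submitter to instantiate, and the Python/main-class handling would live in one place.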


