[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

vanzin Tue, 30 Oct 2018 11:46:02 -0700

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/22897


    [SPARK-25875][k8s] Merge code to set up driver command into a single step.

    Right now there are 3 different classes dealing with building the driver
    command to run inside the pod, one for each "binding" supported by Spark.
    This has two main shortcomings:
    
    - the code in the 3 classes is very similar; changing things in one place
      would probably mean making a similar change in the others.
    
    - it gives the false impression that the step implementation is the only
      place where binding-specific logic is needed. That is not true; there
      was code in KubernetesConf that was binding-specific, and there's also
      code in the executor-specific config step. So the 3 classes weren't really
      working as a language-specific abstraction.
    
    On top of that, the current code was propagating command line parameters in
    a different way depending on the binding. That doesn't seem necessary, and
    in fact using environment variables for command line parameters is in 
general
    a really bad idea, since you can't handle special characters (e.g. spaces)
    that way.
    
    This change merges the 3 different code paths for Java, Python and R into
    a single step, and also merges the 3 code paths to start the Spark driver
    in the k8s entry point script. This increases the amount of shared code,
    and also moves more feature logic into the step itself, so it doesn't live
    in KubernetesConf.
    
    Note that not all logic related to setting up the driver lives in that
    step. For example, the memory overhead calculation still lives separately,
    except it now happens in the driver config step instead of outside the
    step hierarchy altogether.
    
    Some of the noise in the diff is because of changes to KubernetesConf, which
    will be addressed in a separate change.
    
    Tested with new and updated unit tests + integration tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-25875

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22897.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22897
    
----
commit 8148f49e31b30edaa03973198042cb044df46cae
Author: Marcelo Vanzin <vanzin@...>
Date:   2018-10-29T23:28:36Z

    [SPARK-25875][k8s] Merge code to set up driver command into a single step.
    
    Right now there are 3 different classes dealing with building the driver
    command to run inside the pod, one for each "binding" supported by Spark.
    This has two main shortcomings:
    
    - the code in the 3 classes is very similar; changing things in one place
      would probably mean making a similar change in the others.
    
    - it gives the false impression that the step implementation is the only
      place where binding-specific logic is needed. That is not true; there
      was code in KubernetesConf that was binding-specific, and there's also
      code in the executor-specific config step. So the 3 classes weren't really
      working as a language-specific abstraction.
    
    On top of that, the current code was propagating command line parameters in
    a different way depending on the binding. That doesn't seem necessary, and
    in fact using environment variables for command line parameters is in 
general
    a really bad idea, since you can't handle special characters (e.g. spaces)
    that way.
    
    This change merges the 3 different code paths for Java, Python and R into
    a single step, and also merges the 3 code paths to start the Spark driver
    in the k8s entry point script. This increases the amount of shared code,
    and also moves more feature logic into the step itself, so it doesn't live
    in KubernetesConf.
    
    Note that not all logic related to setting up the driver lives in that
    step. For example, the memory overhead calculation still lives separately,
    except it now happens in the driver config step instead of outside the
    step hierarchy altogether.
    
    Some of the noise in the diff is because of changes to KubernetesConf, which
    will be addressed in a separate change.
    
    Tested with new and updated unit tests + integration tests.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

Reply via email to