[ https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-44767:
-----------------------------------
    Labels: pull-request-available  (was: )

> Plugin API for PySpark and SparkR workers
> -----------------------------------------
>
>                 Key: SPARK-44767
>                 URL: https://issues.apache.org/jira/browse/SPARK-44767
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Willi Raschkowski
>            Priority: Major
>              Labels: pull-request-available
>
> An API to customize Python and R workers would allow extensibility beyond what can be expressed via static configs and environment variables such as {{spark.pyspark.python}}.
>
> A use case for this is overriding {{PATH}} when using {{spark.archives}} with, say, conda-pack (as documented [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]). Some packages rely on binaries, and if we want to use those packages in Spark, we need to include their binaries in the {{PATH}}.
>
> But we can't set the {{PATH}} via a static config because 1) the environment with its binaries may be at a dynamic location (archives are unpacked on the driver [into a directory with a random name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]), and 2) we may not want to override the {{PATH}} that's pre-configured on the hosts.
>
> Other use cases unlocked by this include overriding the worker executable dynamically (e.g., to select a version) or forking/redirecting the worker's output stream.
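
To illustrate the kind of customization the ticket asks for, here is a minimal sketch of what a worker-environment plugin could look like for the conda-pack use case. The trait name {{WorkerEnvPlugin}}, its {{configure}} hook, and the {{environment}} archive alias are all hypothetical (they are not the API proposed in the linked pull request); the sketch only assumes, as the ticket notes, that the unpacked archive lives under {{SparkFiles.getRootDirectory()}} and therefore its location is only known at runtime.

{code:scala}
import java.io.File
import scala.collection.mutable
import org.apache.spark.SparkFiles

// Hypothetical hook a worker factory could invoke before forking the
// Python or R worker process. Not an actual Spark API.
trait WorkerEnvPlugin {
  def configure(env: mutable.Map[String, String]): Unit
}

// Example for the conda-pack use case: prepend the unpacked environment's
// bin/ directory to PATH instead of replacing the host-configured PATH.
class CondaPathPlugin extends WorkerEnvPlugin {
  override def configure(env: mutable.Map[String, String]): Unit = {
    // The archive location is dynamic: per the ticket, it is unpacked under
    // SparkFiles.getRootDirectory(). "environment" is an assumed archive alias.
    val envBin = new File(SparkFiles.getRootDirectory(), "environment/bin")
    val hostPath = env.getOrElse("PATH", sys.env.getOrElse("PATH", ""))
    env("PATH") = envBin.getAbsolutePath + File.pathSeparator + hostPath
  }
}
{code}

Prepending rather than overwriting {{PATH}} addresses the second concern in the description: the host-configured {{PATH}} stays intact, and the conda environment's binaries simply take precedence for the worker process.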