[ https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-44767:
-----------------------------------
    Labels: pull-request-available  (was: )

> Plugin API for PySpark and SparkR workers
> -----------------------------------------
>
>                 Key: SPARK-44767
>                 URL: https://issues.apache.org/jira/browse/SPARK-44767
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Willi Raschkowski
>            Priority: Major
>              Labels: pull-request-available
>
> An API to customize Python and R workers would allow extensibility beyond what can be expressed via static configs and environment variables such as {{spark.pyspark.python}}.
>
> A use case for this is overriding {{PATH}} when using {{spark.archives}} with, say, conda-pack (as documented [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]). Some packages rely on binaries, and if we want to use those packages in Spark, we need to include their binaries in the {{PATH}}.
>
> But we can't set the {{PATH}} via a static config because 1) the environment with its binaries may be at a dynamic location (archives are unpacked on the driver [into a directory with a random name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]), and 2) we may not want to override the {{PATH}} that's pre-configured on the hosts.
>
> Other use cases unlocked by this include overriding the worker executable dynamically (e.g., to select a version) or forking/redirecting the worker's output stream.
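
To illustrate the kind of customization the ticket asks for, here is a minimal sketch of what a worker-environment plugin could look like for the conda-pack use case. The trait name {{WorkerEnvPlugin}}, its {{configure}} hook, and the {{environment}} archive alias are all hypothetical (they are not the API proposed in the linked pull request); the sketch only assumes, as the ticket notes, that the unpacked archive lives under {{SparkFiles.getRootDirectory()}} and therefore its location is only known at runtime.

{code:scala}
import java.io.File
import scala.collection.mutable
import org.apache.spark.SparkFiles

// Hypothetical hook a worker factory could invoke before forking the
// Python or R worker process. Not an actual Spark API.
trait WorkerEnvPlugin {
  def configure(env: mutable.Map[String, String]): Unit
}

// Example for the conda-pack use case: prepend the unpacked environment's
// bin/ directory to PATH instead of replacing the host-configured PATH.
class CondaPathPlugin extends WorkerEnvPlugin {
  override def configure(env: mutable.Map[String, String]): Unit = {
    // The archive location is dynamic: per the ticket, it is unpacked under
    // SparkFiles.getRootDirectory(). "environment" is an assumed archive alias.
    val envBin = new File(SparkFiles.getRootDirectory(), "environment/bin")
    val hostPath = env.getOrElse("PATH", sys.env.getOrElse("PATH", ""))
    env("PATH") = envBin.getAbsolutePath + File.pathSeparator + hostPath
  }
}
{code}

Prepending rather than overwriting {{PATH}} addresses the second concern in the description: the host-configured {{PATH}} stays intact, and the conda environment's binaries simply take precedence for the worker process.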