Hi Spark devs,

I would like to call for *a new vote following the previous attempt* on the
*SPIP: Language-Agnostic UDF Execution Protocol for Spark*, after addressing
comments and providing a supplementary design document for the worker
specification.

The SPIP proposes a structured, language-agnostic framework for running
user-defined functions (UDFs) in Spark across multiple programming languages.

Today, Spark Connect allows users to write queries from multiple languages,
but support for user-defined functions remains incomplete. In practice,
only Scala, Java, and Python have working support, and this relies on
language-specific mechanisms that do not generalize well to other languages
such as Go <https://github.com/apache/spark-connect-go>, Rust
<https://github.com/apache/spark-connect-rust>, Swift
<https://github.com/apache/spark-connect-swift>, and TypeScript
<https://github.com/BaldrVivaldelli/ts-spark-connector>, where UDF support
is currently unavailable. In addition, there are legacy limitations in the
existing PySpark worker implementation that make it difficult to evolve the
system or extend it to new languages.

The proposal introduces two related components:

   1. *A unified UDF execution protocol*

      The proposal defines a structured API and execution protocol for
      running UDFs outside the Spark executor process and communicating with
      Spark via inter-process communication (IPC). This protocol enables
      Spark to interact with external UDF workers in a consistent and
      extensible way, regardless of the implementation language.

   2. *A worker specification for provisioning and lifecycle management*

      To support multi-language execution environments, the proposal also
      introduces a worker specification describing how UDF workers can be
      installed, started, connected to, and terminated. This document
      complements the SPIP by outlining how workers can be provisioned and
      managed in a consistent way.
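
To make the lifecycle in the second item concrete, here is a minimal
sketch of the four steps (installed, started, connected to, terminated) as a
small state machine. This is only an illustration: the names `WorkerState`
and `UdfWorker`, and the set of allowed transitions, are my own assumptions
and are not taken from the SPIP or the worker specification document.

```python
from enum import Enum, auto


class WorkerState(Enum):
    """Hypothetical lifecycle states, named after the four steps above."""
    INSTALLED = auto()
    STARTED = auto()
    CONNECTED = auto()
    TERMINATED = auto()


# Assumed transition table: a worker can be terminated from any live state,
# but must be started before Spark can connect to it.
_ALLOWED = {
    WorkerState.INSTALLED: {WorkerState.STARTED, WorkerState.TERMINATED},
    WorkerState.STARTED: {WorkerState.CONNECTED, WorkerState.TERMINATED},
    WorkerState.CONNECTED: {WorkerState.TERMINATED},
    WorkerState.TERMINATED: set(),
}


class UdfWorker:
    """Illustrative stand-in for an external UDF worker (not a Spark API)."""

    def __init__(self, language: str) -> None:
        self.language = language
        self.state = WorkerState.INSTALLED

    def _move(self, target: WorkerState) -> None:
        # Reject transitions that the assumed table does not allow.
        if target not in _ALLOWED[self.state]:
            raise RuntimeError(
                f"cannot go from {self.state.name} to {target.name}"
            )
        self.state = target

    def start(self) -> None:
        self._move(WorkerState.STARTED)

    def connect(self) -> None:
        self._move(WorkerState.CONNECTED)

    def terminate(self) -> None:
        self._move(WorkerState.TERMINATED)
```

The point of the sketch is only that the lifecycle is the same regardless of
the worker's implementation language; the actual transitions and their
semantics are defined in the design document linked below.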

Note that this SPIP can help enable UDF support for languages that
currently do not support UDFs. For languages that already have UDF
implementations (especially Python), the goal is not to replace existing
implementations immediately, but to provide a framework that may allow them
to gradually evolve toward more language-agnostic abstractions over time.

More details can be found in the SPIP document and the supplementary design
document for the worker specification:

SPIP:
https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8

Worker specification design document:
https://docs.google.com/document/d/1Dx9NqHRNuUpatH9DYoFF9cmvUl2fqHT4Rjbyw4EGLHs

Discussion Thread:
https://lists.apache.org/thread/9t4svsnd71j7sb4r4scf2xhh8dvp3b43

Previous vote and discussion thread:
https://lists.apache.org/thread/81xghrfwvopp274rgyxfthsstb2xmkz1

*Please vote on adopting this proposal.*

[ ] +1: Accept the proposal as an official SPIP

[ ] +0: No opinion

[ ] -1: Disapprove (please explain why)

The vote will remain open for *at least 72 hours*.

Thanks to everyone who participated in the discussion and provided valuable
feedback!


Best regards,

Haiyang
