This sounds useful, especially with Iceberg proposals like versioned SQL UDFs. On the surface it sounds like we could extend the DSv2 FunctionCatalog (which, as you point out, lacks dynamic CREATE/DROP FUNCTION support today), but I may be missing some details. I would also like to hear opinions from others who have worked more on functions/UDFs.
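For illustration only, a purely hypothetical mixin on top of the existing DSv2 interface could look roughly like the sketch below. None of these names exist in Spark today, and CodeFunctionSpec here is just a minimal stand-in for the spec object the SPIP proposes.

// Hypothetical sketch, not an existing Spark API: one way the DSv2
// FunctionCatalog (today read-only: listFunctions / loadFunction /
// functionExists) could grow dynamic create/drop support.
import org.apache.spark.sql.connector.catalog.FunctionCatalog;
import org.apache.spark.sql.connector.catalog.Identifier;

// Minimal stand-in for the SPIP's proposed spec: a language tag
// ("spark-sql", "python", "python-pandas") plus the code literal itself.
record CodeFunctionSpec(String language, String body) {}

interface SupportsFunctionDdl extends FunctionCatalog {
    // CREATE [OR REPLACE] FUNCTION would delegate here when the current
    // catalog implements this mixin.
    void createFunction(Identifier ident, CodeFunctionSpec spec, boolean replace);

    // DROP FUNCTION would delegate here; returning false means the
    // function was absent.
    boolean dropFunction(Identifier ident);
}

The SPIP's DDL section would then map CREATE/REPLACE/DROP FUNCTION onto calls like these, falling back to existing behavior when the catalog doesn't implement the mixin.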
Thanks!
Szehon

On Wed, Jan 7, 2026 at 9:32 PM huaxin gao <[email protected]> wrote:

> Hi Wenchen,
>
> Great question. In the SPIP, the language runtime is carried in the
> function spec (for python / python-pandas), so catalogs can optionally
> declare constraints on the execution environment.
>
> Concretely, the spec can include optional fields like:
>
> - pythonVersion (e.g., "3.10")
> - requirements (pip-style specs)
> - environmentUri (an optional pointer to a pre-built / admin-approved
>   environment)
>
> For the initial stage, we assume execution uses the existing PySpark
> worker environment (the same as for regular Python UDFs / pandas UDFs).
> If pythonVersion / requirements are present, Spark can validate them
> against the current worker env and fail fast (AnalysisException) if
> they’re not satisfied.
>
> environmentUri is intended as an extension point for future integrations
> (or vendor plugins) to select a vetted environment, but we don’t assume
> Spark will provision environments out of the box in v1.
>
> Thanks,
> Huaxin
>
> On Wed, Jan 7, 2026 at 6:06 PM Wenchen Fan <[email protected]> wrote:
>
>> This is a great feature! How do we define the language runtime, e.g.
>> the Python version and libraries? Do we assume the Python runtime is
>> the same as the PySpark worker’s?
>>
>> On Thu, Jan 8, 2026 at 3:12 AM huaxin gao <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I’d like to start a discussion on a draft SPIP
>>> <https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>:
>>>
>>> *SPIP: Catalog-backed Code-Literal Functions (SQL and Python) with
>>> Catalog SPI and CRUD*
>>>
>>> *Problem:* Spark can’t load SQL/Python function bodies from external
>>> catalogs in a standard way today, so users rely on session
>>> registration or vendor extensions.
>>>
>>> *Proposal:*
>>>
>>> - Add a CodeLiteralFunctionCatalog (Java SPI) returning a
>>>   CodeFunctionSpec with implementations (spark-sql, python,
>>>   python-pandas).
>>> - Resolution:
>>>   - SQL: parse + inline (deterministic ⇒ foldable).
>>>   - Python/pandas: run via the existing Python UDF / pandas UDF
>>>     runtime (opaque).
>>>   - SQL TVF: parse to a plan, substitute params, validate the schema.
>>> - DDL: CREATE/REPLACE/DROP FUNCTION delegates to the catalog if it
>>>   implements the SPI; otherwise it falls back to existing behavior.
>>>
>>> *Precedence + defaults:*
>>>
>>> - Unqualified: temp/session > built-in/DSv2 > code-literal (current
>>>   catalog). Qualified names resolve only in the named catalog.
>>> - Defaults: feature on, SQL on, Python/pandas off; optional
>>>   languagePreference.
>>>
>>> Feedback is welcome!
>>>
>>> Thanks,
>>> Huaxin
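For concreteness, here is a rough sketch of the fail-fast check Huaxin describes upthread. Only the field names pythonVersion, requirements, and environmentUri come from the SPIP; the class, the record shapes, and the naive requirement matching are assumptions made purely for illustration.

// Hedged sketch of validating a function spec against the PySpark worker
// environment at resolution time; all types here are invented for the example.
import java.util.List;
import java.util.Optional;

final class PythonEnvCheck {
    // What the resolved worker environment reports (shape assumed).
    record WorkerEnv(String pythonVersion, List<String> installedPackages) {}

    // Optional constraints carried on the function spec, per the SPIP.
    record EnvSpec(Optional<String> pythonVersion,
                   List<String> requirements,
                   Optional<String> environmentUri) {}

    // Returns an error message if the spec cannot run on this worker env;
    // the caller would surface it as an AnalysisException during analysis.
    static Optional<String> validate(EnvSpec spec, WorkerEnv env) {
        if (spec.pythonVersion().isPresent()
                && !env.pythonVersion().startsWith(spec.pythonVersion().get())) {
            return Optional.of("Function requires Python "
                + spec.pythonVersion().get() + " but worker runs "
                + env.pythonVersion());
        }
        for (String req : spec.requirements()) {
            // Naive containment check stands in for real pip-spec matching.
            if (!env.installedPackages().contains(req)) {
                return Optional.of("Missing requirement: " + req);
            }
        }
        // environmentUri is a future extension point; v1 would ignore it here.
        return Optional.empty();
    }
}

This matches the v1 behavior described above: no environment provisioning, just validation against whatever the worker already has.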

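Similarly, the unqualified-name precedence in the SPIP (temp/session > built-in/DSv2 > code-literal in the current catalog) amounts to a short lookup chain. The sketch below invents the hook types just to make the ordering explicit; the real logic would live in Spark's analyzer.

// Illustrative-only sketch of the SPIP's unqualified-name precedence;
// Object stands in for whatever handle the analyzer uses for a function.
import java.util.Optional;
import java.util.function.Function;

final class FunctionPrecedence {
    // Each hook returns a resolved function if that layer knows the name.
    static Optional<Object> resolveUnqualified(
            String name,
            Function<String, Optional<Object>> tempOrSession,
            Function<String, Optional<Object>> builtInOrDsv2,
            Function<String, Optional<Object>> codeLiteralInCurrentCatalog) {
        // Earlier layers win; later layers are consulted only on a miss.
        return tempOrSession.apply(name)
            .or(() -> builtInOrDsv2.apply(name))
            .or(() -> codeLiteralInCurrentCatalog.apply(name));
        // Qualified names skip this chain entirely and resolve only in the
        // named catalog, per the SPIP.
    }
}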