Hi Holden,

Yes, that’s exactly the motivation: replace the “global function” hacks.

For requirements, I’d avoid dynamic installs in Phase 1. The initial contract is: the spec can declare pythonVersion / requirements / environmentUri, and Spark validates and fails fast if the runtime isn’t satisfied. Dynamic installs could be an opt-in follow-up.
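To make that contract concrete, here is a rough sketch. Nothing below is the proposed API: the field names (pythonVersion, requirements, environmentUri) come from the SPIP, but every type and method name here is illustrative only.

import java.util.List;
import java.util.Optional;

// Illustrative stand-in for the function spec's runtime fields.
interface CodeFunctionSpec {
  Optional<String> pythonVersion();   // e.g. "3.10"
  List<String> requirements();        // pip-style specs, e.g. "pandas>=2.0"
  Optional<String> environmentUri();  // opaque pointer; not provisioned in v1
}

// Hypothetical view of the current PySpark worker environment.
interface WorkerEnv {
  String pythonVersion();
  boolean satisfies(String pipRequirement);
}

final class RuntimeCheck {
  // Fails at analysis time if the declared runtime isn't satisfied; a real
  // implementation would raise AnalysisException with a proper error class
  // instead of IllegalStateException.
  static void failFast(CodeFunctionSpec spec, WorkerEnv env) {
    spec.pythonVersion().ifPresent(v -> {
      if (!env.pythonVersion().startsWith(v)) {
        throw new IllegalStateException(
            "Function requires Python " + v
                + " but the worker runs " + env.pythonVersion());
      }
    });
    for (String req : spec.requirements()) {
      if (!env.satisfies(req)) {
        throw new IllegalStateException("Unsatisfied requirement: " + req);
      }
    }
    // environmentUri is a pass-through extension point in v1: Spark does
    // not provision environments out-of-the-box.
  }
}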
Thanks,
Huaxin

On Fri, Feb 13, 2026 at 12:18 PM Holden Karau <[email protected]> wrote:

> I like the idea of this a lot. I’ve seen a bunch of hacks at companies to
> make global functions within the company, and this seems like a much better
> way of doing it.
>
> For the requirements option, would it make sense to try and install them
> dynamically? (Fail fast seems like the way to start, though.)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Wed, Jan 28, 2026 at 1:12 PM Szehon Ho <[email protected]> wrote:
>
>> This sounds useful, especially with Iceberg proposals like versioned SQL
>> UDFs. On the surface it sounds like we could extend the DSv2 FunctionCatalog
>> (which, as you point out, lacks dynamic create/drop function today), but I
>> may not know some details. I would like to hear the opinions of others who
>> have worked more on functions/UDFs.
>>
>> Thanks!
>> Szehon
>>
>> On Wed, Jan 7, 2026 at 9:32 PM huaxin gao <[email protected]> wrote:
>>
>>> Hi Wenchen,
>>>
>>> Great question. In the SPIP, the language runtime is carried in the
>>> function spec (for python / python-pandas) so catalogs can optionally
>>> declare constraints on the execution environment.
>>>
>>> Concretely, the spec can include optional fields like:
>>>
>>> - pythonVersion (e.g., "3.10")
>>> - requirements (pip-style specs)
>>> - environmentUri (optional pointer to a pre-built / admin-approved
>>>   environment)
>>>
>>> For the initial stage, we assume execution uses the existing PySpark
>>> worker environment (same as regular Python UDF / pandas UDF). If
>>> pythonVersion / requirements are present, Spark can validate them
>>> against the current worker env and fail fast (AnalysisException) if
>>> they’re not satisfied.
>>>
>>> environmentUri is intended as an extension point for future integrations
>>> (or vendor plugins) to select a vetted environment, but we don’t assume
>>> Spark will provision environments out-of-the-box in v1.
>>>
>>> Thanks,
>>>
>>> Huaxin
>>>
>>> On Wed, Jan 7, 2026 at 6:06 PM Wenchen Fan <[email protected]> wrote:
>>>
>>>> This is a great feature! How do we define the language runtime, e.g.,
>>>> the Python version and libraries? Do we assume the Python runtime is the
>>>> same as the PySpark worker?
>>>>
>>>> On Thu, Jan 8, 2026 at 3:12 AM huaxin gao <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I’d like to start a discussion on a draft SPIP
>>>>> <https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>:
>>>>>
>>>>> *SPIP: Catalog-backed Code-Literal Functions (SQL and Python) with
>>>>> Catalog SPI and CRUD*
>>>>>
>>>>> *Problem:* Spark can’t load SQL/Python function bodies from external
>>>>> catalogs in a standard way today, so users rely on session registration
>>>>> or vendor extensions.
>>>>>
>>>>> *Proposal:*
>>>>>
>>>>> - Add CodeLiteralFunctionCatalog (Java SPI) returning CodeFunctionSpec
>>>>>   with implementations (spark-sql, python, python-pandas).
>>>>> - Resolution:
>>>>>   - SQL: parse + inline (deterministic ⇒ foldable).
>>>>>   - Python/pandas: run via the existing Python UDF / pandas UDF
>>>>>     runtime (opaque).
>>>>>   - SQL TVF: parse to a plan, substitute params, validate the schema.
>>>>> - DDL: CREATE/REPLACE/DROP FUNCTION delegates to the catalog if it
>>>>>   implements the SPI; otherwise falls back.
>>>>>
>>>>> *Precedence + defaults:*
>>>>>
>>>>> - Unqualified: temp/session > built-in/DSv2 > code-literal (current
>>>>>   catalog). Qualified names resolve only in the named catalog.
>>>>> - Defaults: feature on, SQL on, Python/pandas off; optional
>>>>>   languagePreference.
>>>>>
>>>>> Feedback is welcome!
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Huaxin
>>>>
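P.S. For anyone skimming the quoted proposal, the SPI shape it describes is roughly the following. This is a sketch reusing the CodeFunctionSpec stand-in from my note above; only the names CodeLiteralFunctionCatalog and CodeFunctionSpec come from the SPIP, and the signatures are illustrative, not the final API.

import java.util.Map;

interface CodeLiteralFunctionCatalog {
  // Lookup: return the spec, which carries one or more implementations
  // keyed by language ("spark-sql", "python", "python-pandas").
  CodeFunctionSpec loadFunction(String[] namespace, String name);

  // CRUD: CREATE [OR REPLACE] / DROP FUNCTION DDL delegates here when the
  // named catalog implements this SPI; otherwise Spark falls back to the
  // existing function-resolution behavior.
  void createFunction(String[] namespace, String name,
                      Map<String, String> implementationsByLanguage,
                      boolean replace);
  void dropFunction(String[] namespace, String name);
}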
