This sounds useful, especially with Iceberg proposals like versioned SQL
UDFs.  On the surface it sounds like we could extend the DSv2
FunctionCatalog (which, as you point out, lacks dynamic create/drop
function today), but I may be missing some details.  I'd also like to hear
from others who have worked more on functions/UDFs.

Thanks!
Szehon

On Wed, Jan 7, 2026 at 9:32 PM huaxin gao <[email protected]> wrote:

> Hi Wenchen,
>
> Great question. In the SPIP, the language runtime is carried in the
> function spec (for python / python-pandas) so catalogs can optionally
> declare constraints on the execution environment.
>
> Concretely, the spec can include optional fields like:
>
>    - pythonVersion (e.g., "3.10")
>    - requirements (pip-style specs)
>    - environmentUri (optional pointer to a pre-built / admin-approved
>      environment)
>
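> For illustration, here is a rough Java sketch of where those fields would
> sit on the spec. The exact shape is in the SPIP doc; the names below are
> just shorthand, not the final interface:
>
>    // Illustrative sketch only -- field names follow the list above and
>    // are not the final CodeFunctionSpec API from the SPIP.
>    import java.util.List;
>    import java.util.Optional;
>
>    public interface CodeFunctionSpec {
>      String language();                 // "spark-sql", "python", "python-pandas"
>      String body();                     // the code literal itself
>      Optional<String> pythonVersion();  // e.g. "3.10"
>      List<String> requirements();       // pip-style specs, possibly empty
>      Optional<String> environmentUri(); // pre-built / admin-approved env
>    }
>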
> For the initial stage, we assume execution uses the existing PySpark
> worker environment (same as regular Python UDF / pandas UDF). If
> pythonVersion / requirements are present, Spark can validate them against
> the current worker env and fail fast (AnalysisException) if they’re not
> satisfied.
>
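> As a rough sketch of that fail-fast path (illustration only; in Spark the
> check would run during analysis and surface as an AnalysisException):
>
>    import java.util.Optional;
>
>    // Illustration only: a simplistic prefix match of the declared
>    // pythonVersion against the worker's interpreter version.
>    final class PythonRuntimeCheck {
>      static void requireVersion(Optional<String> required,
>                                 String workerVersion) {
>        required.ifPresent(v -> {
>          if (!workerVersion.startsWith(v)) {
>            // Stand-in for Spark's AnalysisException.
>            throw new IllegalStateException(
>                "Function requires Python " + v +
>                " but the worker runs " + workerVersion);
>          }
>        });
>      }
>    }
>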
> environmentUri is intended as an extension point for future integrations
> (or vendor plugins) to select a vetted environment, but we don’t assume
> Spark will provision environments out of the box in v1.
>
> Thanks,
>
> Huaxin
>
> On Wed, Jan 7, 2026 at 6:06 PM Wenchen Fan <[email protected]> wrote:
>
>> This is a great feature! How do we define the language runtime? e.g. the
>> Python version and libraries. Do we assume the Python runtime is the same
>> as the PySpark worker?
>>
>> On Thu, Jan 8, 2026 at 3:12 AM huaxin gao <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I’d like to start a discussion on a draft SPIP
>>> <https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>
>>> :
>>>
>>> *SPIP: Catalog-backed Code-Literal Functions (SQL and Python) with
>>> Catalog SPI and CRUD*
>>>
>>> *Problem:* Spark can’t load SQL/Python function bodies from external
>>> catalogs in a standard way today, so users rely on session registration or
>>> vendor extensions.
>>>
>>> *Proposal:*
>>>
>>>    - Add CodeLiteralFunctionCatalog (Java SPI) returning CodeFunctionSpec
>>>      with implementations (spark-sql, python, python-pandas).
>>>    - Resolution:
>>>       - SQL: parse + inline (deterministic ⇒ foldable).
>>>       - Python/pandas: run via existing Python UDF / pandas UDF runtime
>>>         (opaque).
>>>       - SQL TVF: parse to plan, substitute params, validate schema.
>>>    - DDL: CREATE/REPLACE/DROP FUNCTION delegates to the catalog if it
>>>      implements the SPI; otherwise fall back.
>>>
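>>> To make the SPI shape a bit more concrete, here is a rough Java sketch
>>> (names and signatures are illustrative; the doc has the exact proposal):
>>>
>>>    // Illustrative only -- see the SPIP doc for the actual interface.
>>>    // CodeFunctionSpec carries the language, body, and optional runtime
>>>    // constraints described in the doc.
>>>    import org.apache.spark.sql.connector.catalog.Identifier;
>>>
>>>    public interface CodeLiteralFunctionCatalog {
>>>      // Look up the code-literal spec for a function in this catalog.
>>>      CodeFunctionSpec loadCodeFunction(Identifier ident);
>>>
>>>      // CRUD hooks that CREATE / REPLACE / DROP FUNCTION delegate to
>>>      // when the catalog implements this SPI.
>>>      void createCodeFunction(Identifier ident, CodeFunctionSpec spec,
>>>                              boolean replace);
>>>      boolean dropCodeFunction(Identifier ident);
>>>    }
>>>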
>>> *Precedence + defaults:*
>>>
>>>    - Unqualified: temp/session > built-in/DSv2 > code-literal (current
>>>      catalog). Qualified names resolve only in the named catalog.
>>>    - Defaults: feature on, SQL on, Python/pandas off; optional
>>>      languagePreference.
>>>
>>> Feedback is welcome!
>>>
>>> Thanks,
>>>
>>> Huaxin
>>>
>>
