This sounds useful, especially with Iceberg proposals like versioned SQL UDFs. On the surface it sounds like we could extend the DSv2 FunctionCatalog (which, as you point out, lacks dynamic CREATE/DROP FUNCTION support today), but I may be missing some details. I would also like to hear opinions from others who have worked more on functions/UDFs.
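For illustration only, a purely hypothetical mixin on top of the existing DSv2 interface could look roughly like the sketch below. None of these names exist in Spark today, and CodeFunctionSpec here is just a minimal stand-in for the spec object the SPIP proposes.

// Hypothetical sketch, not an existing Spark API: one way the DSv2
// FunctionCatalog (today read-only: listFunctions / loadFunction /
// functionExists) could grow dynamic create/drop support.
import org.apache.spark.sql.connector.catalog.FunctionCatalog;
import org.apache.spark.sql.connector.catalog.Identifier;

// Minimal stand-in for the SPIP's proposed spec: a language tag
// ("spark-sql", "python", "python-pandas") plus the code literal itself.
record CodeFunctionSpec(String language, String body) {}

interface SupportsFunctionDdl extends FunctionCatalog {
    // CREATE [OR REPLACE] FUNCTION would delegate here when the current
    // catalog implements this mixin.
    void createFunction(Identifier ident, CodeFunctionSpec spec, boolean replace);

    // DROP FUNCTION would delegate here; returning false means the
    // function was absent.
    boolean dropFunction(Identifier ident);
}

The SPIP's DDL section would then map CREATE/REPLACE/DROP FUNCTION onto calls like these, falling back to existing behavior when the catalog doesn't implement the mixin.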
Thanks!
Szehon

On Wed, Jan 7, 2026 at 9:32 PM huaxin gao <[email protected]> wrote:

> Hi Wenchen,
>
> Great question. In the SPIP, the language runtime is carried in the
> function spec (for python / python-pandas), so catalogs can optionally
> declare constraints on the execution environment.
>
> Concretely, the spec can include optional fields like:
>
> - pythonVersion (e.g., "3.10")
> - requirements (pip-style specs)
> - environmentUri (an optional pointer to a pre-built / admin-approved
>   environment)
>
> For the initial stage, we assume execution uses the existing PySpark
> worker environment (the same as for regular Python UDFs / pandas UDFs).
> If pythonVersion / requirements are present, Spark can validate them
> against the current worker env and fail fast (AnalysisException) if
> they’re not satisfied.
>
> environmentUri is intended as an extension point for future integrations
> (or vendor plugins) to select a vetted environment, but we don’t assume
> Spark will provision environments out of the box in v1.
>
> Thanks,
> Huaxin
>
> On Wed, Jan 7, 2026 at 6:06 PM Wenchen Fan <[email protected]> wrote:
>
>> This is a great feature! How do we define the language runtime, e.g.
>> the Python version and libraries? Do we assume the Python runtime is
>> the same as the PySpark worker’s?
>>
>> On Thu, Jan 8, 2026 at 3:12 AM huaxin gao <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I’d like to start a discussion on a draft SPIP
>>> <https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>:
>>>
>>> *SPIP: Catalog-backed Code-Literal Functions (SQL and Python) with
>>> Catalog SPI and CRUD*
>>>
>>> *Problem:* Spark can’t load SQL/Python function bodies from external
>>> catalogs in a standard way today, so users rely on session
>>> registration or vendor extensions.
>>>
>>> *Proposal:*
>>>
>>> - Add a CodeLiteralFunctionCatalog (Java SPI) returning a
>>>   CodeFunctionSpec with implementations (spark-sql, python,
>>>   python-pandas).
>>> - Resolution:
>>>   - SQL: parse + inline (deterministic ⇒ foldable).
>>>   - Python/pandas: run via the existing Python UDF / pandas UDF
>>>     runtime (opaque).
>>>   - SQL TVF: parse to a plan, substitute params, validate the schema.
>>> - DDL: CREATE/REPLACE/DROP FUNCTION delegates to the catalog if it
>>>   implements the SPI; otherwise it falls back to existing behavior.
>>>
>>> *Precedence + defaults:*
>>>
>>> - Unqualified: temp/session > built-in/DSv2 > code-literal (current
>>>   catalog). Qualified names resolve only in the named catalog.
>>> - Defaults: feature on, SQL on, Python/pandas off; optional
>>>   languagePreference.
>>>
>>> Feedback is welcome!
>>>
>>> Thanks,
>>> Huaxin
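For concreteness, here is a rough sketch of the fail-fast check Huaxin describes upthread. Only the field names pythonVersion, requirements, and environmentUri come from the SPIP; the class, the record shapes, and the naive requirement matching are assumptions made purely for illustration.

// Hedged sketch of validating a function spec against the PySpark worker
// environment at resolution time; all types here are invented for the example.
import java.util.List;
import java.util.Optional;

final class PythonEnvCheck {
    // What the resolved worker environment reports (shape assumed).
    record WorkerEnv(String pythonVersion, List<String> installedPackages) {}

    // Optional constraints carried on the function spec, per the SPIP.
    record EnvSpec(Optional<String> pythonVersion,
                   List<String> requirements,
                   Optional<String> environmentUri) {}

    // Returns an error message if the spec cannot run on this worker env;
    // the caller would surface it as an AnalysisException during analysis.
    static Optional<String> validate(EnvSpec spec, WorkerEnv env) {
        if (spec.pythonVersion().isPresent()
                && !env.pythonVersion().startsWith(spec.pythonVersion().get())) {
            return Optional.of("Function requires Python "
                + spec.pythonVersion().get() + " but worker runs "
                + env.pythonVersion());
        }
        for (String req : spec.requirements()) {
            // Naive containment check stands in for real pip-spec matching.
            if (!env.installedPackages().contains(req)) {
                return Optional.of("Missing requirement: " + req);
            }
        }
        // environmentUri is a future extension point; v1 would ignore it here.
        return Optional.empty();
    }
}

This matches the v1 behavior described above: no environment provisioning, just validation against whatever the worker already has.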

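Similarly, the unqualified-name precedence in the SPIP (temp/session > built-in/DSv2 > code-literal in the current catalog) amounts to a short lookup chain. The sketch below invents the hook types just to make the ordering explicit; the real logic would live in Spark's analyzer.

// Illustrative-only sketch of the SPIP's unqualified-name precedence;
// Object stands in for whatever handle the analyzer uses for a function.
import java.util.Optional;
import java.util.function.Function;

final class FunctionPrecedence {
    // Each hook returns a resolved function if that layer knows the name.
    static Optional<Object> resolveUnqualified(
            String name,
            Function<String, Optional<Object>> tempOrSession,
            Function<String, Optional<Object>> builtInOrDsv2,
            Function<String, Optional<Object>> codeLiteralInCurrentCatalog) {
        // Earlier layers win; later layers are consulted only on a miss.
        return tempOrSession.apply(name)
            .or(() -> builtInOrDsv2.apply(name))
            .or(() -> codeLiteralInCurrentCatalog.apply(name));
        // Qualified names skip this chain entirely and resolve only in the
        // named catalog, per the SPIP.
    }
}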