Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

Yufei Gu Fri, 17 Oct 2025 20:12:44 -0700

Hi folks,

Thanks a lot for joining today's UDF sync. Here is the summary:


   1. Instead of relying on dynamic inference, the return table’s schema
   for user-defined table functions (UDTFs) should be explicitly defined.
   2. Whether a function is a UDTF should be captured as a dedicated
   attribute, rather than being inferred indirectly from the return type.
   3. The interpretation of a UDF body (whether it is treated as a partial
   SQL expression or as a full SELECT statement) should be determined by
   engines. Example: `SELECT x + 1` vs. `x + 1`. Engines differ on which
   form they expect.
   4. User-defined aggregation functions (UDAFs) are out of scope for now.
   5. Each overload should include its own current-version field. This
   avoids relying solely on the global `definition-versions` when querying the
   current version of one overload.
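Points 1, 2, 3, and 5 above can be pictured with a tiny in-memory model. This is a minimal, hypothetical Python sketch. The class and field names (`Overload`, `OverloadVersion`, `is_udtf`, `return_columns`, `current_version_id`) are illustrative assumptions, not the field names in the PR. It shows three things: the return shape is declared rather than inferred, the UDTF flag is its own attribute, and the body is stored verbatim for the engine to interpret. It also shows how a per-overload current-version pointer avoids scanning the global `definition-versions` list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class OverloadVersion:
    version_id: int
    body: str  # stored verbatim; the engine decides expression vs. full SELECT


@dataclass
class Overload:
    signature: str                     # e.g. "add(int, int)" (hypothetical format)
    is_udtf: bool                      # dedicated attribute, never inferred
    return_type: Optional[str] = None  # required for scalar UDFs
    return_columns: Optional[List[Tuple[str, str]]] = None  # required for UDTFs
    current_version_id: int = 0        # per-overload current-version pointer
    versions: Dict[int, OverloadVersion] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The return shape must be declared explicitly, not inferred from the body.
        if self.is_udtf and not self.return_columns:
            raise ValueError(f"{self.signature}: UDTF needs an explicit return schema")
        if not self.is_udtf and self.return_type is None:
            raise ValueError(f"{self.signature}: scalar UDF needs an explicit return type")

    def current(self) -> OverloadVersion:
        # Direct lookup through this overload's own pointer; no global scan.
        return self.versions[self.current_version_id]


add_int = Overload(
    signature="add(int, int)",
    is_udtf=False,
    return_type="int",
    current_version_id=2,
    versions={1: OverloadVersion(1, "x + y"), 2: OverloadVersion(2, "y + x")},
)
print(add_int.current().body)  # y + x
```

Rejecting a missing return shape at construction time mirrors the agreement that nothing about the return type is left to inference.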

You can watch the recording here: https://www.youtube.com/watch?v=9t2xev8WfAw
I will update the PR (https://github.com/apache/iceberg/pull/14117) shortly.

Yufei


On Fri, Sep 19, 2025 at 9:42 AM Yufei Gu <[email protected]> wrote:

> Hi folks,
>
> I really appreciate the feedback from all of you over the past few months.
> I've filed the initial PR for the UDF spec:
> https://github.com/apache/iceberg/pull/14117. It captures the consensus
> we've built and addresses the write amplification concern raised in our
> last discussion.
>
> Please take a look and share your thoughts. Happy to discuss it further
> during Monday's meeting as well.
>
> Yufei
>
>
> On Mon, Sep 8, 2025 at 6:33 PM Yufei Gu <[email protected]> wrote:
>
>> Hi folks, thanks for joining today’s UDF sync.
>>
>> We covered the UDF metadata structure, captured in this doc:
>> https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing
>> .
>>
>> We also discussed a way to avoid copying every overload into the new
>> metadata JSON when creating a new version. One idea is to introduce a
>> global version array; this is not yet reflected in the doc, but I’ll update
>> it shortly. Other key points:
>>
>>    - The latest UDF version will typically be used in most scenarios,
>>    but engines retain the flexibility to choose which version to execute.
>>    - Pinning a specific version when referencing a UDF probably isn't a
>>    good idea. Users are responsible for updating downstream views if they
>>    reference older UDF versions.
>>
>> You can watch the recording here:
>> https://www.youtube.com/watch?v=6ResT-ODelI&ab_channel=ApacheIceberg
>>
>> Yufei
>>
>>
>> On Mon, Aug 25, 2025 at 6:36 PM Yufei Gu <[email protected]> wrote:
>>
>>> Hi folks, thanks for attending today’s UDF sync. In general, we
>>> discussed the UDF metadata structure, captured in this doc
>>> (https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing).
>>> Here is the detailed summary:
>>>
>>>    1. Each UDF overload has its own return type, e.g., `add(int, int)`
>>>    returns `int`, while `add(long, long)` returns `long`.
>>>    2. The return type should be explicitly specified; no implicit or
>>>    statement-based return type inference should be allowed.
>>>    3. Add explicit properties, such as `deterministic` and `doc`, at
>>>    the overload level.
>>>    4. Add a “secure” property at the top level.
>>>    5. Introduce a dedicated signature definitions section to centralize
>>>    metadata (function parameters, return type, parameter descriptions).
>>>    Each overload would reference a signature definition by ID. This
>>>    decoupling allows signature-related updates (like modifying parameter
>>>    descriptions) without requiring a new UDF version, similar to how
>>>    updating a table schema doesn’t create a new snapshot.
>>>    6. Open question: whether open properties should be versioned.
>>>    Versioned properties can lead to unnecessary copying of a bag of
>>>    properties into each version, but they provide a clear history of
>>>    properties for future debugging and for understanding the UDF's
>>>    behavior at a specific point in time.
>>>
>>> Watch the recording here,
>>> https://www.youtube.com/watch?v=p7CvuGZKLSo&list=PLkifVhhWtccwzc3oRWjy5XiYJl0R6kdQL
>>>
>>> Yufei
>>>
>>>
>>> On Thu, Aug 21, 2025 at 4:18 PM Yufei Gu <[email protected]> wrote:
>>>
>>>> Hi everyone, here’s the summary from our last sync on 8/11. Apologies
>>>> for the delay!
>>>>
>>>>    - One UDF entity for all overloads
>>>>       - We agreed to combine overloads with the same name into a
>>>>       single UDF entity, which shares a common metadata.json file.
>>>>       - Listing UDFs will return a list of UDF names, not a list of
>>>>       individual signatures.
>>>>       - Loading a UDF by name will return all of its overloads.
>>>>    - Versioning Strategy
>>>>       - A global version number will track changes across the entire
>>>>       UDF entity and increments monotonically.
>>>>       - Each overload will also maintain its own version (e.g.,
>>>>       updated_at_version) to trace changes specific to that overload.
>>>>    - For simplicity, the load API will not support argument-based
>>>>    filtering in the initial release. It will always return all overloads
>>>>    for a given UDF name; overload-level loading is not supported at this
>>>>    stage.
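The versioning strategy above can be sketched as follows. This is a minimal, hypothetical Python sketch that assumes a single in-memory entity per UDF name. `UdfEntity`, `commit`, and `load` are illustrative names, not Iceberg APIs. Each commit bumps the global version once and records that number as the touched overload's `updated_at_version`. Untouched overloads keep their older markers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UdfEntity:
    """Hypothetical: one entity per UDF name; all overloads share its metadata."""
    name: str
    version: int = 0  # global version, increases monotonically
    bodies: Dict[str, str] = field(default_factory=dict)              # signature -> body
    updated_at_version: Dict[str, int] = field(default_factory=dict)  # signature -> version

    def commit(self, signature: str, body: str) -> int:
        """Create or replace one overload; only that overload's marker moves."""
        self.version += 1
        self.bodies[signature] = body
        self.updated_at_version[signature] = self.version
        return self.version

    def load(self) -> List[str]:
        """Loading by name returns every overload; no argument-based filtering."""
        return sorted(self.bodies)


udf = UdfEntity("add")
udf.commit("add(int, int)", "x + y")    # global version 1
udf.commit("add(long, long)", "x + y")  # global version 2
udf.commit("add(int, int)", "y + x")    # global version 3
print(udf.updated_at_version)  # {'add(int, int)': 3, 'add(long, long)': 2}
```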
>>>>
>>>> Watch the recording here,
>>>> https://drive.google.com/file/d/10G2HjUH2DaKSjGufEOjMu0bBuNd7sCzO/view
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Fri, Aug 8, 2025 at 3:11 PM Yufei Gu <[email protected]> wrote:
>>>>
>>>>> To recap and add my thoughts, we want to support UDFs with multiple
>>>>> signatures under the same name, which can serve both overload-aware and
>>>>> overload-naive engines.
>>>>>
>>>>> Per my investigation[1], most engines support overloading by arguments
>>>>> and allow implicit conversions like numeric widening (e.g., INT →
>>>>> BIGINT/FLOAT). This resolution approach can cause issues such as silent
>>>>> behavior changes. Here is an example:
>>>>>
>>>>>    - Initially, only foo(DOUBLE) exists.
>>>>>    - foo(42::INT) widens INT → DOUBLE and runs the expected code.
>>>>>    - Later, a malicious user creates foo(BIGINT).
>>>>>    - The engine’s best-match resolution now binds the same call to the
>>>>>    new overload, changing behavior without modifying the query.
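The scenario can be reproduced with a toy best-match resolver. This is a hypothetical Python sketch, not any engine's actual resolution algorithm. The widening costs in `WIDENING_COST` are made-up numbers chosen so that BIGINT is a closer match than DOUBLE for an INT argument, which is what the example assumes.

```python
from typing import Dict, List, Optional

# Hypothetical widening costs (lower = closer match). Real engines apply
# their own precedence rules; these values only reproduce the scenario above.
WIDENING_COST: Dict[str, Dict[str, int]] = {
    "INT": {"INT": 0, "BIGINT": 1, "FLOAT": 2, "DOUBLE": 3},
    "BIGINT": {"BIGINT": 0, "DOUBLE": 1},
    "DOUBLE": {"DOUBLE": 0},
}


def resolve(arg_type: str, param_types: List[str]) -> Optional[str]:
    """Pick the single-argument overload reachable with the cheapest widening."""
    reachable = WIDENING_COST.get(arg_type, {})
    candidates = [(reachable[p], p) for p in param_types if p in reachable]
    return min(candidates)[1] if candidates else None


overloads = ["DOUBLE"]
print(resolve("INT", overloads))  # DOUBLE: foo(42::INT) widens to DOUBLE

overloads.append("BIGINT")        # someone adds foo(BIGINT)
print(resolve("INT", overloads))  # BIGINT: same query, different function
```

Nothing in the query text changes between the two calls; only the set of registered overloads does.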
>>>>>
>>>>> To mitigate this issue, we have to choose between these two access
>>>>> control models:
>>>>>
>>>>>    1. Model A – Name-Level ACL: Grants apply to all overloads of a
>>>>>    function name.
>>>>>    2. Model B – Signature-Level ACL: Grants tied to specific
>>>>>    signatures.
>>>>>
>>>>> The general recommendation is to adopt *Model A.* It trades some
>>>>> precision for safety and simplicity, while eliminating the silent behavior
>>>>> change problem. More details are in this doc[1].
>>>>>
>>>>> 1.
>>>>> https://docs.google.com/document/d/1E8mR-vInbQ8LDa5Lv3f22i6f8sceHojnEzxEJ6s6cvc/edit?tab=t.0
>>>>>
>>>>> Yufei
>>>>>
>>>>>
>>>>> On Tue, Jul 29, 2025 at 1:07 AM Ajantha Bhat <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks to everyone who joined the sync.
>>>>>> Here is the meeting recording:
>>>>>> https://drive.google.com/file/d/1L5S6nb-C_pzBwFlClwO_sG1AVBA_ROKo/view
>>>>>>
>>>>>> Summary:
>>>>>> We discussed how to define function identifiers (which should also
>>>>>> handle function overloading). Ryan suggested that we check how Spark
>>>>>> does it. We can refer to functions using an identifier and then bind the
>>>>>> different signatures to it, so that access policies can be applied per
>>>>>> identifier. This is also linked to how we want to version the functions
>>>>>> when overloading is supported.
>>>>>>
>>>>>> I will check more about this and update the proposal doc.
>>>>>>
>>>>>> Please check/subscribe to the dev events calendar for the next
>>>>>> meeting link (Aug 11).
>>>>>>
>>>>>> - Ajantha
>>>>>>
>>>>>> On Sun, Jul 27, 2025 at 10:46 PM Kevin Liu <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Ajantha,
>>>>>>>
>>>>>>> I see that the UDF Sync is scheduled in the "Iceberg Dev Events"
>>>>>>> calendar for tomorrow 7/28 at 9AM PT. I missed the last one, but
>>>>>>> I'll be at this one.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> No one joined the sync today. I learned that Yufei is on holiday, and
>>>>>>>> Ryan and others couldn't make it, similar to the last sync. It seems
>>>>>>>> Yufei might also have forgotten to transfer meeting ownership, as new
>>>>>>>> members needed admin approval and couldn't join automatically this
>>>>>>>> week. I also understand that it is summer holiday season for many.
>>>>>>>>
>>>>>>>> I've updated the function signature schema and other open points. I
>>>>>>>> believe we're very close to the final version of the spec. A meeting is
>>>>>>>> indeed necessary to finalize this, but we don't have to wait for it to
>>>>>>>> finish the review process; we have already had many meetings on this.
>>>>>>>> So, please review the document at your earliest convenience. If we
>>>>>>>> agree on the spec by next week, I can raise a PR.
>>>>>>>>
>>>>>>>> - Ajantha
>>>>>>>>
>>>>>>>> On Thu, Jul 3, 2025 at 4:03 AM Yufei Gu <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I’d propose moving the `properties` field from the top level into
>>>>>>>>> each “version”, alongside the representation, so that properties are
>>>>>>>>> versioned. A property like “deterministic” could change along with the
>>>>>>>>> representation over time. For example, we need to change
>>>>>>>>> “deterministic” from true to false when adding a non-deterministic SQL
>>>>>>>>> expression/function (e.g., now()) inside a UDF. Otherwise, rollback
>>>>>>>>> won't be safe.
>>>>>>>>>
>>>>>>>>> That said, it's still an open question whether we need any
>>>>>>>>> non-versioned properties. We can introduce them later if a use case
>>>>>>>>> arises.
>>>>>>>>>
>>>>>>>>> Yufei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>
>>>>>>>>>> I’d prefer to keep the signature list separate from the
>>>>>>>>>> representation history. Here are the reasons:
>>>>>>>>>>
>>>>>>>>>>    1. Each version still enforces a single signature. Although
>>>>>>>>>>    the signatures array is global to the UDF, each version references
>>>>>>>>>>    just one signature ID, so rollbacks to historical versions remain
>>>>>>>>>>    safe.
>>>>>>>>>>    2. We’ve separated the less frequently changing component
>>>>>>>>>>    (signatures) from the more dynamic one (representations) to reduce
>>>>>>>>>>    metadata file size.
>>>>>>>>>>    3. Since signatures use Iceberg data types, they should
>>>>>>>>>>    remain unaffected by multi-dialect representation differences.
>>>>>>>>>>
>>>>>>>>>> Yufei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>>>>>>>>>>
>>>>>>>>>>> Summary:
>>>>>>>>>>> We have discussed the action items from the last sync (*see
>>>>>>>>>>> Appendix C* in the proposal doc)
>>>>>>>>>>>
>>>>>>>>>>>    - Function overloading: Supported by a few engines and on the
>>>>>>>>>>>    roadmaps of many others. Iceberg will support it. We will maintain
>>>>>>>>>>>    a `FunctionIdentifier` (which extends `TableIdentifier` but also has
>>>>>>>>>>>    a member containing the function's argument type list), and all
>>>>>>>>>>>    operations like load, rename, list, create, and drop are based on
>>>>>>>>>>>    `FunctionIdentifier`.
>>>>>>>>>>>    - Secure UDF: If we store it as a property in a bag, we need
>>>>>>>>>>>    to standardize the property name. Iceberg encryption may be
>>>>>>>>>>>    orthogonal to this discussion.
>>>>>>>>>>>    - UDFs with multi-statement and procedural bodies are
>>>>>>>>>>>    supported by some engines. Iceberg will support them by storing the
>>>>>>>>>>>    body as-is, as provided by the engine when creating the function.
>>>>>>>>>>>
>>>>>>>>>>> New discussions around:
>>>>>>>>>>>
>>>>>>>>>>>    - Standardizing the property names (deterministic, secure).
>>>>>>>>>>>    - Renaming functions.
>>>>>>>>>>>    - Replacing functions: check to what level replace is
>>>>>>>>>>>    supported (considering function overloading).
>>>>>>>>>>>    - Should the signature be associated with the representation?
>>>>>>>>>>>
>>>>>>>>>>> I think we are close on the spec. Please review the proposal
>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>
>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>
>>>>>>>>>>> *Monday, July 14 · 9:00 – 10:00am*
>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>
>>>>>>>>>>> - Ajantha
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Can it be handled by Iceberg encryption? If the whole metadata is
>>>>>>>>>>>> encrypted, do we still need to worry about hiding just the UDF body?
>>>>>>>>>>>> Let's discuss more in today's sync.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, hiding the definition and disabling pushdown are required. We
>>>>>>>>>>>>> will need a named key (e.g., secure) somewhere, whether it is a
>>>>>>>>>>>>> top-level property or a key within the UDF properties, so that both
>>>>>>>>>>>>> the UDF creator and the consumer can recognize it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the extra detail. What do you think the spec would
>>>>>>>>>>>>>> require? Would it require hiding the UDF definition from users and
>>>>>>>>>>>>>> require that specific pushdown cases be disabled? The use cases seem
>>>>>>>>>>>>>> valid, but I'm trying to understand the requirements this places on
>>>>>>>>>>>>>> engines and why it needs to be part of the spec, rather than part of
>>>>>>>>>>>>>> the properties of the UDF.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here are the main use cases for secure UDFs:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    1. Hiding UDF definitions: This includes concealing the UDF
>>>>>>>>>>>>>>>    body and details like the list of imports, some of which aren’t
>>>>>>>>>>>>>>>    applicable to SQL UDFs.
>>>>>>>>>>>>>>>    2. Sandboxed execution: Ensuring the UDF runs in an isolated
>>>>>>>>>>>>>>>    environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>>>>>>>>>>>>    3. Preventing data leakage at execution time: For example,
>>>>>>>>>>>>>>>    secure UDFs may disable certain optimizations, such as predicate
>>>>>>>>>>>>>>>    pushdown, to avoid exposing sensitive data indirectly. [1]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Given these scenarios, I agree with your point that the
>>>>>>>>>>>>>>> secure flag is primarily an instruction to the engine to
>>>>>>>>>>>>>>> behave differently. While it's largely an engine-side behavior, we
>>>>>>>>>>>>>>> still need to include this flag in the UDF definition to indicate
>>>>>>>>>>>>>>> whether a UDF is secure, especially considering the performance
>>>>>>>>>>>>>>> penalty introduced by scenario #3. We should clearly recommend that
>>>>>>>>>>>>>>> users avoid marking UDFs as secure unless it's truly necessary.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yufei, could you make the argument for supporting a "secure" UDF? What
>>>>>>>>>>>>>>>> use case are you addressing and what specifically changes about how the
>>>>>>>>>>>>>>>> UDF is handled? If the idea is to hide the UDF definition, do we need to
>>>>>>>>>>>>>>>> include it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think this would be a signal to a "trusted engine". When the engine
>>>>>>>>>>>>>>>> interacts with the catalog, it sends authorization information about
>>>>>>>>>>>>>>>> itself in addition to the user that it is acting on behalf of. That way
>>>>>>>>>>>>>>>> the catalog knows that the secure UDF can be sent to the engine and won't
>>>>>>>>>>>>>>>> be shown to the user. The majority of this logic is on the REST server
>>>>>>>>>>>>>>>> side, and the only part that is communicated to the client is the request
>>>>>>>>>>>>>>>> not to show the UDF to the user, right? In that case, should this be a
>>>>>>>>>>>>>>>> property rather than part of the definition? Even if we state that the
>>>>>>>>>>>>>>>> client "must" suppress the UDF definition, it's really just a request.
>>>>>>>>>>>>>>>> Only trusted engines can be passed the UDF definition, so a spec
>>>>>>>>>>>>>>>> requirement to suppress the definition isn't very meaningful.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Multi-statement UDFs are definitely useful, but whether those statements
>>>>>>>>>>>>>>>>> run within a single transaction should be treated as an engine-level
>>>>>>>>>>>>>>>>> concern. The Iceberg UDF spec can spell out the expectation, yet the
>>>>>>>>>>>>>>>>> actual guarantee still depends on the runtime. Even if a UDF declares
>>>>>>>>>>>>>>>>> itself transactional, the engine may or may not enforce it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One more thing: should we also introduce a “secure UDF” option supported
>>>>>>>>>>>>>>>>> by some engines[1], so the body and any sensitive details stay hidden from
>>>>>>>>>>>>>>>>> callers?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    - We went through the SQL UDF syntax supported by different engines
>>>>>>>>>>>>>>>>>>    (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).
>>>>>>>>>>>>>>>>>>    - Each engine uses its own block separator, like $$ or '' or none. The
>>>>>>>>>>>>>>>>>>    action item was to check whether engines support multi-statement
>>>>>>>>>>>>>>>>>>    (transactional) UDF bodies.
>>>>>>>>>>>>>>>>>>    - Discussed function overloading. We need to check whether these
>>>>>>>>>>>>>>>>>>    engines support function overloading for SQL UDFs (Postgres supports
>>>>>>>>>>>>>>>>>>    it!). If they do, we need to adapt the spec to handle it.
>>>>>>>>>>>>>>>>>>    - Started the online spec review and discussed the deterministic flag.
>>>>>>>>>>>>>>>>>>    We concluded that independent fields (like deterministic) are kept in
>>>>>>>>>>>>>>>>>>    the spec only if the majority of engines support them. Otherwise, they
>>>>>>>>>>>>>>>>>>    are passed in a property bag (engine-specific), and it is the engine's
>>>>>>>>>>>>>>>>>>    responsibility to honor those optional properties.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Feel free to review the current proposal document here
>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Final spec will be put to review and vote once it is
>>>>>>>>>>>>>>>>>> ready.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    - We discussed including Python support; the majority agreed *not to*
>>>>>>>>>>>>>>>>>>>    (see recording for details).
>>>>>>>>>>>>>>>>>>>    - No strong opposition to versioning; it will be included to support
>>>>>>>>>>>>>>>>>>>    change tracking and similar use cases.
>>>>>>>>>>>>>>>>>>>    - Suggestions were made to document how each catalog resolves UDFs,
>>>>>>>>>>>>>>>>>>>    similar to views and tables.
>>>>>>>>>>>>>>>>>>>    - We agreed not to deviate from the existing table/view spec; e.g.,
>>>>>>>>>>>>>>>>>>>    location will remain *required* for cross-catalog compatibility.
>>>>>>>>>>>>>>>>>>>    - We also briefly discussed view interoperability, as the same
>>>>>>>>>>>>>>>>>>>    considerations apply here.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Feel free to review the proposal document
>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>>>>>>>>>>>>>>>>>>> here. With the current scope, it is now similar to the view/table spec.
>>>>>>>>>>>>>>>>>>> The final spec will be put up for review and a vote once it is ready.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the UDF project.
>>>>>>>>>>>>>>>>>>>> Everyone’s welcome to drop in and share ideas! Here is the meeting link:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Iceberg UDF sync
>>>>>>>>>>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am
>>>>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Update on the progress.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal.
>>>>>>>>>>>>>>>>>>>>> We covered several key points, though some are still open for further
>>>>>>>>>>>>>>>>>>>>> discussion:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage? We
>>>>>>>>>>>>>>>>>>>>> explored the possibility of simplifying the specification by avoiding view
>>>>>>>>>>>>>>>>>>>>> replication, and potentially introducing versioning support later. UDTFs,
>>>>>>>>>>>>>>>>>>>>> being a superset of views in some ways, may not require versioning
>>>>>>>>>>>>>>>>>>>>> initially.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not support vararg
>>>>>>>>>>>>>>>>>>>>> syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments as
>>>>>>>>>>>>>>>>>>>>> lists when supported by the engine.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic
>>>>>>>>>>>>>>>>>>>>> types (e.g., object), we can only map engine-specific types to Iceberg
>>>>>>>>>>>>>>>>>>>>> types. As a result, generic data types will not be supported in the
>>>>>>>>>>>>>>>>>>>>> initial version.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a UDF language alongside SQL
>>>>>>>>>>>>>>>>>>>>> seems promising, especially given its potential to resolve
>>>>>>>>>>>>>>>>>>>>> interoperability challenges. Some engines, however, require platform
>>>>>>>>>>>>>>>>>>>>> version and package dependency details to execute Python code; this
>>>>>>>>>>>>>>>>>>>>> should be captured in the specification.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> *Next Steps*
>>>>>>>>>>>>>>>>>>>>> I will update the proposal document with two primary UDF use cases:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>    - Policy exchange between engines
>>>>>>>>>>>>>>>>>>>>>    - UDTF as a superset of view functionality
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The update will include corresponding syntax examples in both SQL and
>>>>>>>>>>>>>>>>>>>>> Python, and detail how each use case is represented in Iceberg metadata.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We also plan to set up regular syncs (open to more interested
>>>>>>>>>>>>>>>>>>>>> participants) to continue refining and finalizing the UDF specification.
>>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I've updated the design document[1] based on the previous comments.
>>>>>>>>>>>>>>>>>>>>>> Additionally, I've included the SQL UDF syntax supported by various
>>>>>>>>>>>>>>>>>>>>>> vendors, including Dremio, Snowflake, Databricks, and Trino.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper discussion is needed.
>>>>>>>>>>>>>>>>>>>>>> Let's keep moving forward, especially with the renewed interest from the
>>>>>>>>>>>>>>>>>>>>>> community.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> During the last catalog community sync, there was significant interest in
>>>>>>>>>>>>>>>>>>>>>>> storing UDFs in Iceberg and adding endpoints for UDF handling in the REST
>>>>>>>>>>>>>>>>>>>>>>> catalog spec.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I recently discussed this with Yufei to better understand the new
>>>>>>>>>>>>>>>>>>>>>>> requirement of using UDFs for fine-grained access control policies. This
>>>>>>>>>>>>>>>>>>>>>>> expands the use cases beyond just versioned and interoperable UDFs.
>>>>>>>>>>>>>>>>>>>>>>> Additionally, I learned that many vendors are interested in this feature.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Given the strong community interest and support, I’d like to take
>>>>>>>>>>>>>>>>>>>>>>> ownership of this effort and revive the work. I'll be revisiting the
>>>>>>>>>>>>>>>>>>>>>>> document I proposed a while back and will share an updated proposal by
>>>>>>>>>>>>>>>>>>>>>>> next week.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The UDF spec does not require representations to be
>>>>>>>>>>>>>>>>>>>>>>>> SQL. It merely does not specify (in this revision) how 
>>>>>>>>>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>>>>>>>>> representations are to be written.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> This seems like an easy extension (adding a new
>>>>>>>>>>>>>>>>>>>>>>>> type in the "Representations" section).
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a way for
>>>>>>>>>>>>>>>>>>>>>>>>> future versions to add different representations later, but only SQL is
>>>>>>>>>>>>>>>>>>>>>>>>> supported. That was also the feedback to my initial skepticism about how
>>>>>>>>>>>>>>>>>>>>>>>>> it would work to add functions.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri
>>>>>>>>>>>>>>>>>>>>>>>>> Bourlatchkov <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only
>>>>>>>>>>>>>>>>>>>>>>>>>> SQL representations, although it is certainly 
>>>>>>>>>>>>>>>>>>>>>>>>>> favouring SQL in examples...
>>>>>>>>>>>>>>>>>>>>>>>>>> It would be nice to add a non-SQL example, indeed.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this
>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal focuses on SQL-based engines, while 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Python-based systems often
>>>>>>>>>>>>>>>>>>>>>>>>>>> work with data frames. Adding imperative languages 
>>>>>>>>>>>>>>>>>>>>>>>>>>> like Python would make
>>>>>>>>>>>>>>>>>>>>>>>>>>> this proposal more inclusive.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 8 Aug 2024 at 10:27, Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>> Findeisen <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the design doc linked earlier in this thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1], I read:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to share among different engines."
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully
>>>>>>>>>>>>>>>>>>>>>>>>>>>> understand yet how the proposed design addresses 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> shareability between the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines though.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I could use some help understanding this better.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Moustafa <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functions shareable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> between engines? Do you mean UDFs written in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> imperative code?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The Iceberg UDFs are an interesting idea!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functions sharable between the engines?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> look like in, e.g., Spark or Trino?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Best
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> added comments. I think it would be helpful to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also have a design doc that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> covers the choices from the draft spec. For 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instance, the choice to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enumerate all possible function input structs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rather than allowing generics
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and varargs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function signatures is limiting. It would be nice 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to see a discussion of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the trade-offs and a rationale for the choice. I 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think it would also be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> very helpful to have a few representative use 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases for this included in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the doc. That way the proposal can demonstrate 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that it solves those use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases with reasonable trade-offs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent with conventions in other specs. For 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example, using string IDs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rather than an integer.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> This uses a very different model for spec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> versioning than the Iceberg view and table specs. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It requires readers to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fail if there are any unknown fields, which 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prevents the spec from adding
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things that are fully backward-compatible. Other 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg specs only require
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a version change to introduce 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward-incompatible changes and I think that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this should do the same to avoid confusion.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> multiple function signatures per version, but it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is unclear how to encode
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> them because a version is associated with a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> single function signature.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> creating functions across engines, so this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> doesn’t show that the metadata
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposed is sufficient for cross-engine use cases.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shows a SELECT statement and it isn’t clear how 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this is distinct from a view.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
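The spec-versioning point in the feedback above can be illustrated with a minimal Python sketch. It contrasts a reader that fails on any unknown field with one that only refuses metadata whose format version it does not support and otherwise ignores unknown fields. The field names (`format-version`, `function-uuid`, `versions`) and the supported version number are hypothetical placeholders chosen for the example, not taken from any draft of the UDF spec.

```python
import json

# Hypothetical values for illustration only; not taken from the UDF spec draft.
SUPPORTED_FORMAT_VERSION = 1
KNOWN_FIELDS = {"format-version", "function-uuid", "versions"}


def read_strict(text: str) -> dict:
    """Strict model: any field this reader does not know is an error."""
    meta = json.loads(text)
    unknown = set(meta) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return meta


def read_lenient(text: str) -> dict:
    """Version-gated model: only an unsupported format-version is an error;
    unknown (optional) fields are ignored."""
    meta = json.loads(text)
    version = meta.get("format-version")
    if not isinstance(version, int) or version > SUPPORTED_FORMAT_VERSION:
        raise ValueError(f"unsupported format-version: {version!r}")
    return {k: v for k, v in meta.items() if k in KNOWN_FIELDS}


# A writer adds an optional field without bumping format-version.
newer = '{"format-version": 1, "function-uuid": "f-1", "versions": [], "doc": "added later"}'
print(read_lenient(newer))  # the extra "doc" field is ignored
```

With the strict reader, the same document raises `ValueError`, so even a purely additive, backward-compatible field would force a format-version bump; the lenient reader reserves that bump for forward-incompatible changes.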
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> I will wait for a week, and if there are no more
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> review comments, I will raise a PR for the spec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addition next week.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a look at the proposal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interesting direction, but there might be some 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> details that need to be fine-tuned.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be interested. Resharing since I do not think it 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> was directly linked in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder, since we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> haven't received any reviews on the proposal yet.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> It was initially proposed on June 4.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so far
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (from Benny).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal at this link:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> The Google Doc link is attached to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And thanks to Stephen Lin for working on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I hope it gives more clarity for making
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> decisions on how we want to implement it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scalar/aggregate/table user-defined functions.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Here are some examples of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what I meant in (2):
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user defined functions:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/develop/functions.html
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user defined functions:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> variation of (1) where the API is data flow/data 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pipeline API instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL (e.g., Spark Scala). Yes, that is also 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> possible in the very long run :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ye <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> imperative functions against a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> long-term opportunities in this case. Suppose
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you register a Spark temp view as some sort of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DataFrame read; it could still be resolved to a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Spark plan that is representable by an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intermediate representation. But I agree this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> gets very complicated very quickly, and just
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> having case (1) covered would already be a huge
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> step forward.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Benny Chow <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tabular SQL UDF can be used to build a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parameterized view. So, there's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> definitely a lot in common between UDFs and views.
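Benny's observation can be made concrete with a small Python sketch using the standard-library `sqlite3` module. The stored definition (a SELECT body plus a parameter list) and the `call_table_function` helper are hypothetical illustrations, not an Iceberg or SQLite feature; SQLite has no SQL-level CREATE FUNCTION, so the "tabular UDF" here is simply a stored SELECT executed with bound arguments, which is what a parameterized view amounts to.

```python
import sqlite3

# Hypothetical stored definition of a tabular SQL UDF: a SELECT body with a named parameter.
ORDERS_FOR_CUSTOMER = {
    "params": ["customer"],
    "body": "SELECT id, amount FROM orders WHERE customer = :customer ORDER BY id",
}


def call_table_function(conn, udf, **args):
    """Invoke the stored SELECT with bound arguments, like querying a parameterized view."""
    missing = [p for p in udf["params"] if p not in args]
    if missing:
        raise TypeError(f"missing arguments: {missing}")
    return conn.execute(udf["body"], args).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "bob", 5.5), (3, "alice", 2.5)],
)
print(call_table_function(conn, ORDERS_FOR_CUSTOMER, customer="alice"))  # [(1, 10.0), (3, 2.5)]
```

Binding arguments as query parameters, rather than splicing them into the SQL text, keeps the stored body fixed, which is what lets it be versioned and shared like a view definition.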
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Walaa Eldin Moustafa <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about what is perceived as a "UDF". There are 2 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flavors:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions defined by the user whose
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> definition is a composition of other built-in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functions/SQL expressions.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> imperative functions against a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> references are pretty much of type (1), and I think
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> those are more analogous to views due to their SQL
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nature. I agree (2) is not practical for Iceberg to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> maintain, but I think Ajantha's use cases are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> around (1), which may be worth evaluating.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> post the proposal, but I think this would be a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> very difficult area to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> models without having a huge
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supports SQL representations of UDFs (similar to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> views as shared by the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reference links above), the complexity involved 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be similar to managing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> views.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for your input.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> draft spec (inspired by the view spec) this week 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to facilitate further
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussions.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jack Ye <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have a common set of functions across engines, I 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't see how that is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> practical when those engines are implemented so 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> differently. Plugging in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> code -- and especially custom user-supplied code 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- seems inherently
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specialized to me and should be part of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines' design.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> views? I feel we can say exactly the same thing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about Iceberg views, and yet we have Iceberg
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> multi-dialect views implemented. It sounds like we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are trying to draw a line between SQL and other
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> programming languages as "code", but I think SQL is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just another type of code, and we are already
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> talking about compiling all these different code
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dialects to an intermediate representation (using
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> projects like Coral and Substrait), which will be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stored as another type of Iceberg view
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> representation. I think the same functionality can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be used for UDFs if developed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> support is a good idea, even just a multi-dialect
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> one like views. That would allow engines, for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example, to parse a view's SQL and, when a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> referenced function cannot be resolved, look for a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> multi-dialect UDF definition.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
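The fallback resolution Jack describes could look roughly like the following Python sketch. `UdfDefinition` and `FunctionResolver` are hypothetical names, and checking engine built-ins before the catalog is an assumption consistent with "when a referenced function cannot be resolved"; it is not a proposed Iceberg API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class UdfDefinition:
    """Hypothetical multi-dialect UDF: one SQL body per engine dialect."""
    name: str
    representations: Dict[str, str] = field(default_factory=dict)


class FunctionResolver:
    def __init__(self, dialect: str, builtins: Set[str], catalog_udfs: Dict[str, UdfDefinition]):
        self.dialect = dialect
        self.builtins = builtins
        self.catalog_udfs = catalog_udfs

    def resolve(self, name: str) -> str:
        # 1. Engine built-ins are tried first (assumption: the engine's own
        #    functions take precedence over catalog definitions).
        if name in self.builtins:
            return f"builtin:{name}"
        # 2. Only names the engine cannot resolve fall back to catalog UDFs.
        udf = self.catalog_udfs.get(name)
        if udf is None:
            raise LookupError(f"unknown function: {name}")
        # 3. The engine needs a representation in its own dialect.
        body = udf.representations.get(self.dialect)
        if body is None:
            raise LookupError(f"{name} has no {self.dialect!r} representation")
        return body


udfs = {"add_one": UdfDefinition("add_one", {"spark": "x + 1", "trino": "x + 1"})}
spark = FunctionResolver("spark", {"upper", "lower"}, udfs)
print(spark.resolve("upper"))    # builtin:upper
print(spark.resolve("add_one"))  # x + 1
```

An engine whose dialect has no stored representation gets a `LookupError` here; a translation layer (e.g. via an intermediate representation) would be the place to try before giving up.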
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we have the actual proposal published.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Robert Stupp <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine-specific and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> portable and "non-centralized" as views are. The 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> same performance concerns
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> apply to views as well.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> base upon which engines can build, so the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> argument that UDFs aren't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> practical, because engines are different, is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> probably only a temporary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concern.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should also try to tackle making views
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> portable, which is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> conceptually not that much different from 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> portable UDFs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> negative touch to the idea of having UDFs in 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, especially not at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this early stage.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it's a good idea to add UDFs tracked by Iceberg 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> catalogs. I think that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg primarily deals with things that are 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> centralized, like tables of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data. While it would be great to have a common 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> set of functions across
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines, I don't see how that is practical when 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> those engines are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implemented so differently. Plugging in code -- 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and especially custom
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied code -- seems inherently 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specialized to me and should be part
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the engines' design.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you post the proposal, but I think this would be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a very difficult area to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> models without having a huge
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the community interest in storing the Versioned 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL UDFs in Iceberg.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addition for storing the versioned UDFs in 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg (inspired by view spec).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> similarly to views in that they are associated 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with tables, but they can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> accept arguments and produce return values, or 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> even function as inline
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expressions.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines, like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dremio, Trino, Snowflake, and Databricks Spark,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> support SQL UDFs at the catalog
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> level [1].
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can enable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines. Potentially, engines can understand the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UDFs written by other
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines (with a translation layer).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
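The first motivation above, versioning, could look roughly like the following Python sketch, loosely modeled on the `versions` list and `current-version-id` pointer that Iceberg view metadata already uses. The class and field names here are hypothetical; the point is that each change appends an immutable version and only the current-version pointer moves, so a rollback never rewrites history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class UdfVersion:
    version_id: int
    sql_by_dialect: Dict[str, str]  # hypothetical: one SQL body per engine dialect


@dataclass
class UdfMetadata:
    name: str
    versions: List[UdfVersion] = field(default_factory=list)
    current_version_id: Optional[int] = None

    def add_version(self, sql_by_dialect: Dict[str, str]) -> int:
        """Append an immutable version and make it current; old versions are kept."""
        new_id = max((v.version_id for v in self.versions), default=0) + 1
        self.versions.append(UdfVersion(new_id, dict(sql_by_dialect)))
        self.current_version_id = new_id
        return new_id

    def current(self) -> UdfVersion:
        for v in self.versions:
            if v.version_id == self.current_version_id:
                return v
        raise LookupError(f"{self.name} has no current version")

    def rollback(self, version_id: int) -> None:
        """Point the function back at an earlier version without rewriting history."""
        if all(v.version_id != version_id for v in self.versions):
            raise LookupError(f"unknown version: {version_id}")
        self.current_version_id = version_id


fn = UdfMetadata("add_tax")
fn.add_version({"spark": "price * 1.1"})
fn.add_version({"spark": "price * 1.2", "trino": "price * 1.2"})
fn.rollback(1)
print(fn.current().sql_by_dialect)  # {'spark': 'price * 1.1'}
```

Keeping every version in the metadata is what makes rollback cheap; it is also the source of the write-amplification concern when many overloads must be copied into each new metadata file.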
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this feature into Iceberg would be a valuable 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addition, and we're eager to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> collaborate with the community to develop a UDF 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> drafting a specification to propose to the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> community.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Databricks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>> Databricks
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
