Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

Yufei Gu Sun, 02 Nov 2025 13:49:20 -0800

Hi folks,

Thanks a lot for the review! I made a few changes to the spec:
1. Renamed "overload" to "definition".
2. Changed the definition ID from a long to a string formatted as the
parameter type tuple.
3. Explicitly stated that named-argument invocation is supported.
4. Added a nullable flag for return values.
5. Added a few rules for inputs and returns.
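
To illustrate change 2, here is a minimal sketch of deriving a definition ID
from the ordered parameter types. The helper name `definition_id` and the
"(type,type)" string format are assumptions made for illustration only; the
authoritative format is whatever the PR specifies.

```python
from typing import Sequence


def definition_id(param_types: Sequence[str]) -> str:
    """Build a hypothetical definition ID from ordered Iceberg type names.

    The "(a,b)" format is an assumption for this sketch, not the spec's.
    """
    # Normalize case and whitespace so "INT" and " int " map to the same ID.
    normalized = [t.strip().lower() for t in param_types]
    return "(" + ",".join(normalized) + ")"


add_int = definition_id(["int", "int"])
add_long = definition_id(["long", "long"])
# Dropping a definition and recreating it with the same signature yields the
# same ID, unlike a numeric counter, which would hand out a fresh value.
recreated = definition_id(["INT", " int "])
```

Because the ID is a pure function of the parameter types and their order, two
definitions with the same signature cannot end up with different IDs.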


Please find more details in the PR (
https://github.com/apache/iceberg/pull/14117). Here is a doc comparing
engine behaviors for named-argument invocation, nullable return values,
default values for inputs, and null handling:
https://docs.google.com/document/d/1GC896Z4gxYP0Vz-ENqZ3tZZBqXEUQf4qDJO11NRo8F4/edit?usp=sharing.
It should help with decision making.

We could discuss the following items in Monday's meeting:
1. Whether we need to push parameters down to the representation. Since we
want to support named-argument invocation, this requires extra mapping on
the engine side.
2. Whether we support default values. Spark plans to support them, and
Snowflake already does.
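
As a rough sketch of the engine-side mapping mentioned in item 1, the snippet
below binds a mix of positional and named arguments to the declared parameter
order and fills in defaults. Everything here is hypothetical: the
`bind_arguments` helper, the (name, default) parameter model, and the
`_REQUIRED` sentinel are illustration only, and default values are still an
open question for the spec.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Sentinel marking a parameter that has no default value (hypothetical model).
_REQUIRED = object()

# A parameter is a (name, default) pair; order in the list is declaration order.
Param = Tuple[str, Any]


def bind_arguments(
    params: Sequence[Param],
    positional: Sequence[Any],
    named: Dict[str, Any],
) -> List[Any]:
    """Return argument values in declared parameter order."""
    if len(positional) > len(params):
        raise TypeError("too many positional arguments")
    bound = list(positional)
    remaining = dict(named)
    # Fill every parameter not covered positionally, by name or by default.
    for name, default in params[len(positional):]:
        if name in remaining:
            bound.append(remaining.pop(name))
        elif default is not _REQUIRED:
            bound.append(default)
        else:
            raise TypeError(f"missing argument: {name}")
    # Leftover names are unknown, or repeat a positionally bound parameter.
    if remaining:
        raise TypeError(f"unknown or duplicate arguments: {sorted(remaining)}")
    return bound


params = [("x", _REQUIRED), ("y", _REQUIRED), ("scale", 1)]
call_a = bind_arguments(params, [10], {"y": 20})
call_b = bind_arguments(params, [], {"y": 20, "x": 10, "scale": 3})
```

Both calls resolve to the same positional layout, which is what a
representation that only sees ordered parameters would need.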


Yufei


On Mon, Oct 20, 2025 at 4:15 PM Yufei Gu <[email protected]> wrote:

> Hi folks,
>
> Thanks for joining today’s community UDF sync. Here’s a summary of what we
> discussed and agreed upon:
>
>    1. We agreed to use Iceberg types for each field in the UDF spec. This
>    approach ensures consistency with the table spec and allows for
>    straightforward conversion to JSON types.
>    2. Using numeric overload IDs can have side effects. For instance,
>    deleting an overload and later creating a new one with the same signature
>    yields a different numeric ID that points to the same overload. The
>    consensus is to represent the overload ID as a string derived from the
>    parameter types and their order.
>    3. Parameter names must remain consistent across all overload
>    versions. We agreed to add explicit rules:
>       1. Parameter names must match across all overload versions.
>       2. Renaming a parameter requires removing all previous versions
>       that use the old name.
>    4. We discussed whether parameters or return types can be nullable and
>    how engines should handle null values. We’ll research SQL standards and
>    common engine behaviors before finalizing. The current leaning is:
>       1. Allow all parameters to be nullable.
>       2. Allow the return type to include an explicit nullability
>       annotation.
>
> You can watch the recording here for more details:
> https://youtu.be/IUd-SN0CQRs
> Will update the PR(https://github.com/apache/iceberg/pull/14117) soon.
>
> Yufei
>
>
> On Tue, Oct 7, 2025 at 9:25 AM Yufei Gu <[email protected]> wrote:
>
>> Hi everyone,
>>
>> Thanks a lot for joining Monday's UDF sync. Here is the summary:
>>
>>    1. Replace “definitions version” with “definitions logs”, which
>>    record overload changes and timestamps for informational purposes only.
>>    2. Use a globally increasing numeric ID for overloads instead of
>>    UUIDs.
>>    3. Add explicit types to all fields in the spec to clarify the
>>    metadata structure.
>>    4. Add null-handling optimization hints to help engines skip
>>    evaluation when inputs are null. (Thanks, Talat!)
>>    5. Function parameters remain ordered, but we don’t need to introduce
>>    IDs for each parameter. We’ll double-check the Java implementation to
>>    ensure consistent behavior across engines, especially for nested structs.
>>
>> You can watch the recording here for more details:
>> https://youtu.be/gJnueQODWvs
>> I have updated the PR (https://github.com/apache/iceberg/pull/14117) for
>> the above feedback. Please take a look and let me know if you have any
>> questions.
>>
>>
>> Yufei
>>
>>
>> On Mon, Sep 22, 2025 at 2:10 PM Yufei Gu <[email protected]> wrote:
>>
>>> Hi folks,
>>>
>>> Thanks a lot for joining today's UDF sync. Here is the summary:
>>>
>>>    1. Instead of relying on dynamic inference, the return table’s
>>>    schema for user-defined table functions (UDTFs) should be explicitly
>>>    defined.
>>>    2. Whether a function is a UDTF should be captured as a dedicated
>>>    attribute, rather than being inferred indirectly from the return type.
>>>    3. The interpretation of a UDF body (whether it is treated as a
>>>    partial SQL expression or as a full SELECT statement) should be
>>>    determined by engines. Example: `SELECT x + 1` vs. `x + 1`. Different
>>>    engines have different takes.
>>>    4. User-defined aggregation functions (UDAFs) are out of scope for
>>>    now.
>>>    5. Each overload should include its own current-version field. This
>>>    avoids relying solely on the global `definition-versions` when querying
>>>    the current version of one overload.
>>>
>>> You can watch the recording here:
>>> https://www.youtube.com/watch?v=9t2xev8WfAw
>>> I will update the PR(https://github.com/apache/iceberg/pull/14117)
>>> shortly.
>>>
>>> Yufei
>>>
>>>
>>> On Fri, Sep 19, 2025 at 9:42 AM Yufei Gu <[email protected]> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I really appreciate the feedback from you all over the past few months. I've
>>>> filed the initial PR for the UDF spec:
>>>> https://github.com/apache/iceberg/pull/14117. It captures the
>>>> consensus we've built and addresses the write amplification concern raised
>>>> in our last discussion.
>>>>
>>>> Please take a look and share your thoughts. Happy to discuss it further
>>>> during Monday's meeting as well.
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Mon, Sep 8, 2025 at 6:33 PM Yufei Gu <[email protected]> wrote:
>>>>
>>>>> Hi folks, thanks for joining today’s UDF sync.
>>>>>
>>>>> We covered the UDF metadata structure, captured in this doc:
>>>>> https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing
>>>>> .
>>>>>
>>>>> We also discussed a way to avoid copying every overload into the new
>>>>> metadata JSON when creating a new version. One idea is to introduce a
>>>>> global version array. This is not yet reflected in the doc, but I’ll
>>>>> update it shortly. Other key points:
>>>>>
>>>>>    - The latest UDF version will typically be used in most scenarios,
>>>>>    but engines retain the flexibility to choose which version to execute.
>>>>>    - Keeping the version while referring to a UDF probably isn't a
>>>>>    good idea. Users are responsible for updating downstream views if they
>>>>>    reference older UDF versions.
>>>>>
>>>>> You can watch the recording here:
>>>>> https://www.youtube.com/watch?v=6ResT-ODelI&ab_channel=ApacheIceberg
>>>>>
>>>>> Yufei
>>>>>
>>>>>
>>>>> On Mon, Aug 25, 2025 at 6:36 PM Yufei Gu <[email protected]> wrote:
>>>>>
>>>>>> Hi folks, thanks for attending today’s UDF sync. In general, we
>>>>>> discussed the UDF metadata structure, captured at this doc(
>>>>>> https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing
>>>>>> ). Here is the detailed summary:
>>>>>>
>>>>>>    1. Each UDF overload has its own return type. e.g., `add(int,
>>>>>>    int)` returns `int`, while `add(long, long)`  returns `long`
>>>>>>    2. Return type should be explicitly specified, no implicit or
>>>>>>    statement-based return type inference should be allowed.
>>>>>>    3. Adding explicit properties like deterministic, doc properties
>>>>>>    at the overload level.
>>>>>>    4. Adding property “secure” at the top level.
>>>>>>    5. Introducing a dedicated signature definitions section to
>>>>>>    centralize metadata (function parameters, return type, parameter
>>>>>>    descriptions). Each overload would reference a signature definition
>>>>>>    by ID. This decoupling allows signature-related updates (like
>>>>>>    modifying parameter descriptions) without requiring a new UDF
>>>>>>    version, similar to how updating a table schema doesn’t create a new
>>>>>>    snapshot.
>>>>>>    6. Whether to have versioned open properties or not. Versioned
>>>>>>    properties can lead to unnecessary copying of a bag of properties
>>>>>>    into each version, but they provide a clear history of properties
>>>>>>    for any future debugging and understanding of the UDF behavior at a
>>>>>>    specific point in time.
>>>>>>
>>>>>> Watch the recording here,
>>>>>> https://www.youtube.com/watch?v=p7CvuGZKLSo&list=PLkifVhhWtccwzc3oRWjy5XiYJl0R6kdQL
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 21, 2025 at 4:18 PM Yufei Gu <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi everyone, here’s the summary from our last sync on 8/11.
>>>>>>> Apologies for the delay!
>>>>>>>
>>>>>>>    - One UDF entity for all overloads
>>>>>>>       - We agreed to combine overloads with the same name into a
>>>>>>>       single UDF entity, which shares a common metadata.json file.
>>>>>>>       - Listing UDFs will return a list of UDF names, not a list of
>>>>>>>       individual signatures.
>>>>>>>       - Loading a UDF by name will return all of its overloads.
>>>>>>>    - Versioning Strategy
>>>>>>>       - A global version number will track changes across the
>>>>>>>       entire UDF entity; it increments monotonically.
>>>>>>>       - Each overload will also maintain its own version (e.g.,
>>>>>>>       updated_at_version) to trace changes specific to that overload.
>>>>>>>    - For simplicity, the load API will not support argument-based
>>>>>>>    filtering in the initial release. It will always return all
>>>>>>>    overloads for a given UDF name; overload-level loading is not
>>>>>>>    supported at this stage.
>>>>>>>
>>>>>>> Watch the recording here,
>>>>>>> https://drive.google.com/file/d/10G2HjUH2DaKSjGufEOjMu0bBuNd7sCzO/view
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 8, 2025 at 3:11 PM Yufei Gu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> To recap and add my thoughts, we want to support UDFs with multiple
>>>>>>>> signatures under the same name, which can serve both overload-aware and
>>>>>>>> overload-naive engines.
>>>>>>>>
>>>>>>>> Per my investigation[1], most engines support overloading by
>>>>>>>> arguments and allow implicit conversions like numeric widening (e.g.,
>>>>>>>> INT → BIGINT/FLOAT). This resolution approach can cause issues like
>>>>>>>> silent behavior changes. Here is an example:
>>>>>>>>
>>>>>>>>    - Initially, only foo(DOUBLE) exists.
>>>>>>>>    - foo(42::INT) widens INT → DOUBLE and runs the expected code.
>>>>>>>>    - Later, a malicious user creates foo(BIGINT).
>>>>>>>>    - Engine’s best-match resolution now binds the same call to the
>>>>>>>>    new overload, changing behavior without modifying the query.
>>>>>>>>
>>>>>>>> To mitigate this issue, we have to choose between these two access
>>>>>>>> control models:
>>>>>>>>
>>>>>>>>    1. Model A – Name-Level ACL: Grants apply to all overloads of a
>>>>>>>>    function name.
>>>>>>>>    2. Model B – Signature-Level ACL: Grants tied to specific
>>>>>>>>    signatures.
>>>>>>>>
>>>>>>>> The general recommendation is to adopt *Model A.* It trades some
>>>>>>>> precision for safety and simplicity, while eliminating the silent
>>>>>>>> behavior change problem. More details are in this doc[1].
>>>>>>>>
>>>>>>>> 1.
>>>>>>>> https://docs.google.com/document/d/1E8mR-vInbQ8LDa5Lv3f22i6f8sceHojnEzxEJ6s6cvc/edit?tab=t.0
>>>>>>>>
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 29, 2025 at 1:07 AM Ajantha Bhat <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>> Here is the meeting recording:
>>>>>>>>> https://drive.google.com/file/d/1L5S6nb-C_pzBwFlClwO_sG1AVBA_ROKo/view
>>>>>>>>>
>>>>>>>>> Summary:
>>>>>>>>> We discussed how to define function identifiers (which should also
>>>>>>>>> handle function overloading). Ryan suggested that we should check how
>>>>>>>>> Spark does it. We can refer to functions using an identifier and then
>>>>>>>>> bind the different signatures to it, so that access policies can be
>>>>>>>>> applied per identifier. This is also linked to how we want to version
>>>>>>>>> the functions when overloading is supported.
>>>>>>>>>
>>>>>>>>> I will check more about this and update the proposal doc.
>>>>>>>>>
>>>>>>>>> Please check/subscribe to the dev events calendar for the next
>>>>>>>>> meeting link (Aug 11).
>>>>>>>>>
>>>>>>>>> - Ajantha
>>>>>>>>>
>>>>>>>>> On Sun, Jul 27, 2025 at 10:46 PM Kevin Liu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ajantha,
>>>>>>>>>>
>>>>>>>>>> I see that the UDF Sync is scheduled in the "Iceberg Dev Events"
>>>>>>>>>> calendar for tomorrow 7/28 at 9AM PT. I missed the last one, but
>>>>>>>>>> I'll be at this one.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>
>>>>>>>>>>> No one joined the sync today. I came to know that Yufei is on
>>>>>>>>>>> holiday, and Ryan and others couldn't make it, similar to the last
>>>>>>>>>>> sync. It seems Yufei might have forgotten to transfer meeting
>>>>>>>>>>> ownership as well, as new members needed admin approval and couldn't
>>>>>>>>>>> join automatically this week. Also, I can understand it is summer
>>>>>>>>>>> holiday season for many.
>>>>>>>>>>>
>>>>>>>>>>> I've updated the function signature schema and other open
>>>>>>>>>>> points. I believe we're very close to the final version of the spec.
>>>>>>>>>>> A meeting is indeed necessary to finalize this, but we don't have to
>>>>>>>>>>> wait for it to finish the review process. We had many meetings on
>>>>>>>>>>> this in the past already. So, please review the document at your
>>>>>>>>>>> earliest convenience. If we agree on the spec by next week, I can
>>>>>>>>>>> raise a PR.
>>>>>>>>>>>
>>>>>>>>>>> - Ajantha
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 3, 2025 at 4:03 AM Yufei Gu <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I’d propose to move the field `properties` from a top level
>>>>>>>>>>>> field to a field inside “version” along with a representation, so
>>>>>>>>>>>> that properties are versioned. A property like “deterministic” could
>>>>>>>>>>>> change along with representation over time. For example, we need to
>>>>>>>>>>>> change “deterministic” from true to false when adding a
>>>>>>>>>>>> non-deterministic SQL expression/function (e.g., now()) inside a UDF.
>>>>>>>>>>>> Otherwise, rollback won't be safe.
>>>>>>>>>>>>
>>>>>>>>>>>> That said, it's still an open question whether we need any
>>>>>>>>>>>> non-versioned properties. We can introduce them later if a use case
>>>>>>>>>>>> arises.
>>>>>>>>>>>>
>>>>>>>>>>>> Yufei
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I’d prefer to keep the signature list separate from the
>>>>>>>>>>>>> representation history. Here are reasons:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    1. Each version still enforces a single signature.
>>>>>>>>>>>>>    Although the signatures array is global to the UDF, each version
>>>>>>>>>>>>>    references just one signature ID. Rollbacks to historical
>>>>>>>>>>>>>    versions remain safe.
>>>>>>>>>>>>>    2. We’ve separated the less frequently changing component
>>>>>>>>>>>>>    (signatures) from the more dynamic one (representations) to reduce
>>>>>>>>>>>>>    metadata file size.
>>>>>>>>>>>>>    3. Since signatures use Iceberg data types, they should
>>>>>>>>>>>>>    remain unaffected by multi-dialect representation differences.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>> We have discussed the action items from the last sync (*see
>>>>>>>>>>>>>> Appendix C* in the proposal doc)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Function overloading: Supported by a few of the engines
>>>>>>>>>>>>>>    and on the roadmaps of many engines. Iceberg will support it. We
>>>>>>>>>>>>>>    will maintain the `FunctionIdentifier` (extends `TableIdentifier`
>>>>>>>>>>>>>>    but also has a member containing the function argument's type
>>>>>>>>>>>>>>    list). All operations like load, rename, list, create and drop
>>>>>>>>>>>>>>    are based on `FunctionIdentifier`.
>>>>>>>>>>>>>>    - Secure UDF: If we store it as a property in a bag, we
>>>>>>>>>>>>>>    need to standardize the property name. Iceberg encryption may be
>>>>>>>>>>>>>>    orthogonal to this discussion.
>>>>>>>>>>>>>>    - UDFs with multi-statement and procedural bodies are
>>>>>>>>>>>>>>    supported by some engines. Iceberg will support them. Store the
>>>>>>>>>>>>>>    body as it is when the engine creates the function.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> New discussions around:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Standardizing the property names (deterministic,
>>>>>>>>>>>>>>    secure).
>>>>>>>>>>>>>>    - About the rename function.
>>>>>>>>>>>>>>    - Replace function. To check up to what level replace is
>>>>>>>>>>>>>>    supported (considering function overloading).
>>>>>>>>>>>>>>    - Should the signature be associated with the representation?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we are close on the spec. Please review the proposal
>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Monday, July 14 · 9:00 – 10:00am*
>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can it be handled by Iceberg encryption? If the whole
>>>>>>>>>>>>>>> metadata is encrypted, we don't have to worry about just hiding the
>>>>>>>>>>>>>>> UDF body? Let us discuss more in the sync today.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, hiding the definition and disabling pushdown are
>>>>>>>>>>>>>>>> required. We will need a named key (e.g., secure) somewhere, no
>>>>>>>>>>>>>>>> matter if it is a top level property or a key as a part of the UDF
>>>>>>>>>>>>>>>> properties, so that both UDF creator and consumer can recognize it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <[email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the extra detail. What do you think the spec
>>>>>>>>>>>>>>>>> would require? Would it require hiding the UDF definition from users
>>>>>>>>>>>>>>>>> and require specific pushdown cases be disabled? The use cases seem
>>>>>>>>>>>>>>>>> valid, but I'm trying to understand the requirements this places on
>>>>>>>>>>>>>>>>> engines and why it needs to be part of the spec, rather than part of
>>>>>>>>>>>>>>>>> the properties of the UDF.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here are the main use cases for secure UDFs:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    1. Hiding UDF Definitions: This includes concealing the
>>>>>>>>>>>>>>>>>>    UDF body and details like the list of imports; some of them
>>>>>>>>>>>>>>>>>>    aren’t applicable to SQL UDFs.
>>>>>>>>>>>>>>>>>>    2. Sandboxed Execution: Ensuring the UDF runs in an
>>>>>>>>>>>>>>>>>>    isolated environment. Again, this typically doesn’t apply to SQL
>>>>>>>>>>>>>>>>>>    UDFs.
>>>>>>>>>>>>>>>>>>    3. Preventing Data Leakage at Execution Time: For
>>>>>>>>>>>>>>>>>>    example, secure UDFs may disable certain optimizations, such as
>>>>>>>>>>>>>>>>>>    predicate pushdown, to avoid exposing sensitive data indirectly. [1]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Given these scenarios, I agree with your point that the
>>>>>>>>>>>>>>>>>> secure flag is primarily an instruction to the engine to
>>>>>>>>>>>>>>>>>> behave differently. While it's largely an engine-side behavior, we
>>>>>>>>>>>>>>>>>> still need to include this flag in the UDF definition to indicate
>>>>>>>>>>>>>>>>>> whether a UDF is secure, especially considering the perf penalty
>>>>>>>>>>>>>>>>>> introduced by scenario #3. We should clearly recommend that users
>>>>>>>>>>>>>>>>>> avoid marking UDFs as secure unless it's truly necessary.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yufei, could you make the argument for supporting a
>>>>>>>>>>>>>>>>>>> "secure" UDF? What use case are you addressing and what specifically
>>>>>>>>>>>>>>>>>>> changes about how the UDF is handled? If the idea is to hide the UDF
>>>>>>>>>>>>>>>>>>> definition, do we need to include it?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think this would be a signal to a "trusted engine".
>>>>>>>>>>>>>>>>>>> When the engine interacts with the catalog it sends authorization
>>>>>>>>>>>>>>>>>>> information about itself in addition to the user that it is acting on
>>>>>>>>>>>>>>>>>>> behalf of. That way the catalog knows that the secure UDF can be sent
>>>>>>>>>>>>>>>>>>> to the engine and won't be shown to the user. The majority of this
>>>>>>>>>>>>>>>>>>> logic is on the REST server side, and the only part that is
>>>>>>>>>>>>>>>>>>> communicated to the client is the request not to show the UDF to the
>>>>>>>>>>>>>>>>>>> user, right? In that case should this be a property rather than part
>>>>>>>>>>>>>>>>>>> of the definition? Even if we state that the client "must" suppress
>>>>>>>>>>>>>>>>>>> the UDF definition, it's really just a request. Only trusted engines
>>>>>>>>>>>>>>>>>>> can be passed the UDF definition, so a spec requirement to suppress
>>>>>>>>>>>>>>>>>>> the definition isn't very meaningful.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Multi-statement UDFs are definitely useful, but whether
>>>>>>>>>>>>>>>>>>>> those statements run within a single transaction should be treated
>>>>>>>>>>>>>>>>>>>> as an engine-level concern. The Iceberg UDF spec can spell out the
>>>>>>>>>>>>>>>>>>>> expectation, yet the actual guarantee still depends on the runtime.
>>>>>>>>>>>>>>>>>>>> Even if a UDF declares itself transactional, the engine may or may
>>>>>>>>>>>>>>>>>>>> not enforce it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One more thing: should we also introduce a “secure UDF”
>>>>>>>>>>>>>>>>>>>> option supported by some engines[1], so the body and any sensitive
>>>>>>>>>>>>>>>>>>>> details stay hidden from callers?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>    - We have gone through the SQL UDF syntax
>>>>>>>>>>>>>>>>>>>>>    supported by different engines (Snowflake, Databricks, Dremio,
>>>>>>>>>>>>>>>>>>>>>    Trino, OSS Spark 4.0).
>>>>>>>>>>>>>>>>>>>>>    - Each engine uses its own block separator, like
>>>>>>>>>>>>>>>>>>>>>    $$ or '' or none. Action item was to check whether engines
>>>>>>>>>>>>>>>>>>>>>    support multi-statement (transactional) UDF bodies.
>>>>>>>>>>>>>>>>>>>>>    - Discussed function overloading. Need to
>>>>>>>>>>>>>>>>>>>>>    check whether these engines support function overloading for SQL
>>>>>>>>>>>>>>>>>>>>>    UDFs. Postgres supports it! If yes, need to adapt the spec to
>>>>>>>>>>>>>>>>>>>>>    handle it.
>>>>>>>>>>>>>>>>>>>>>    - Started online spec review and discussed the
>>>>>>>>>>>>>>>>>>>>>    deterministic flag. We concluded that we keep independent fields
>>>>>>>>>>>>>>>>>>>>>    (like deterministic) in the spec only if the majority of engines
>>>>>>>>>>>>>>>>>>>>>    support them. Otherwise, they will be passed in an
>>>>>>>>>>>>>>>>>>>>>    engine-specific property bag, and it is the engine's
>>>>>>>>>>>>>>>>>>>>>    responsibility to honor those optional properties.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Feel free to review the current proposal document here
>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Final spec will be put to review and vote once it is
>>>>>>>>>>>>>>>>>>>>> ready.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>    - We discussed including Python support; the
>>>>>>>>>>>>>>>>>>>>>>    majority agreed *not to* (see recording for details).
>>>>>>>>>>>>>>>>>>>>>>    - No strong opposition to versioning; it will be
>>>>>>>>>>>>>>>>>>>>>>    included to support change tracking and similar use cases.
>>>>>>>>>>>>>>>>>>>>>>    - Suggestions were made to document how each
>>>>>>>>>>>>>>>>>>>>>>    catalog resolves UDFs, similar to views and tables.
>>>>>>>>>>>>>>>>>>>>>>    - We agreed not to deviate from the existing
>>>>>>>>>>>>>>>>>>>>>>    table/view spec, e.g., location will remain
>>>>>>>>>>>>>>>>>>>>>>    *required* for cross-catalog compatibility.
>>>>>>>>>>>>>>>>>>>>>>    - We also discussed a bit about view
>>>>>>>>>>>>>>>>>>>>>>    interoperability as the same things are applicable here.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Feel free to review the proposal document here:
>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>>>>>>>>>>>>>>>>>>>>>> With the current scope, it is similar to the view/table spec now.
>>>>>>>>>>>>>>>>>>>>>> Final spec will be put to review and vote once it is ready.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync
>>>>>>>>>>>>>>>>>>>>>>> for the UDF project. Everyone’s welcome to drop in and share ideas!
>>>>>>>>>>>>>>>>>>>>>>> Here is the meeting link:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Iceberg UDF sync
>>>>>>>>>>>>>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am
>>>>>>>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>>>>>>>> Video call link:
>>>>>>>>>>>>>>>>>>>>>>> https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Update on the progress.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to
>>>>>>>>>>>>>>>>>>>>>>>> discuss the UDF proposal. We covered several key points, though some
>>>>>>>>>>>>>>>>>>>>>>>> are still open for further discussion:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning
>>>>>>>>>>>>>>>>>>>>>>>> for UDFs at this stage? We explored the possibility of simplifying
>>>>>>>>>>>>>>>>>>>>>>>> the specification by avoiding view replication, and potentially
>>>>>>>>>>>>>>>>>>>>>>>> introducing versioning support later. UDTFs, being a superset of
>>>>>>>>>>>>>>>>>>>>>>>> views in some ways, may not require versioning initially.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may
>>>>>>>>>>>>>>>>>>>>>>>> not support vararg syntax in CREATE FUNCTION,
>>>>>>>>>>>>>>>>>>>>>>>> Iceberg UDFs could represent such arguments as lists when supported
>>>>>>>>>>>>>>>>>>>>>>>> by the engine.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently
>>>>>>>>>>>>>>>>>>>>>>>> doesn’t support generic types (e.g., object), we
>>>>>>>>>>>>>>>>>>>>>>>> can only map engine-specific types to Iceberg types. As a result,
>>>>>>>>>>>>>>>>>>>>>>>> generic data types will not be supported in the initial version.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a
>>>>>>>>>>>>>>>>>>>>>>>> language for SQL UDFs seems promising, especially given its
>>>>>>>>>>>>>>>>>>>>>>>> potential to resolve interoperability challenges. Some engines,
>>>>>>>>>>>>>>>>>>>>>>>> however, require platform version and package dependency details to
>>>>>>>>>>>>>>>>>>>>>>>> execute Python code; this should be captured in the specification.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> *Next Steps*
>>>>>>>>>>>>>>>>>>>>>>>> I will update the proposal document with two
>>>>>>>>>>>>>>>>>>>>>>>> primary UDF use cases:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>    - Policy exchange between engines
>>>>>>>>>>>>>>>>>>>>>>>>    - UDTF as a superset of view functionality
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The update will include corresponding syntax
>>>>>>>>>>>>>>>>>>>>>>>> examples in both SQL and Python, and detail how each 
>>>>>>>>>>>>>>>>>>>>>>>> use case is
>>>>>>>>>>>>>>>>>>>>>>>> represented in Iceberg metadata.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> We also plan to set up regular syncs (open to more
>>>>>>>>>>>>>>>>>>>>>>>> interested participants) to continue refining and 
>>>>>>>>>>>>>>>>>>>>>>>> finalizing the UDF
>>>>>>>>>>>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I've updated the design document[1] based on the
>>>>>>>>>>>>>>>>>>>>>>>>> previous comments. Additionally, I've included the 
>>>>>>>>>>>>>>>>>>>>>>>>> SQL UDF syntax supported
>>>>>>>>>>>>>>>>>>>>>>>>> by various vendors, including Dremio, Snowflake, 
>>>>>>>>>>>>>>>>>>>>>>>>> Databricks, and Trino.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper
>>>>>>>>>>>>>>>>>>>>>>>>> discussion is needed. Let's keep moving forward, 
>>>>>>>>>>>>>>>>>>>>>>>>> especially with the
>>>>>>>>>>>>>>>>>>>>>>>>> renewed interest from the community.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> During the last catalog community sync, there was
>>>>>>>>>>>>>>>>>>>>>>>>>> significant interest in storing UDFs in Iceberg and 
>>>>>>>>>>>>>>>>>>>>>>>>>> adding endpoints for
>>>>>>>>>>>>>>>>>>>>>>>>>> UDF handling in the REST catalog spec.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I recently discussed this with Yufei to better
>>>>>>>>>>>>>>>>>>>>>>>>>> understand the new requirement of using UDFs for 
>>>>>>>>>>>>>>>>>>>>>>>>>> fine-grained access
>>>>>>>>>>>>>>>>>>>>>>>>>> control policies. This expands the use cases beyond 
>>>>>>>>>>>>>>>>>>>>>>>>>> just versioned and
>>>>>>>>>>>>>>>>>>>>>>>>>> interoperable UDFs. Additionally, I learnt that many 
>>>>>>>>>>>>>>>>>>>>>>>>>> vendors are interested
>>>>>>>>>>>>>>>>>>>>>>>>>> in this feature.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Given the strong community interest and support,
>>>>>>>>>>>>>>>>>>>>>>>>>> I’d like to take ownership of this effort and revive 
>>>>>>>>>>>>>>>>>>>>>>>>>> the work. I'll be
>>>>>>>>>>>>>>>>>>>>>>>>>> revisiting the document I proposed long back and 
>>>>>>>>>>>>>>>>>>>>>>>>>> will share an updated
>>>>>>>>>>>>>>>>>>>>>>>>>> proposal by next week.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>>>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri
>>>>>>>>>>>>>>>>>>>>>>>>>> Bourlatchkov <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The UDF spec does not require representations to
>>>>>>>>>>>>>>>>>>>>>>>>>>> be SQL. It merely does not specify (in this 
>>>>>>>>>>>>>>>>>>>>>>>>>>> revision) how other
>>>>>>>>>>>>>>>>>>>>>>>>>>> representations are to be written.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> This seems like an easy extension (adding a new
>>>>>>>>>>>>>>>>>>>>>>>>>>> type in the "Representations" section).
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the spec. It leaves a way for future versions to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> add
>>>>>>>>>>>>>>>>>>>>>>>>>>>> different representations later, but only SQL is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> supported. That was also
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the feedback to my initial skepticism about how it 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> would work to add
>>>>>>>>>>>>>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bourlatchkov 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL representations, although it is certainly 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> favouring SQL in examples...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It would be nice to add a non-SQL example, indeed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Driesprong <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this proposal focuses on SQL-based engines, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> while Python-based systems
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> often work with data frames. Adding imperative 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> languages like Python would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> make this proposal more inclusive.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 8 Aug 2024 at 10:27, Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Findeisen <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the design doc linked earlier in this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread [1], I read
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hard to share among different engines."
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> understand yet how the proposed design 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addresses shareability between the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines though.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I could use some help understanding this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> better.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Moustafa <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-created functions shareable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> between engines? Do you mean UDFs written
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in imperative code?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Findeisen
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread. The Iceberg UDFs are an interesting 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> idea!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functions sharable between the engines?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> statement look like in, e.g., Spark or Trino?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Meanwhile, added a few comments in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> doc.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Best
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> added comments. I think it would be helpful to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also have a design doc that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> covers the choices from the draft spec. For 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instance, the choice to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enumerate all possible function input structs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rather than allowing generics
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and varargs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function signatures is limiting. It would be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nice to see a discussion of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the trade-offs and a rationale for the choice. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think it would also be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> very helpful to have a few representative use 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases for this included in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the doc. That way the proposal can demonstrate 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that it solves those use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases with reasonable trade-offs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent with conventions in other specs. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, using string IDs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rather than an integer.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> This uses a very different model for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> spec versioning than the Iceberg view and 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table specs. It requires readers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to fail if there are any unknown fields, which 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prevents the spec from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> adding things that are fully 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backward-compatible. Other Iceberg specs only
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> require a version change to introduce 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward-incompatible changes and I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think that this should do the same to avoid 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confusion.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> multiple function signatures per version, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it is unclear how to encode
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> them because a version is associated with a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> single function signature.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> creating functions across engines, so this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> doesn’t show that the metadata
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposed is sufficient for cross-engine use 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shows a SELECT statement and it isn’t clear 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how this is distinct from a view.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> I will wait for a week and, if there are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> no more review comments, I will raise a PR for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the spec addition next week.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have a look at the proposal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interesting direction, but there might be some 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> details that need to be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fine-tuned.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> might be interested. Resharing since I do not 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think it was directly linked
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder since we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> didn't get any review on the proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Initially proposed on June 4.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> far (from Benny).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal link
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Google doc link is attached in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And thanks to Stephen Lin for working
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hope it gives more clarity for making
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> decisions and for how we want to implement it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Walaa Eldin Moustafa <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scalar/aggregate/table user defined functions. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Here are some examples of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what I meant in (2):
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user defined functions:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/develop/functions.html
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user defined functions:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> variation of (1) where the API is data 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flow/data pipeline API instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL (e.g., Spark Scala). Yes, that is also 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> possible in the very long run :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jack Ye <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> imperative function according to a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some long term opportunities in this case. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Consider you register a Spark
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> temp view as some sort of data frame read, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then it could still be resolved
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to a Spark plan that is representable by an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intermediate representation.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But I agree this gets very complicated very 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> soon, and just having the case
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) covered would already be a huge step 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Benny Chow <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tabular SQL UDF can be used to build a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parameterized view.  So, there's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> definitely a lot in common between UDFs and 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> views.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Walaa Eldin Moustafa <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about what is perceived as a "UDF". There are 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2 flavors:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> by the user whose definition is a composition 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of other built-in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functions/SQL expressions.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> imperative function according to a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> references are pretty much from (1) and I 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think those have more analogy to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> views due to their SQL nature. Agree (2) is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not practical to maintain by
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, but I think Ajantha's use cases are 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> around (1), and may be worth
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> evaluating.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you post the proposal, but I think this would 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be a very difficult area to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> models without having a huge
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supports SQL representations of UDFs (similar 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to views as shared by the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reference links above), the complexity 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> involved will be similar to managing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> views.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jack, for your input.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> draft spec (inspired by the view spec) this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> week to facilitate further
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussions.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 7:33 PM Jack Ye <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have a common set of functions across engines, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't see how that is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> practical when those engines are implemented 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> so differently. Plugging in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> code -- and especially custom user-supplied 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> code -- seems inherently
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specialized to me and should be part of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines' design.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the views? I feel we can say exactly the same 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thing for Iceberg views, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> yet we have Iceberg multi-dialect views 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implemented. Maybe it sounds like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we are trying to draw a line between SQL vs 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> other programming languages as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "code"? But I think SQL is just another type
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of code, and we are already
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> talking about compiling all these different 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> code dialects to an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intermediate representation (using projects 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like Coral, Substrait), which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be stored as another type of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> representation of Iceberg view. I think
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the same functionality can be used for UDFs if 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> developed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> support is a good idea, even just a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> multi-dialect one like view, and that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can allow engines to, for example, parse a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> view's SQL and, when a referenced function
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cannot be resolved, look for a multi-dialect UDF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> definition.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when we have the actual proposal published.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1:32 AM Robert Stupp <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and portable and "non-centralized" as views 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are. The same performance
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concerns apply to views as well.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> common base upon which engines can build, so 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the argument that UDFs aren't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> practical, because engines are different, is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> probably only a temporary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concern.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should also try to tackle the idea to make 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> views portable, which is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> conceptually not that much different from 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> portable UDFs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a negative touch to the idea of having UDFs in 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, especially not in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this early stage.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it's a good idea to add UDFs tracked by 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg catalogs. I think that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg primarily deals with things that are 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> centralized, like tables of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data. While it would be great to have a common 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> set of functions across
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines, I don't see how that is practical 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when those engines are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implemented so differently. Plugging in code 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- and especially custom
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied code -- seems inherently 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specialized to me and should be part
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the engines' design.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you post the proposal, but I think this would 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be a very difficult area to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> models without having a huge
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 8:10 AM Ajantha Bhat <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> gauge community interest in storing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> versioned SQL UDFs in Iceberg.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose a spec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addition for storing versioned UDFs in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg (inspired by the view spec).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> similarly to views in that they are associated 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with tables, but they can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> accept arguments and produce return values, or 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> even function as inline
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expressions.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines, such as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dremio, Trino, Snowflake, and Databricks Spark,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> support SQL UDFs at the catalog
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> level [1].
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can enable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines. Engines could potentially
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> understand UDFs written by other
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engines (with a translation layer).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this feature into Iceberg would be a valuable 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addition, and we're eager to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> collaborate with the community to develop a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UDF specification.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> drafting a specification to propose to the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> community.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Databricks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Databricks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
