I propose moving the `properties` field from the top level into
“version”, alongside the representation, so that properties are
versioned. A property like “deterministic” can change together with the
representation over time. For example, we need to change “deterministic”
from true to false when a non-deterministic SQL expression or function
(e.g., now()) is added to a UDF. Otherwise, a rollback could leave the
property out of sync with the restored representation, so it won't be safe.

That said, it's still an open question whether we need any non-versioned
properties. We can introduce them later if a use case arises.
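
To make the rollback concern concrete, here is a minimal Python sketch. It is not taken from the proposal doc: the names `UdfVersion`, `UdfMetadata`, `add_version`, `rollback`, and `current`, as well as the example SQL bodies, are hypothetical and only illustrate the idea of keeping properties inside each version.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class UdfVersion:
    """One UDF version (hypothetical field names, not the spec layout)."""
    version_id: int
    representation: str            # function body, stored as written
    properties: Dict[str, str]     # versioned together with the body


@dataclass
class UdfMetadata:
    """Toy metadata holder used only to illustrate rollback behavior."""
    versions: List[UdfVersion] = field(default_factory=list)
    current_version_id: int = -1

    def add_version(self, version: UdfVersion) -> None:
        # A new version becomes the current one.
        self.versions.append(version)
        self.current_version_id = version.version_id

    def rollback(self, version_id: int) -> None:
        # Rollback only moves the pointer; body and properties move together.
        if all(v.version_id != version_id for v in self.versions):
            raise ValueError(f"unknown version: {version_id}")
        self.current_version_id = version_id

    def current(self) -> UdfVersion:
        return next(v for v in self.versions
                    if v.version_id == self.current_version_id)


udf = UdfMetadata()
udf.add_version(UdfVersion(1, "x + 1", {"deterministic": "true"}))
udf.add_version(UdfVersion(2, "x + extract(second from now())",
                           {"deterministic": "false"}))
udf.rollback(1)
print(udf.current().properties)
```

Because the property lives in the version, rolling back to version 1 restores "deterministic" to "true" along with the old body. With a single top-level `properties` map, the same rollback would need a separate, easy-to-forget property update.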

Yufei


On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Thanks for the summary, Ajantha!
>
> I’d prefer to keep the signature list separate from the representation
> history. Here are the reasons:
>
>    1. Each version still enforces a single signature. Although the
>    signatures array is global to the UDF, each version references just one
>    signature ID. Rollbacks to historical versions remain safe.
>    2. We’ve separated the less frequently changing component (signatures)
>    from the more dynamic one (representations) to reduce metadata file size.
>    3. Since signatures use Iceberg data types, they should remain
>    unaffected by multi-dialect representation differences.
>
> Yufei
>
>
> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>> Thanks to everyone who joined the sync.
>> Here is the meeting recording:
>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>
>> Summary:
>> We have discussed the action items from the last sync (*see Appendix C* in
>> the proposal doc)
>>
>>    - Function overloading: Supported by a few engines and on the
>>    roadmaps of many others. Iceberg will support it. We will maintain a
>>    `FunctionIdentifier` (which extends `TableIdentifier` but also has a
>>    member containing the function arguments' type list). All operations
>>    such as load, rename, list, create, and drop are based on
>>    `FunctionIdentifier`.
>>    - Secure UDF: If we store it as a property in a bag, we need to
>>    standardize the property name. Iceberg encryption may be orthogonal to
>>    this discussion.
>>    - UDFs with multi-statement and procedural bodies are supported by
>>    some engines. Iceberg will support them. The engine stores the body
>>    as is when creating the function.
>>
>> New discussions around:
>>
>>    - Standardizing the property names (deterministic, secure).
>>    - Renaming a function.
>>    - Replacing a function: we need to check up to what level replace is
>>    supported (considering function overloading).
>>    - Should the signature be associated with the representation?
>>
>>    I think we are close on the spec. Please review the proposal
>>    <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>
>> Details for next Iceberg UDF sync:
>>
>> *Monday, July 14 · 9:00 – 10:00am*
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/aui-czix-nbh
>>
>> - Ajantha
>>
>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Can it be handled by Iceberg encryption? If the whole metadata is
>>> encrypted, we wouldn't have to worry about hiding just the UDF body.
>>> Let's discuss this further in the sync today.
>>>
>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Yes, hiding the definition and disabling pushdown are required. We will
>>>> need a named key (e.g., secure) somewhere, whether it is a top-level
>>>> property or a key within the UDF properties, so that both the UDF
>>>> creator and the consumer can recognize it.
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>
>>>>> Thanks for the extra detail. What do you think the spec would require?
>>>>> Would it require hiding the UDF definition from users and require specific
>>>>> pushdown cases be disabled? The use cases seem valid, but I'm trying to
>>>>> understand the requirements this places on engines and why it needs to be
>>>>> part of the spec, rather than part of the properties of the UDF.
>>>>>
>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>
>>>>>> Hi Ryan,
>>>>>>
>>>>>> Here are the main use cases for secure UDFs:
>>>>>>
>>>>>>    1. Hiding UDF Definitions: This includes concealing the UDF body
>>>>>>    and details like the list of imports, some of which aren’t
>>>>>>    applicable to SQL UDFs.
>>>>>>    2. Sandboxed Execution: Ensuring the UDF runs in an isolated
>>>>>>    environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>>>    3. Preventing Data Leakage at Execution Time: For example, secure
>>>>>>    UDFs may disable certain optimizations, such as predicate pushdown,
>>>>>>    to avoid exposing sensitive data indirectly. [1]
>>>>>>
>>>>>> Given these scenarios, I agree with your point that the secure flag
>>>>>> is primarily an instruction to the engine to behave differently. While
>>>>>> it's largely an engine-side behavior, we still need to include this
>>>>>> flag in the UDF definition to indicate whether a UDF is secure,
>>>>>> especially considering the perf penalty introduced by scenario #3. We
>>>>>> should clearly recommend that users avoid marking UDFs as secure
>>>>>> unless it's truly necessary.
>>>>>>
>>>>>> [1]
>>>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>> Yufei
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Yufei, could you make the argument for supporting a "secure" UDF?
>>>>>>> What use case are you addressing and what specifically changes about
>>>>>>> how the UDF is handled? If the idea is to hide the UDF definition, do
>>>>>>> we need to include it?
>>>>>>>
>>>>>>> I think this would be a signal to a "trusted engine". When the
>>>>>>> engine interacts with the catalog it sends authorization information
>>>>>>> about itself in addition to the user that it is acting on behalf of.
>>>>>>> That way the catalog knows that the secure UDF can be sent to the
>>>>>>> engine and won't be shown to the user. The majority of this logic is
>>>>>>> on the REST server side, and the only part that is communicated to the
>>>>>>> client is the request not to show the UDF to the user, right? In that
>>>>>>> case should this be a property rather than part of the definition?
>>>>>>> Even if we state that the client "must" suppress the UDF definition,
>>>>>>> it's really just a request. Only trusted engines can be passed the UDF
>>>>>>> definition, so a spec requirement to suppress the definition isn't
>>>>>>> very meaningful.
>>>>>>>
>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <flyrain...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>
>>>>>>>> Multi-statement UDFs are definitely useful, but whether those
>>>>>>>> statements run within a single transaction should be treated as an
>>>>>>>> engine-level concern. The Iceberg UDF spec can spell out the
>>>>>>>> expectation, yet the actual guarantee still depends on the runtime.
>>>>>>>> Even if a UDF declares itself transactional, the engine may or may
>>>>>>>> not enforce it.
>>>>>>>>
>>>>>>>> One more thing: should we also introduce a “secure UDF” option,
>>>>>>>> supported by some engines [1], so the body and any sensitive details
>>>>>>>> stay hidden from callers?
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <
>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>> Here is the meeting recording:
>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>> Summary:
>>>>>>>>>
>>>>>>>>>    - We went through the SQL UDF syntax supported by different
>>>>>>>>>    engines (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).
>>>>>>>>>    - Each engine uses its own block separator, like $$ or '' or
>>>>>>>>>    none. The action item was to check whether engines support
>>>>>>>>>    multi-statement (transactional) UDF bodies.
>>>>>>>>>    - Discussed function overloading. We need to check whether these
>>>>>>>>>    engines support function overloading for SQL UDFs. Postgres
>>>>>>>>>    supports it! If they do, we need to adapt the spec to handle it.
>>>>>>>>>    - Started the online spec review and discussed the deterministic
>>>>>>>>>    flag. We concluded that we keep independent fields (like
>>>>>>>>>    deterministic) in the spec only if the majority of engines
>>>>>>>>>    support them. Otherwise they will be passed in an engine-specific
>>>>>>>>>    property bag, and it is the engine's responsibility to honor
>>>>>>>>>    those optional properties.
>>>>>>>>>
>>>>>>>>> Feel free to review the current proposal document here
>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>
>>>>>>>>> Final spec will be put to review and vote once it is ready.
>>>>>>>>>
>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>
>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*
>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>> Google Meet joining info
>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>
>>>>>>>>> - Ajantha
>>>>>>>>>
>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <ajanthab...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>>>>>>>>>
>>>>>>>>>> Summary:
>>>>>>>>>>
>>>>>>>>>>    - We discussed including Python support; the majority agreed
>>>>>>>>>>    *not to* (see recording for details).
>>>>>>>>>>    - No strong opposition to versioning; it will be included to
>>>>>>>>>>    support change tracking and similar use cases.
>>>>>>>>>>    - Suggestions were made to document how each catalog resolves
>>>>>>>>>>    UDFs, similar to views and tables.
>>>>>>>>>>    - We agreed not to deviate from the existing table/view spec;
>>>>>>>>>>    e.g., location will remain *required* for cross-catalog
>>>>>>>>>>    compatibility.
>>>>>>>>>>    - We also briefly discussed view interoperability, as the same
>>>>>>>>>>    considerations apply here.
>>>>>>>>>>
>>>>>>>>>>    Feel free to review the proposal document here:
>>>>>>>>>>    <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>>>>>>>>>>    With the current scope, it is now similar to the view/table spec.
>>>>>>>>>>    The final spec will be put up for review and a vote once it is
>>>>>>>>>>    ready.
>>>>>>>>>>
>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>
>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*
>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>> Google Meet joining info
>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>
>>>>>>>>>> - Ajantha
>>>>>>>>>>
>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi folks,
>>>>>>>>>>>
>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the UDF
>>>>>>>>>>> project. Everyone’s welcome to drop in and share ideas! Here is the
>>>>>>>>>>> meeting link:
>>>>>>>>>>>
>>>>>>>>>>> Iceberg UDF sync
>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am
>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>
>>>>>>>>>>> Yufei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <
>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Update on the progress.
>>>>>>>>>>>>
>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF
>>>>>>>>>>>> proposal. We covered several key points, though some are still
>>>>>>>>>>>> open for further discussion:
>>>>>>>>>>>>
>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at
>>>>>>>>>>>> this stage? We explored the possibility of simplifying the
>>>>>>>>>>>> specification by avoiding view replication, and potentially
>>>>>>>>>>>> introducing versioning support later. UDTFs, being a superset of
>>>>>>>>>>>> views in some ways, may not require versioning initially.
>>>>>>>>>>>>
>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not support
>>>>>>>>>>>> vararg syntax in CREATE FUNCTION, Iceberg UDFs could represent
>>>>>>>>>>>> such arguments as lists when supported by the engine.
>>>>>>>>>>>>
>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support
>>>>>>>>>>>> generic types (e.g., object), we can only map engine-specific
>>>>>>>>>>>> types to Iceberg types. As a result, generic data types will not be
>>>>>>>>>>>> supported in the initial version.
>>>>>>>>>>>>
>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a language for
>>>>>>>>>>>> SQL UDFs seems promising, especially given its potential to resolve
>>>>>>>>>>>> interoperability challenges. Some engines, however, require
>>>>>>>>>>>> platform version and package dependency details to execute Python
>>>>>>>>>>>> code; this should be captured in the specification.
>>>>>>>>>>>>
>>>>>>>>>>>> *Next Steps*
>>>>>>>>>>>> I will update the proposal document with two primary UDF use
>>>>>>>>>>>> cases:
>>>>>>>>>>>>
>>>>>>>>>>>>    - Policy exchange between engines
>>>>>>>>>>>>    - UDTF as a superset of view functionality
>>>>>>>>>>>>
>>>>>>>>>>>> The update will include corresponding syntax examples in both
>>>>>>>>>>>> SQL and Python, and detail how each use case is represented in
>>>>>>>>>>>> Iceberg metadata.
>>>>>>>>>>>>
>>>>>>>>>>>> We also plan to set up regular syncs (open to more interested
>>>>>>>>>>>> participants) to continue refining and finalizing the UDF
>>>>>>>>>>>> specification.
>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <
>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've updated the design document [1] based on the previous
>>>>>>>>>>>>> comments. Additionally, I've included the SQL UDF syntax
>>>>>>>>>>>>> supported by various vendors, including Dremio, Snowflake,
>>>>>>>>>>>>> Databricks, and Trino.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper discussion
>>>>>>>>>>>>> is needed. Let's keep moving forward, especially with the renewed
>>>>>>>>>>>>> interest from the community.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <
>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> During the last catalog community sync, there was significant
>>>>>>>>>>>>>> interest in storing UDFs in Iceberg and adding endpoints for UDF
>>>>>>>>>>>>>> handling in the REST catalog spec.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I recently discussed this with Yufei to better understand the
>>>>>>>>>>>>>> new requirement of using UDFs for fine-grained access control
>>>>>>>>>>>>>> policies. This expands the use cases beyond just versioned and
>>>>>>>>>>>>>> interoperable UDFs. Additionally, I learnt that many vendors are
>>>>>>>>>>>>>> interested in this feature.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Given the strong community interest and support, I’d like to
>>>>>>>>>>>>>> take ownership of this effort and revive the work. I'll be
>>>>>>>>>>>>>> revisiting the document I proposed a while back and will share an
>>>>>>>>>>>>>> updated proposal by next week.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The UDF spec does not require representations to be SQL. It
>>>>>>>>>>>>>>> merely does not specify (in this revision) how other
>>>>>>>>>>>>>>> representations are to be written.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This seems like an easy extension (adding a new type in the
>>>>>>>>>>>>>>> "Representations" section).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue
>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the spec. It
>>>>>>>>>>>>>>>> leaves a way for future versions to add different
>>>>>>>>>>>>>>>> representations later, but only SQL is supported. That was also
>>>>>>>>>>>>>>>> the feedback to my initial skepticism about how it would work
>>>>>>>>>>>>>>>> to add functions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL
>>>>>>>>>>>>>>>>> representations, although it is certainly favouring SQL in
>>>>>>>>>>>>>>>>> examples... It would be nice to add a non-SQL example, indeed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <
>>>>>>>>>>>>>>>>> fo...@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this proposal
>>>>>>>>>>>>>>>>>> focuses on SQL-based engines, while Python-based systems
>>>>>>>>>>>>>>>>>> often work with data frames. Adding imperative languages like
>>>>>>>>>>>>>>>>>> Python would make this proposal more inclusive.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Op do 8 aug 2024 om 10:27 schreef Piotr Findeisen <
>>>>>>>>>>>>>>>>>> piotr.findei...@gmail.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>>>>>>>>>> In the design doc linked earlier in this thread [1] I read
>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to share
>>>>>>>>>>>>>>>>>>> among different engines."
>>>>>>>>>>>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully understand
>>>>>>>>>>>>>>>>>>> yet how the proposed design addresses shareability between
>>>>>>>>>>>>>>>>>>> the engines, though.
>>>>>>>>>>>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>> Piotr
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created
>>>>>>>>>>>>>>>>>>>> functions shareable
>>>>>>>>>>>>>>>>>>>> between engines? Do you mean UDFs written in imperative
>>>>>>>>>>>>>>>>>>>> code?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>>>>>>>>>>>>> <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Thank you, Ajantha, for creating this thread. Iceberg UDFs
>>>>>>>>>>>>>>>>>>>> > are an interesting idea!
>>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created functions
>>>>>>>>>>>>>>>>>>>> > shareable between the engines?
>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like
>>>>>>>>>>>>>>>>>>>> > in, e.g., Spark or Trino?
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Best
>>>>>>>>>>>>>>>>>>>> > Piotr
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added comments. I
>>>>>>>>>>>>>>>>>>>> >> think it would be helpful to also have a design doc that
>>>>>>>>>>>>>>>>>>>> >> covers the choices from the draft spec. For instance, the
>>>>>>>>>>>>>>>>>>>> >> choice to enumerate all possible function input structs
>>>>>>>>>>>>>>>>>>>> >> rather than allowing generics and varargs.
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> - I think that the choice to enumerate function signatures
>>>>>>>>>>>>>>>>>>>> >> is limiting. It would be nice to see a discussion of the
>>>>>>>>>>>>>>>>>>>> >> trade-offs and a rationale for the choice. I think it would
>>>>>>>>>>>>>>>>>>>> >> also be very helpful to have a few representative use cases
>>>>>>>>>>>>>>>>>>>> >> for this included in the doc. That way the proposal can
>>>>>>>>>>>>>>>>>>>> >> demonstrate that it solves those use cases with reasonable
>>>>>>>>>>>>>>>>>>>> >> trade-offs.
>>>>>>>>>>>>>>>>>>>> >> - There are a few instances where this is inconsistent with
>>>>>>>>>>>>>>>>>>>> >> conventions in other specs. For example, using string IDs
>>>>>>>>>>>>>>>>>>>> >> rather than an integer.
>>>>>>>>>>>>>>>>>>>> >> - This uses a very different model for spec versioning than
>>>>>>>>>>>>>>>>>>>> >> the Iceberg view and table specs. It requires readers to
>>>>>>>>>>>>>>>>>>>> >> fail if there are any unknown fields, which prevents the
>>>>>>>>>>>>>>>>>>>> >> spec from adding things that are fully backward-compatible.
>>>>>>>>>>>>>>>>>>>> >> Other Iceberg specs only require a version change to
>>>>>>>>>>>>>>>>>>>> >> introduce forward-incompatible changes and I think that
>>>>>>>>>>>>>>>>>>>> >> this should do the same to avoid confusion.
>>>>>>>>>>>>>>>>>>>> >> - It looks like the intent is to allow multiple function
>>>>>>>>>>>>>>>>>>>> >> signatures per version, but it is unclear how to encode
>>>>>>>>>>>>>>>>>>>> >> them because a version is associated with a single function
>>>>>>>>>>>>>>>>>>>> >> signature.
>>>>>>>>>>>>>>>>>>>> >> - There is no review of SQL syntax for creating functions
>>>>>>>>>>>>>>>>>>>> >> across engines, so this doesn’t show that the metadata
>>>>>>>>>>>>>>>>>>>> >> proposed is sufficient for cross-engine use cases.
>>>>>>>>>>>>>>>>>>>> >> - The example for a table-valued function shows a SELECT
>>>>>>>>>>>>>>>>>>>> >> statement and it isn’t clear how this is distinct from a
>>>>>>>>>>>>>>>>>>>> >> view.
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on this.
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>>>>>>>>>>>>> >>> I will wait for a week and, if there are no more review
>>>>>>>>>>>>>>>>>>>> >>> comments, I will raise a PR for the spec addition next week.
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have a look at
>>>>>>>>>>>>>>>>>>>> the proposal
>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> - Ajantha
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin
>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an interesting direction,
>>>>>>>>>>>>>>>>>>>> >>>> but there might be some details that need to be fine-tuned.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might be interested.
>>>>>>>>>>>>>>>>>>>> >>>> Resharing since I do not think it was directly linked in
>>>>>>>>>>>>>>>>>>>> >>>> the thread.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> [1]
>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder since we didn't get any
>>>>>>>>>>>>>>>>>>>> review on the proposal.
>>>>>>>>>>>>>>>>>>>> >>>>> Initially proposed on June 4.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so far (from
>>>>>>>>>>>>>>>>>>>> Benny).
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal link
>>>>>>>>>>>>>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> Google doc link is attached in the proposal.
>>>>>>>>>>>>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it.
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> I hope it gives more clarity for making decisions and
>>>>>>>>>>>>>>>>>>>> >>>>>>> for how we want to implement it.
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin
>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant scalar/aggregate/table
>>>>>>>>>>>>>>>>>>>> >>>>>>>> user-defined functions. Here are some examples of what
>>>>>>>>>>>>>>>>>>>> >>>>>>>> I meant in (2):
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF:
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user defined functions:
>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/develop/functions.html
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user defined functions:
>>>>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a variation of (1)
>>>>>>>>>>>>>>>>>>>> >>>>>>>> where the API is a data flow/data pipeline API instead
>>>>>>>>>>>>>>>>>>>> >>>>>>>> of SQL (e.g., Spark Scala). Yes, that is also possible
>>>>>>>>>>>>>>>>>>>> >>>>>>>> in the very long run :)
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <
>>>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative
>>>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore some long-term
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> opportunities in this case. Suppose you register a
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Spark temp view as some sort of data frame read: it
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> could still be resolved to a Spark plan that is
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> representable by an intermediate representation. But
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> I agree this gets very complicated very quickly, and
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> just having case (1) covered would already be a huge
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> step forward.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <
>>>>>>>>>>>>>>>>>>>> btc...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL UDF can
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> be used to build a parameterized view. So, there's
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> definitely a lot in common between UDFs and views.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin
>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> perceived as a "UDF". There are 2 flavors:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user whose
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> definition is a composition of other built-in
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> functions/SQL expressions.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative function
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> according to a Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> pretty much from (1), and I think those are more
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> analogous to views due to their SQL nature. I agree
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) is not practical for Iceberg to maintain, but I
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> think Ajantha's use cases are around (1) and may be
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> worth evaluating.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha
>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post the
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> proposal, but I think this would be a very
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> difficult area to tackle across engines,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> languages, and memory models without having a
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> huge performance penalty.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL
>>>>>>>>>>>>>>>>>>>> representations of UDFs (similar to views as shared by the 
>>>>>>>>>>>>>>>>>>>> reference links
>>>>>>>>>>>>>>>>>>>> above), the complexity involved will be similar to 
>>>>>>>>>>>>>>>>>>>> managing views.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your
>>>>>>>>>>>>>>>>>>>> input.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec
>>>>>>>>>>>>>>>>>>>> (inspired by the view spec) this week to facilitate 
>>>>>>>>>>>>>>>>>>>> further discussions.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <
>>>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a
>>>>>>>>>>>>>>>>>>>> common set of functions across engines, I don't see how 
>>>>>>>>>>>>>>>>>>>> that is practical
>>>>>>>>>>>>>>>>>>>> when those engines are implemented so differently. 
>>>>>>>>>>>>>>>>>>>> Plugging in code -- and
>>>>>>>>>>>>>>>>>>>> especially custom user-supplied code -- seems inherently 
>>>>>>>>>>>>>>>>>>>> specialized to me
>>>>>>>>>>>>>>>>>>>> and should be part of the engines' design.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from the views? I
>>>>>>>>>>>>>>>>>>>> feel we can say exactly the same thing for Iceberg views, 
>>>>>>>>>>>>>>>>>>>> but yet we have
>>>>>>>>>>>>>>>>>>>> Iceberg multi-dialect views implemented. Maybe it sounds like
>>>>>>>>>>>>>>>>>>>> we are trying to draw a line between SQL and other programming
>>>>>>>>>>>>>>>>>>>> languages as "code", but I
>>>>>>>>>>>>>>>>>>>> think SQL is just another type of code, and we are already 
>>>>>>>>>>>>>>>>>>>> talking about
>>>>>>>>>>>>>>>>>>>> compiling all these different code dialects to an 
>>>>>>>>>>>>>>>>>>>> intermediate
>>>>>>>>>>>>>>>>>>>> representation (using projects like Coral, Substrait), 
>>>>>>>>>>>>>>>>>>>> which will be stored
>>>>>>>>>>>>>>>>>>>> as another type of representation of Iceberg view. I think 
>>>>>>>>>>>>>>>>>>>> the same
>>>>>>>>>>>>>>>>>>>> functionality can be used for UDFs if developed.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a
>>>>>>>>>>>>>>>>>>>> good idea, even just a multi-dialect one like views. That
>>>>>>>>>>>>>>>>>>>> would allow engines to, for example, parse a view's SQL and,
>>>>>>>>>>>>>>>>>>>> when a referenced function cannot be resolved, look for a
>>>>>>>>>>>>>>>>>>>> multi-dialect UDF definition.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have
>>>>>>>>>>>>>>>>>>>> the actual proposal published.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert
>>>>>>>>>>>>>>>>>>>> Stupp <sn...@snazy.de> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and portable
>>>>>>>>>>>>>>>>>>>> and "non-centralized" as views are. The same performance 
>>>>>>>>>>>>>>>>>>>> concerns apply to
>>>>>>>>>>>>>>>>>>>> views as well.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon
>>>>>>>>>>>>>>>>>>>> which engines can build, so the argument that UDFs aren't 
>>>>>>>>>>>>>>>>>>>> practical,
>>>>>>>>>>>>>>>>>>>> because engines are different, is probably only a 
>>>>>>>>>>>>>>>>>>>> temporary concern.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also
>>>>>>>>>>>>>>>>>>>> try to tackle the idea to make views portable, which is 
>>>>>>>>>>>>>>>>>>>> conceptually not
>>>>>>>>>>>>>>>>>>>> that much different from portable UDFs.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative
>>>>>>>>>>>>>>>>>>>> touch to the idea of having UDFs in Iceberg, especially 
>>>>>>>>>>>>>>>>>>>> not in this early
>>>>>>>>>>>>>>>>>>>> stage.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good
>>>>>>>>>>>>>>>>>>>> idea to add UDFs tracked by Iceberg catalogs. I think that 
>>>>>>>>>>>>>>>>>>>> Iceberg
>>>>>>>>>>>>>>>>>>>> primarily deals with things that are centralized, like 
>>>>>>>>>>>>>>>>>>>> tables of data.
>>>>>>>>>>>>>>>>>>>> While it would be great to have a common set of functions 
>>>>>>>>>>>>>>>>>>>> across engines, I
>>>>>>>>>>>>>>>>>>>> don't see how that is practical when those engines are 
>>>>>>>>>>>>>>>>>>>> implemented so
>>>>>>>>>>>>>>>>>>>> differently. Plugging in code -- and especially custom 
>>>>>>>>>>>>>>>>>>>> user-supplied code
>>>>>>>>>>>>>>>>>>>> -- seems inherently specialized to me and should be part 
>>>>>>>>>>>>>>>>>>>> of the engines'
>>>>>>>>>>>>>>>>>>>> design.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post
>>>>>>>>>>>>>>>>>>>> the proposal, but I think this would be a very difficult 
>>>>>>>>>>>>>>>>>>>> area to tackle
>>>>>>>>>>>>>>>>>>>> across engines, languages, and memory models without 
>>>>>>>>>>>>>>>>>>>> having a huge
>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha
>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the
>>>>>>>>>>>>>>>>>>>> community interest in storing the Versioned SQL UDFs in 
>>>>>>>>>>>>>>>>>>>> Iceberg.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec addition
>>>>>>>>>>>>>>>>>>>> for storing the versioned UDFs in Iceberg (inspired by 
>>>>>>>>>>>>>>>>>>>> view spec).
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to
>>>>>>>>>>>>>>>>>>>> views in that they are associated with tables, but they 
>>>>>>>>>>>>>>>>>>>> can accept
>>>>>>>>>>>>>>>>>>>> arguments and produce return values, or even function as 
>>>>>>>>>>>>>>>>>>>> inline expressions.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines, such as Dremio, Trino,
>>>>>>>>>>>>>>>>>>>> Snowflake, and Databricks Spark, support SQL UDFs at the
>>>>>>>>>>>>>>>>>>>> catalog level [1].
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the engines.
>>>>>>>>>>>>>>>>>>>> Potentially, engines can understand the UDFs written by
>>>>>>>>>>>>>>>>>>>> other engines (with a translation layer).
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this
>>>>>>>>>>>>>>>>>>>> feature into Iceberg would be a valuable addition, and 
>>>>>>>>>>>>>>>>>>>> we're eager to
>>>>>>>>>>>>>>>>>>>> collaborate with the community to develop a UDF 
>>>>>>>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a
>>>>>>>>>>>>>>>>>>>> specification to propose to the community.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>>>>>>>>>>> >> Databricks
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>> Databricks
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
