Thanks for the summary, Ajantha! I’d prefer to keep the signature list separate from the representation history. Here are my reasons:
1. Each version still enforces a single signature. Although the signatures array is global to the UDF, each version references just one signature ID. Rollbacks to historical versions remain safe.
2. We’ve separated the less frequently changing component (signatures) from the more dynamic one (representations) to reduce metadata file size.
3. Since signatures use Iceberg data types, they should remain unaffected by multi-dialect representation differences.

Yufei

On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Thanks to everyone who joined the sync.
> Here is the meeting recording:
> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>
> Summary:
> We discussed the action items from the last sync (*see Appendix C* in
> the proposal doc):
>
> - Function overloading: Supported by a few engines and on the
> roadmaps of many others. Iceberg will support it. We will maintain a
> `FunctionIdentifier` (it extends `TableIdentifier` and also has a member
> containing the function's argument type list). All operations such as
> load, rename, list, create, and drop are based on `FunctionIdentifier`.
> - Secure UDF: If we store it as a property in a bag, we need to
> standardize the property name. Iceberg encryption may be orthogonal to this
> discussion.
> - UDFs with multi-statement and procedural bodies are supported by some
> engines. Iceberg will support them. The body is stored as is when the
> engine creates the function.
>
> New discussions around:
>
> - Standardizing the property names (deterministic, secure).
> - The rename function.
> - Replace function: check up to what level replace is supported
> (considering function overloading).
> - Should the signature be associated with the representation?
>
> I think we are close on the spec. Please review the proposal
> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>
> Details for next Iceberg UDF sync:
>
> *Monday, July 14 · 9:00 – 10:00am*
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/aui-czix-nbh
>
> - Ajantha
>
> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
>> Can it be handled by Iceberg encryption? If the whole metadata is
>> encrypted, we don't have to worry about hiding just the UDF body?
>> Let us discuss more in the sync today.
>>
>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> Yes, hiding the definition and disabling pushdown are required. We will
>>> need a named key (e.g., secure) somewhere, whether it is a top-level
>>> property or a key within the UDF properties, so that both the UDF creator
>>> and the consumer can recognize it.
>>>
>>> Yufei
>>>
>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>>> Thanks for the extra detail. What do you think the spec would require?
>>>> Would it require hiding the UDF definition from users and require that
>>>> specific pushdown cases be disabled? The use cases seem valid, but I'm
>>>> trying to understand the requirements this places on engines and why it
>>>> needs to be part of the spec, rather than part of the properties of the UDF.
>>>>
>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>
>>>>> Hi Ryan,
>>>>>
>>>>> Here are the main use cases for secure UDFs:
>>>>>
>>>>> 1. Hiding UDF Definitions: This includes concealing the UDF body and
>>>>> details like the list of imports, some of which aren’t applicable to
>>>>> SQL UDFs.
>>>>> 2. Sandboxed Execution: Ensuring the UDF runs in an isolated
>>>>> environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>> 3. Preventing Data Leakage at Execution Time: For example, secure
>>>>> UDFs may disable certain optimizations, such as predicate pushdown,
>>>>> to avoid exposing sensitive data indirectly.
[1] >>>>> >>>>> Given these scenarios, I agree with your point that the secure flag >>>>> is primarily an instruction to the engine to behave differently. While >>>>> it's >>>>> largely an engine-side behavior, we still need to include this flag in the >>>>> UDF definition to indicate whether a UDF is secure, especially considering >>>>> the perf penalty introduced by scenario #3. We should clearly recommend >>>>> that users avoid marking UDFs as secure unless it's truly necessary. >>>>> >>>>> [1] >>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown >>>>> Yufei >>>>> >>>>> >>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <rdb...@gmail.com> wrote: >>>>> >>>>>> Yufei, could you make the argument for supporting a "secure" UDF? >>>>>> What use case are you addressing and what specifically changes about how >>>>>> the UDF is handled? If the idea is to hide the UDF definition, do we need >>>>>> to include it? >>>>>> >>>>>> I think this would be a signal to a "trusted engine". When the engine >>>>>> interacts with the catalog it sends authorization information about >>>>>> itself >>>>>> in addition to the user that it is acting on behalf of. That way the >>>>>> catalog knows that the secure UDF can be sent to the engine and won't be >>>>>> shown to the user. The majority of this logic is on the REST server side, >>>>>> and the only part that is communicated to the client is the request not >>>>>> to >>>>>> show the UDF to the user, right? In that case should this be a property >>>>>> rather than part of the definition? Even if we state that the client >>>>>> "must" >>>>>> suppress the UDF definition, it's really just a request. Only trusted >>>>>> engines can be passed the UDF definition, so a spec requirement to >>>>>> suppress >>>>>> the definition isn't very meaningful. 
>>>>>>
>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>
>>>>>>> Multi-statement UDFs are definitely useful, but whether those
>>>>>>> statements run within a single transaction should be treated as an
>>>>>>> engine-level concern. The Iceberg UDF spec can spell out the
>>>>>>> expectation, yet the actual guarantee still depends on the runtime.
>>>>>>> Even if a UDF declares itself transactional, the engine may or may
>>>>>>> not enforce it.
>>>>>>>
>>>>>>> One more thing: should we also introduce a “secure UDF” option,
>>>>>>> supported by some engines [1], so the body and any sensitive details
>>>>>>> stay hidden from callers?
>>>>>>>
>>>>>>> [1] https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>> Here is the meeting recording:
>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>
>>>>>>>> Summary:
>>>>>>>>
>>>>>>>> - We went through the SQL UDF syntax supported by different
>>>>>>>> engines (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).
>>>>>>>> - Each engine uses its own block separator, like $$ or '' or
>>>>>>>> none. Action item: check whether engines support multi-statement
>>>>>>>> (transactional) UDF bodies.
>>>>>>>> - Discussed function overloading. We need to check whether these
>>>>>>>> engines support function overloading for SQL UDFs (Postgres
>>>>>>>> supports it!). If they do, we need to adapt the spec to handle it.
>>>>>>>> - Started the online spec review and discussed the deterministic
>>>>>>>> flag. We concluded that we keep an independent field (like
>>>>>>>> deterministic) in the spec only if the majority of engines support it.
Else it will >>>>>>>> be passed >>>>>>>> in a property bag (engine specific). And it is the engine's >>>>>>>> responsibility to honor those optional properties. >>>>>>>> >>>>>>>> Feel free to review the current proposal document here >>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>. >>>>>>>> >>>>>>>> Final spec will be put to review and vote once it is ready. >>>>>>>> >>>>>>>> Details for next Iceberg UDF sync: >>>>>>>> >>>>>>>> *Monday, June 30 · 9:00 – 10:00am*Time zone: America/Los_Angeles >>>>>>>> Google Meet joining info >>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>> >>>>>>>> - Ajantha >>>>>>>> >>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <ajanthab...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Thanks to everyone who joined the sync. >>>>>>>>> Here is the meeting recording: >>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing >>>>>>>>> >>>>>>>>> Summary: >>>>>>>>> >>>>>>>>> - >>>>>>>>> >>>>>>>>> We discussed including Python support; the majority agreed *not >>>>>>>>> to* (see recording for details). >>>>>>>>> - >>>>>>>>> >>>>>>>>> No strong opposition to versioning — it will be included to >>>>>>>>> support change tracking and similar use cases. >>>>>>>>> - >>>>>>>>> >>>>>>>>> Suggestions were made to document how each catalog resolves >>>>>>>>> UDFs, similar to views and tables. >>>>>>>>> - >>>>>>>>> >>>>>>>>> We agreed not to deviate from the existing table/view spec — >>>>>>>>> e.g., location will remain *required* for cross-catalog >>>>>>>>> compatibility. >>>>>>>>> - >>>>>>>>> >>>>>>>>> We also discussed a bit about view interoperability as the >>>>>>>>> same things are applicable here. >>>>>>>>> >>>>>>>>> Feel free to review the proposal document >>>>>>>>> >>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0> >>>>>>>>> here. 
>>>>>>>>> With the current scope, it is similar to the view/table spec now. >>>>>>>>> Final spec will be put to review and vote once it is ready. >>>>>>>>> >>>>>>>>> Details for next Iceberg UDF sync: >>>>>>>>> >>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*Time zone: America/Los_Angeles >>>>>>>>> Google Meet joining info >>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>>> >>>>>>>>> - Ajantha >>>>>>>>> >>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi folks, >>>>>>>>>> >>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the UDF >>>>>>>>>> project. Everyone’s welcome to drop in and share ideas! Here is the >>>>>>>>>> meeting >>>>>>>>>> link: >>>>>>>>>> >>>>>>>>>> Iceberg UDF sync >>>>>>>>>> Monday, June 2 · 9:00 – 10:00am >>>>>>>>>> Time zone: America/Los_Angeles >>>>>>>>>> Google Meet joining info >>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>>>> >>>>>>>>>> Yufei >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat < >>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Update on the progress. >>>>>>>>>>> >>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF >>>>>>>>>>> proposal. We covered several key points, though some are still open >>>>>>>>>>> for >>>>>>>>>>> further discussion: >>>>>>>>>>> >>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at >>>>>>>>>>> this stage? We explored the possibility of simplifying the >>>>>>>>>>> specification by >>>>>>>>>>> avoiding view replication, and potentially introducing versioning >>>>>>>>>>> support >>>>>>>>>>> later. UDTFs, being a superset of views in some ways, may not >>>>>>>>>>> require >>>>>>>>>>> versioning initially. 
>>>>>>>>>>> >>>>>>>>>>> b) *VarArgs Support*: While some query engines may not support >>>>>>>>>>> vararg syntax in CREATE FUNCTION, Iceberg UDFs could represent >>>>>>>>>>> such arguments as lists when supported by the engine. >>>>>>>>>>> >>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support >>>>>>>>>>> generic types (e.g., object), we can only map engine-specific >>>>>>>>>>> types to Iceberg types. As a result, generic data types will not be >>>>>>>>>>> supported in the initial version. >>>>>>>>>>> >>>>>>>>>>> d) *Python Support*: Incorporating Python as a language for SQL >>>>>>>>>>> UDFs seems promising, especially given its potential to resolve >>>>>>>>>>> interoperability challenges. Some engines, however, require platform >>>>>>>>>>> version and package dependency details to execute Python code—this >>>>>>>>>>> should >>>>>>>>>>> be captured in the specification. >>>>>>>>>>> >>>>>>>>>>> *Next Steps* >>>>>>>>>>> I will update the proposal document with two primary UDF use >>>>>>>>>>> cases: >>>>>>>>>>> >>>>>>>>>>> - >>>>>>>>>>> >>>>>>>>>>> Policy exchange between engines >>>>>>>>>>> - >>>>>>>>>>> >>>>>>>>>>> UDTF as a superset of view functionality >>>>>>>>>>> >>>>>>>>>>> The update will include corresponding syntax examples in both >>>>>>>>>>> SQL and Python, and detail how each use case is represented in >>>>>>>>>>> Iceberg >>>>>>>>>>> metadata. >>>>>>>>>>> >>>>>>>>>>> We also plan to set up regular syncs (open to more interested >>>>>>>>>>> participants) to continue refining and finalizing the UDF >>>>>>>>>>> specification. >>>>>>>>>>> - Ajantha >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat < >>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi everyone, >>>>>>>>>>>> >>>>>>>>>>>> I've updated the design document[1] based on the previous >>>>>>>>>>>> comments. 
Additionally, I've included the SQL UDF syntax supported >>>>>>>>>>>> by >>>>>>>>>>>> various vendors, including Dremio, Snowflake, Databricks, and >>>>>>>>>>>> Trino. >>>>>>>>>>>> >>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper discussion is >>>>>>>>>>>> needed. Let's keep moving forward, especially with the renewed >>>>>>>>>>>> interest >>>>>>>>>>>> from the community. >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat < >>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hey everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> During the last catalog community sync, there was significant >>>>>>>>>>>>> interest in storing UDFs in Iceberg and adding endpoints for UDF >>>>>>>>>>>>> handling >>>>>>>>>>>>> in the REST catalog spec. >>>>>>>>>>>>> >>>>>>>>>>>>> I recently discussed this with Yufei to better understand the >>>>>>>>>>>>> new requirement of using UDFs for fine-grained access control >>>>>>>>>>>>> policies. >>>>>>>>>>>>> This expands the use cases beyond just versioned and >>>>>>>>>>>>> interoperable UDFs. >>>>>>>>>>>>> Additionally, I learnt that many vendors are interested in this >>>>>>>>>>>>> feature. >>>>>>>>>>>>> >>>>>>>>>>>>> Given the strong community interest and support, I’d like to >>>>>>>>>>>>> take ownership of this effort and revive the work. I'll be >>>>>>>>>>>>> revisiting the >>>>>>>>>>>>> document I proposed long back and will share an updated proposal >>>>>>>>>>>>> by next >>>>>>>>>>>>> week. >>>>>>>>>>>>> >>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg! >>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov >>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The UDF spec does not require representations to be SQL. 
It merely does not specify (in this revision) how other representations are to be written.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This seems like an easy extension (adding a new type in the "Representations" section).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a way for future versions to add different representations later, but only SQL is supported. That was also the feedback to my initial skepticism about how it would work to add functions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL representations, although it is certainly favouring SQL in examples... It would be nice to add a non-SQL example, indeed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this proposal focuses on SQL-based engines, while Python-based systems often work with data frames. Adding imperative languages like Python would make this proposal more inclusive.
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>> Fokko >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Op do 8 aug 2024 om 10:27 schreef Piotr Findeisen < >>>>>>>>>>>>>>>>> piotr.findei...@gmail.com>: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Walaa, thanks for asking! >>>>>>>>>>>>>>>>>> In the design doc linked before in this thread [1] i >>>>>>>>>>>>>>>>>> read >>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to share >>>>>>>>>>>>>>>>>> among different engines." >>>>>>>>>>>>>>>>>> ("Background and Motivation" section). >>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully understand yet >>>>>>>>>>>>>>>>>> how the proposed design addresses shareability between the >>>>>>>>>>>>>>>>>> engines though. >>>>>>>>>>>>>>>>>> I would use some help to understand this better. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best >>>>>>>>>>>>>>>>>> Piotr >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec >>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa < >>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created functions >>>>>>>>>>>>>>>>>>> shareable >>>>>>>>>>>>>>>>>>> between engines? Do you mean UDFs written in imperative >>>>>>>>>>>>>>>>>>> code? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen >>>>>>>>>>>>>>>>>>> <piotr.findei...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > Hi, >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread. The >>>>>>>>>>>>>>>>>>> Iceberg UDFs are an interesting idea! >>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created functions >>>>>>>>>>>>>>>>>>> sharable between the engines? 
>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g., Spark or Trino?
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Meanwhile, I added a few comments in the doc.
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Best
>>>>>>>>>>>>>>>>>>> > Piotr
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added comments. I think it would be helpful to also have a design doc that covers the choices from the draft spec. For instance, the choice to enumerate all possible function input structs rather than allowing generics and varargs.
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate function signatures is limiting. It would be nice to see a discussion of the trade-offs and a rationale for the choice. I think it would also be very helpful to have a few representative use cases for this included in the doc. That way the proposal can demonstrate that it solves those use cases with reasonable trade-offs.
>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is inconsistent with conventions in other specs. For example, using string IDs rather than an integer.
>>>>>>>>>>>>>>>>>>> >> This uses a very different model for spec versioning than the Iceberg view and table specs.
It requires readers to fail if there are any unknown fields, which prevents the spec from adding things that are fully backward-compatible. Other Iceberg specs only require a version change to introduce forward-incompatible changes, and I think that this should do the same to avoid confusion.
>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow multiple function signatures per version, but it is unclear how to encode them because a version is associated with a single function signature.
>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for creating functions across engines, so this doesn’t show that the metadata proposed is sufficient for cross-engine use cases.
>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function shows a SELECT statement, and it isn’t clear how this is distinct from a view.
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on this.
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> >>> We didn't find any blockers for the spec.
>>>>>>>>>>>>>>>>>>> >>> I will wait for a week, and if there are no more review comments, I will raise a PR for the spec addition next week.
>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have a look at >>>>>>>>>>>>>>>>>>> the proposal >>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>> >>> - Ajantha >>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa >>>>>>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha, >>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an interesting >>>>>>>>>>>>>>>>>>> direction, but there might be some details that need to be >>>>>>>>>>>>>>>>>>> fine tuned. >>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might be >>>>>>>>>>>>>>>>>>> interested. Resharing since I do not think it was directly >>>>>>>>>>>>>>>>>>> linked in the >>>>>>>>>>>>>>>>>>> thread. >>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>> >>>> [1] >>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>> >>>> Thanks, >>>>>>>>>>>>>>>>>>> >>>> Walaa. >>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat < >>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder since we didn't get any >>>>>>>>>>>>>>>>>>> review on the proposal. >>>>>>>>>>>>>>>>>>> >>>>> Initially proposed on June 4. >>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>> >>>>> - Ajantha >>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat < >>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so far (from >>>>>>>>>>>>>>>>>>> Benny). 
>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this. >>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha >>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat < >>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>> Hi All, >>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal link >>>>>>>>>>>>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432 >>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>> Google doc link is attached in the proposal. >>>>>>>>>>>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it. >>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>> Hope it gives more clarity to take the decisions >>>>>>>>>>>>>>>>>>> and how we want to implement it. >>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha >>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin >>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant >>>>>>>>>>>>>>>>>>> scalar/aggregate/table user defined functions. 
Here are >>>>>>>>>>>>>>>>>>> some examples of >>>>>>>>>>>>>>>>>>> what I meant in (2): >>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF: >>>>>>>>>>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java >>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user defined functions: >>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/develop/functions.html >>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user defined functions: >>>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/ >>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a variation of >>>>>>>>>>>>>>>>>>> (1) where the API is data flow/data pipeline API instead of >>>>>>>>>>>>>>>>>>> SQL (e.g., >>>>>>>>>>>>>>>>>>> Spark Scala). Yes, that is also possible in the very long >>>>>>>>>>>>>>>>>>> run :) >>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa. >>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye < >>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative >>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc. >>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore some long term >>>>>>>>>>>>>>>>>>> opportunities in this case. Consider you register a Spark >>>>>>>>>>>>>>>>>>> temp view as some >>>>>>>>>>>>>>>>>>> sort of data frame read, then it could still be resolved to >>>>>>>>>>>>>>>>>>> a Spark plan >>>>>>>>>>>>>>>>>>> that is representable by an intermediate representation. 
>>>>>>>>>>>>>>>>>>> But I agree this >>>>>>>>>>>>>>>>>>> gets very complicated very soon, and just having the case >>>>>>>>>>>>>>>>>>> (1) covered would >>>>>>>>>>>>>>>>>>> already be a huge step forward. >>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack >>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow < >>>>>>>>>>>>>>>>>>> btc...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL >>>>>>>>>>>>>>>>>>> UDF can be used to build a parameterized view. So, there's >>>>>>>>>>>>>>>>>>> definitely a >>>>>>>>>>>>>>>>>>> lot in common between UDFs and views. >>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin >>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is >>>>>>>>>>>>>>>>>>> perceived as a "UDF". There are 2 flavors: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user >>>>>>>>>>>>>>>>>>> whose definition is a composition of other built-in >>>>>>>>>>>>>>>>>>> functions/SQL >>>>>>>>>>>>>>>>>>> expressions. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative >>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are >>>>>>>>>>>>>>>>>>> pretty much from (1) and I think those have more analogy to >>>>>>>>>>>>>>>>>>> views due to >>>>>>>>>>>>>>>>>>> their SQL nature. Agree (2) is not practical to maintain by >>>>>>>>>>>>>>>>>>> Iceberg, but I >>>>>>>>>>>>>>>>>>> think Ajantha's use cases are around (1), and may be worth >>>>>>>>>>>>>>>>>>> evaluating. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat >>>>>>>>>>>>>>>>>>> <ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post the >>>>>>>>>>>>>>>>>>> proposal, but I think this would be a very difficult area >>>>>>>>>>>>>>>>>>> to tackle across >>>>>>>>>>>>>>>>>>> engines, languages, and memory models without having a huge >>>>>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>>>>> penalty. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL >>>>>>>>>>>>>>>>>>> representations of UDFs (similar to views as shared by the >>>>>>>>>>>>>>>>>>> reference links >>>>>>>>>>>>>>>>>>> above), the complexity involved will be similar to managing >>>>>>>>>>>>>>>>>>> views. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your >>>>>>>>>>>>>>>>>>> input. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec >>>>>>>>>>>>>>>>>>> (inspired by the view spec) this week to facilitate further >>>>>>>>>>>>>>>>>>> discussions. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye < >>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a common >>>>>>>>>>>>>>>>>>> set of functions across engines, I don't see how that is >>>>>>>>>>>>>>>>>>> practical when >>>>>>>>>>>>>>>>>>> those engines are implemented so differently. 
Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from views? I feel we can say exactly the same thing for Iceberg views, and yet we have Iceberg multi-dialect views implemented. Maybe it sounds like we are trying to draw a line between SQL and other programming languages as "code"? But I think SQL is just another type of code, and we are already talking about compiling all these different code dialects to an intermediate representation (using projects like Coral and Substrait), which will be stored as another type of representation of an Iceberg view. I think the same functionality can be used for UDFs if developed.
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a good idea, even just a multi-dialect one like views. That would allow engines to, for example, parse a view's SQL and, when a referenced function cannot be resolved, look for a multi-dialect UDF definition.
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have the actual proposal published.
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert >>>>>>>>>>>>>>>>>>> Stupp <sn...@snazy.de> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and portable >>>>>>>>>>>>>>>>>>> and "non-centralized" as views are. The same performance >>>>>>>>>>>>>>>>>>> concerns apply to >>>>>>>>>>>>>>>>>>> views as well. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon >>>>>>>>>>>>>>>>>>> which engines can build, so the argument that UDFs aren't >>>>>>>>>>>>>>>>>>> practical, >>>>>>>>>>>>>>>>>>> because engines are different, is probably only a temporary >>>>>>>>>>>>>>>>>>> concern. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also try >>>>>>>>>>>>>>>>>>> to tackle the idea to make views portable, which is >>>>>>>>>>>>>>>>>>> conceptually not that >>>>>>>>>>>>>>>>>>> much different from portable UDFs. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative >>>>>>>>>>>>>>>>>>> touch to the idea of having UDFs in Iceberg, especially not >>>>>>>>>>>>>>>>>>> in this early >>>>>>>>>>>>>>>>>>> stage. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good >>>>>>>>>>>>>>>>>>> idea to add UDFs tracked by Iceberg catalogs. 
I think that >>>>>>>>>>>>>>>>>>> Iceberg >>>>>>>>>>>>>>>>>>> primarily deals with things that are centralized, like >>>>>>>>>>>>>>>>>>> tables of data. >>>>>>>>>>>>>>>>>>> While it would be great to have a common set of functions >>>>>>>>>>>>>>>>>>> across engines, I >>>>>>>>>>>>>>>>>>> don't see how that is practical when those engines are >>>>>>>>>>>>>>>>>>> implemented so >>>>>>>>>>>>>>>>>>> differently. Plugging in code -- and especially custom >>>>>>>>>>>>>>>>>>> user-supplied code >>>>>>>>>>>>>>>>>>> -- seems inherently specialized to me and should be part of >>>>>>>>>>>>>>>>>>> the engines' >>>>>>>>>>>>>>>>>>> design. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post the >>>>>>>>>>>>>>>>>>> proposal, but I think this would be a very difficult area >>>>>>>>>>>>>>>>>>> to tackle across >>>>>>>>>>>>>>>>>>> engines, languages, and memory models without having a huge >>>>>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>>>>> penalty. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha >>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the >>>>>>>>>>>>>>>>>>> community interest in storing the Versioned SQL UDFs in >>>>>>>>>>>>>>>>>>> Iceberg. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec addition for >>>>>>>>>>>>>>>>>>> storing the versioned UDFs in Iceberg (inspired by view >>>>>>>>>>>>>>>>>>> spec). 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to >>>>>>>>>>>>>>>>>>> views in that they are associated with tables, but they can >>>>>>>>>>>>>>>>>>> accept >>>>>>>>>>>>>>>>>>> arguments and produce return values, or even function as >>>>>>>>>>>>>>>>>>> inline expressions. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many Query engines like Dremio, Trino, >>>>>>>>>>>>>>>>>>> Snowflake, Databricks Spark support SQL UDFs at catalog >>>>>>>>>>>>>>>>>>> level [1]. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the engines. >>>>>>>>>>>>>>>>>>> Potentially engines can understand the UDFs written by >>>>>>>>>>>>>>>>>>> other engines (with >>>>>>>>>>>>>>>>>>> the translate layer). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this feature >>>>>>>>>>>>>>>>>>> into Iceberg would be a valuable addition, and we're eager >>>>>>>>>>>>>>>>>>> to collaborate >>>>>>>>>>>>>>>>>>> with the community to develop a UDF specification. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a >>>>>>>>>>>>>>>>>>> specification to propose to the community. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio - >>>>>>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino - >>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake - >>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks - >>>>>>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> -- >>>>>>>>>>>>>>>>>>> >> Ryan Blue >>>>>>>>>>>>>>>>>>> >> Databricks >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Ryan Blue >>>>>>>>>>>>>>> Databricks >>>>>>>>>>>>>>> >>>>>>>>>>>>>>