Thanks for the summary, Ajantha! Multi-statement UDFs are definitely useful, but whether those statements run within a single transaction should be treated as an engine-level concern. The Iceberg UDF spec can spell out the expectation, yet the actual guarantee still depends on the runtime. Even if a UDF declares itself transactional, the engine may or may not enforce it.
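To make the distinction concrete, here is a rough sketch in Python. Everything in it (`UdfDefinition`, `declared_transactional`, `EngineCapabilities`, `effective_transactional`) is hypothetical and not part of the current proposal. It only illustrates that the spec can record the declared intent while the runtime decides what is actually guaranteed.

```python
from dataclasses import dataclass


@dataclass
class UdfDefinition:
    """Hypothetical UDF record; field names are illustrative, not from the spec."""
    name: str
    body_statements: list[str]
    # Declared intent the spec could record: run every statement atomically.
    declared_transactional: bool = False


@dataclass
class EngineCapabilities:
    """What a given runtime can actually guarantee (hypothetical)."""
    supports_multi_statement: bool
    supports_transactions: bool


def effective_transactional(udf: UdfDefinition, engine: EngineCapabilities) -> bool:
    """Return whether callers really get atomic execution on this engine.

    The declaration alone guarantees nothing: it only holds if the engine
    can run the body at all and can enforce transactions.
    """
    if len(udf.body_statements) > 1 and not engine.supports_multi_statement:
        raise ValueError(f"{udf.name}: engine cannot run a multi-statement body")
    return udf.declared_transactional and engine.supports_transactions


refresh = UdfDefinition(
    name="refresh_stats",
    body_statements=["<statement 1>", "<statement 2>"],  # placeholder SQL
    declared_transactional=True,
)

# Same declaration, different runtimes, different effective guarantees.
print(effective_transactional(refresh, EngineCapabilities(True, True)))   # True
print(effective_transactional(refresh, EngineCapabilities(True, False)))  # False
```

The only point of the sketch is that the declared flag and the enforced behavior are separate inputs; how an engine would report its capabilities is left open.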
One more thing: should we also introduce a “secure UDF” option, as supported by some engines [1], so that the body and any sensitive details stay hidden from callers?

[1] https://docs.snowflake.com/en/developer-guide/secure-udf-procedure

Yufei

On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Thanks to everyone who joined the sync.
> Here is the meeting recording:
> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>
> Summary:
>
> - We went through the SQL UDF syntax supported by different engines (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).
> - Each engine uses its own block separator, such as $$, '', or none. The action item was to check whether engines support multi-statement (transactional) UDF bodies.
> - We discussed function overloading. We need to check whether these engines support function overloading for SQL UDFs (Postgres supports it!). If they do, we need to adapt the spec to handle it.
> - We started the online spec review and discussed the deterministic flag. We concluded that an independent field (like deterministic) stays in the spec only if the majority of engines support it; otherwise it will be passed in an engine-specific property bag, and it is the engine's responsibility to honor those optional properties.
>
> Feel free to review the current proposal document here <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>
> The final spec will be put up for review and a vote once it is ready.
>
> Details for the next Iceberg UDF sync:
>
> *Monday, June 30 · 9:00 – 10:00am*
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/aui-czix-nbh
>
> - Ajantha
>
> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
>> Thanks to everyone who joined the sync.
>> Here is the meeting recording:
>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>
>> Summary:
>>
>> - We discussed including Python support; the majority agreed *not to* (see the recording for details).
>> - There was no strong opposition to versioning; it will be included to support change tracking and similar use cases.
>> - Suggestions were made to document how each catalog resolves UDFs, similar to views and tables.
>> - We agreed not to deviate from the existing table/view spec; e.g., location will remain *required* for cross-catalog compatibility.
>> - We also briefly discussed view interoperability, since the same considerations apply here.
>>
>> Feel free to review the proposal document here:
>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>> With the current scope, it is now similar to the view/table spec.
>> The final spec will be put up for review and a vote once it is ready.
>>
>> Details for the next Iceberg UDF sync:
>>
>> *Monday, June 16 · 9:00 – 10:00am*
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/aui-czix-nbh
>>
>> - Ajantha
>>
>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> Hi folks,
>>>
>>> We’ve set up a dedicated bi-weekly community sync for the UDF project. Everyone’s welcome to drop in and share ideas! Here is the meeting link:
>>>
>>> Iceberg UDF sync
>>> Monday, June 2 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>
>>> Yufei
>>>
>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>
>>>> Update on the progress.
>>>>
>>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal.
>>>> We covered several key points, though some are still open for further discussion:
>>>>
>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage? We explored simplifying the specification by avoiding view replication, and potentially introducing versioning support later. UDTFs, being a superset of views in some ways, may not require versioning initially.
>>>>
>>>> b) *VarArgs Support*: While some query engines may not support vararg syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments as lists when the engine supports them.
>>>>
>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic types (e.g., object), we can only map engine-specific types to Iceberg types. As a result, generic data types will not be supported in the initial version.
>>>>
>>>> d) *Python Support*: Incorporating Python as a UDF language seems promising, especially given its potential to resolve interoperability challenges. Some engines, however, require the platform version and package dependency details to execute Python code; this should be captured in the specification.
>>>>
>>>> *Next Steps*
>>>> I will update the proposal document with two primary UDF use cases:
>>>>
>>>> - Policy exchange between engines
>>>> - UDTF as a superset of view functionality
>>>>
>>>> The update will include corresponding syntax examples in both SQL and Python, and detail how each use case is represented in Iceberg metadata.
>>>>
>>>> We also plan to set up regular syncs (open to more interested participants) to continue refining and finalizing the UDF specification.
>>>> - Ajantha
>>>>
>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I've updated the design document [1] based on the previous comments.
>>>>> Additionally, I've included the SQL UDF syntax supported by various vendors, including Dremio, Snowflake, Databricks, and Trino.
>>>>>
>>>>> I'm happy to schedule a separate sync if a deeper discussion is needed. Let's keep moving forward, especially with the renewed interest from the community.
>>>>>
>>>>> [1] https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>
>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> During the last catalog community sync, there was significant interest in storing UDFs in Iceberg and adding endpoints for UDF handling in the REST catalog spec.
>>>>>>
>>>>>> I recently discussed this with Yufei to better understand the new requirement of using UDFs for fine-grained access control policies. This expands the use cases beyond just versioned and interoperable UDFs. Additionally, I learnt that many vendors are interested in this feature.
>>>>>>
>>>>>> Given the strong community interest and support, I’d like to take ownership of this effort and revive the work. I'll be revisiting the document I proposed a while back and will share an updated proposal by next week.
>>>>>>
>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>> - Ajantha
>>>>>>
>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>
>>>>>>> The UDF spec does not require representations to be SQL. It merely does not specify (in this revision) how other representations are to be written.
>>>>>>>
>>>>>>> This seems like an easy extension (adding a new type in the "Representations" section).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>
>>>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a way for future versions to add different representations later, but only SQL is supported. That was also the feedback to my initial skepticism about how it would work to add functions.
>>>>>>>>
>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> I do not think the spec is meant to allow only SQL representations, although it certainly favours SQL in its examples... It would be nice to add a non-SQL example, indeed.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Dmitri.
>>>>>>>>>
>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Coming from PyIceberg, I have concerns, as this proposal focuses on SQL-based engines, while Python-based systems often work with data frames. Adding imperative languages like Python would make this proposal more inclusive.
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>> Fokko
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 8, 2024 at 10:27 AM Piotr Findeisen <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>> In the design doc linked earlier in this thread [1], I read "Without a common standard, the UDFs are hard to share among different engines." ("Background and Motivation" section).
>>>>>>>>>>> I agree with this statement. I don't fully understand yet how the proposed design addresses shareability between the engines, though.
>>>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>>>
>>>>>>>>>>> Best
>>>>>>>>>>> Piotr
>>>>>>>>>>>
>>>>>>>>>>> [1] SQL User-Defined Function Spec https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Piotr, what do you mean by making user-created functions shareable between engines? Do you mean UDFs written in imperative code?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thank you, Ajantha, for creating this thread. Iceberg UDFs are an interesting idea!
>>>>>>>>>>>> > Is there a plan to make user-created functions shareable between engines?
>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g., Spark or Trino?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Meanwhile, I added a few comments in the doc.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Best
>>>>>>>>>>>> > Piotr
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I just looked through the proposal and added comments. I think it would be helpful to also have a design doc that covers the choices from the draft spec. For instance, the choice to enumerate all possible function input structs rather than allowing generics and varargs.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I think that the choice to enumerate function signatures is limiting. It would be nice to see a discussion of the trade-offs and a rationale for the choice.
>>>>>>>>>>>> >> I think it would also be very helpful to have a few representative use cases for this included in the doc. That way the proposal can demonstrate that it solves those use cases with reasonable trade-offs.
>>>>>>>>>>>> >> There are a few instances where this is inconsistent with conventions in other specs. For example, using string IDs rather than an integer.
>>>>>>>>>>>> >> This uses a very different model for spec versioning than the Iceberg view and table specs. It requires readers to fail if there are any unknown fields, which prevents the spec from adding things that are fully backward-compatible. Other Iceberg specs only require a version change to introduce forward-incompatible changes, and I think that this should do the same to avoid confusion.
>>>>>>>>>>>> >> It looks like the intent is to allow multiple function signatures per version, but it is unclear how to encode them because a version is associated with a single function signature.
>>>>>>>>>>>> >> There is no review of SQL syntax for creating functions across engines, so this doesn’t show that the proposed metadata is sufficient for cross-engine use cases.
>>>>>>>>>>>> >> The example for a table-valued function shows a SELECT statement, and it isn’t clear how this is distinct from a view.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Thanks, Walaa and Robert, for the review on this.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> We didn't find any blockers for the spec.
>>>>>>>>>>>> >>> I will wait for a week, and if there are no more review comments, I will raise a PR for the spec addition next week.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> If anyone else is interested, please have a look at the proposal: https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> - Ajantha
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> I have left some comments. It is an interesting direction, but there might be some details that need to be fine-tuned.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> The doc is here [1] for others who might be interested. Resharing, since I do not think it was directly linked in the thread.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> [1] https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>> >>>> Walaa.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Hi, just another reminder, since we didn't get any reviews on the proposal.
>>>>>>>>>>>> >>>>> It was initially proposed on June 4.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> We've only received one review so far (from Benny).
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>>>> >>>>>>> Please find the proposal link: https://github.com/apache/iceberg/issues/10432
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> The Google doc link is attached in the proposal.
>>>>>>>>>>>> >>>>>>> And thanks to Stephen Lin for working on it.
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> I hope it gives more clarity for making decisions about how we want to implement it.
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> Thanks, Jack. I actually meant scalar/aggregate/table user-defined functions. Here are some examples of what I meant in (2):
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> Hive GenericUDF: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>>>> >>>>>>>> Trino user-defined functions: https://trino.io/docs/current/develop/functions.html
>>>>>>>>>>>> >>>>>>>> Flink user-defined functions: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a variation of (1) where the API is a data flow/data pipeline API instead of SQL (e.g., Spark Scala). Yes, that is also possible in the very long run :)
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> I think we could still explore some long-term opportunities in this case. Suppose you register a Spark temp view as some sort of data frame read; it could still be resolved to a Spark plan that is representable by an intermediate representation. But I agree this gets very complicated very soon, and just having case (1) covered would already be a huge step forward.
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL UDF can be used to build a parameterized view. So, there's definitely a lot in common between UDFs and views.
>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is perceived as a "UDF". There are two flavors:
>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions.
>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are pretty much from (1), and I think those are more analogous to views due to their SQL nature. I agree (2) is not practical for Iceberg to maintain, but I think Ajantha's use cases are around (1) and may be worth evaluating.
>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>> > I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL representations of UDFs (similar to views, as in the reference links above), the complexity involved will be similar to managing views.
>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input.
>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec (inspired by the view spec) this week to facilitate further discussions.
>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from views? I feel we could say exactly the same thing about Iceberg views, yet we have Iceberg multi-dialect views implemented. Maybe it sounds like we are trying to draw a line between SQL and other programming languages as "code"? But I think SQL is just another type of code, and we are already talking about compiling all these different code dialects to an intermediate representation (using projects like Coral and Substrait), which will be stored as another type of representation of an Iceberg view. I think the same functionality can be used for UDFs if developed.
>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a good idea, even just a multi-dialect one like views. That would allow engines to, for example, parse a view's SQL and, when a referenced function cannot be resolved, look up a multi-dialect UDF definition.
>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have the actual proposal published.
>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine-specific, portable, and "non-centralized" as views are. The same performance concerns apply to views as well.
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical because engines are different is probably only a temporary concern.
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also try to tackle the idea of making views portable, which is conceptually not that different from portable UDFs.
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative touch to the idea of having UDFs in Iceberg, especially not at this early stage.
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good idea to add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily deals with things that are centralized, like tables of data. While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently.
>>>>>>>>>>>> >>>>>>>>>>>>>> Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge community interest in storing versioned SQL UDFs in Iceberg.
>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose a spec addition for storing versioned UDFs in Iceberg (inspired by the view spec).
>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to views in that they are associated with tables, but they can accept arguments and produce return values, or even function as inline expressions.
>>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines, like Dremio, Trino, Snowflake, and Databricks Spark, support SQL UDFs at the catalog level [1].
>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable:
>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between engines. Potentially, engines can understand UDFs written by other engines (with a translation layer).
>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this feature into Iceberg would be a valuable addition, and we're eager to collaborate with the community to develop a UDF specification.
>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a specification to propose to the community.
>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> --
>>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>>> >> Databricks
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Databricks