Thanks to everyone who joined the sync.
Here is the meeting recording:
https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing

Summary:

- We went through the SQL UDF syntax supported by different engines
  (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).
- Each engine uses its own block separator, like $$ or '' or none.
  Action item: check whether engines support multi-statement
  (transactional) UDF bodies.
- Discussed function overloading. We need to check whether these engines
  support function overloading for SQL UDFs (Postgres supports it!). If
  yes, we need to adapt the spec to handle it.
- Started the online spec review and discussed the deterministic flag. We
  concluded that we keep independent fields (like deterministic) in the
  spec only if the majority of engines support them. Otherwise they will
  be passed in an engine-specific property bag, and it is the engine's
  responsibility to honor those optional properties.

Feel free to review the current proposal document here
<https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
Final spec will be put to review and vote once it is ready.

Details for next Iceberg UDF sync:

*Monday, June 30 · 9:00 – 10:00am*
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/aui-czix-nbh

- Ajantha

On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Thanks to everyone who joined the sync.
> Here is the meeting recording:
> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>
> Summary:
>
> - We discussed including Python support; the majority agreed *not to*
>   (see recording for details).
> - No strong opposition to versioning; it will be included to support
>   change tracking and similar use cases.
> - Suggestions were made to document how each catalog resolves UDFs,
>   similar to views and tables.
> - We agreed not to deviate from the existing table/view spec; e.g.,
>   location will remain *required* for cross-catalog compatibility.
> - We also discussed a bit about view interoperability as the same things
>   are applicable here.
>
> Feel free to review the proposal document
> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
> here.
> With the current scope, it is similar to the view/table spec now.
> Final spec will be put to review and vote once it is ready.
>
> Details for next Iceberg UDF sync:
>
> *Monday, June 16 · 9:00 – 10:00am*
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/aui-czix-nbh
>
> - Ajantha
>
> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Hi folks,
>>
>> We’ve set up a dedicated bi-weekly community sync for the UDF project.
>> Everyone’s welcome to drop in and share ideas! Here is the meeting link:
>>
>> Iceberg UDF sync
>> Monday, June 2 · 9:00 – 10:00am
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/aui-czix-nbh
>>
>> Yufei
>>
>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Update on the progress.
>>>
>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF
>>> proposal. We covered several key points, though some are still open for
>>> further discussion:
>>>
>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at this
>>> stage? We explored the possibility of simplifying the specification by
>>> avoiding view replication, and potentially introducing versioning support
>>> later. UDTFs, being a superset of views in some ways, may not require
>>> versioning initially.
>>>
>>> b) *VarArgs Support*: While some query engines may not support vararg
>>> syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments
>>> as lists when supported by the engine.
>>>
>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic
>>> types (e.g., object), we can only map engine-specific types to Iceberg
>>> types. As a result, generic data types will not be supported in the initial
>>> version.
>>>
>>> d) *Python Support*: Incorporating Python as a language for SQL UDFs
>>> seems promising, especially given its potential to resolve interoperability
>>> challenges. Some engines, however, require platform version and package
>>> dependency details to execute Python code; this should be captured in the
>>> specification.
>>>
>>> *Next Steps*
>>> I will update the proposal document with two primary UDF use cases:
>>>
>>> - Policy exchange between engines
>>> - UDTF as a superset of view functionality
>>>
>>> The update will include corresponding syntax examples in both SQL and
>>> Python, and detail how each use case is represented in Iceberg metadata.
>>>
>>> We also plan to set up regular syncs (open to more interested
>>> participants) to continue refining and finalizing the UDF specification.
>>> - Ajantha
>>>
>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I've updated the design document[1] based on the previous comments.
>>>> Additionally, I've included the SQL UDF syntax supported by various
>>>> vendors, including Dremio, Snowflake, Databricks, and Trino.
>>>>
>>>> I'm happy to schedule a separate sync if a deeper discussion is needed.
>>>> Let's keep moving forward, especially with the renewed interest from the
>>>> community.
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>
>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <ajanthab...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> During the last catalog community sync, there was significant interest
>>>>> in storing UDFs in Iceberg and adding endpoints for UDF handling in the
>>>>> REST catalog spec.
>>>>>
>>>>> I recently discussed this with Yufei to better understand the new
>>>>> requirement of using UDFs for fine-grained access control policies. This
>>>>> expands the use cases beyond just versioned and interoperable UDFs.
>>>>> Additionally, I learnt that many vendors are interested in this feature.
>>>>>
>>>>> Given the strong community interest and support, I’d like to take
>>>>> ownership of this effort and revive the work. I'll be revisiting the
>>>>> document I proposed long back and will share an updated proposal by next
>>>>> week.
>>>>>
>>>>> Looking forward to storing UDFs in Iceberg!
>>>>> - Ajantha
>>>>>
>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>
>>>>>> The UDF spec does not require representations to be SQL. It merely
>>>>>> does not specify (in this revision) how other representations are to be
>>>>>> written.
>>>>>>
>>>>>> This seems like an easy extension (adding a new type in the
>>>>>> "Representations" section).
>>>>>>
>>>>>> Cheers,
>>>>>> Dmitri.
>>>>>>
>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a
>>>>>>> way for future versions to add different representations later, but only
>>>>>>> SQL is supported. That was also the feedback to my initial skepticism
>>>>>>> about how it would work to add functions.
>>>>>>>
>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>
>>>>>>>> I do not think the spec is meant to allow only SQL representations,
>>>>>>>> although it is certainly favouring SQL in examples... It would be
>>>>>>>> nice to add a non-SQL example, indeed.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Dmitri.
>>>>>>>>
>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Coming from PyIceberg, I have concerns as this proposal focuses on
>>>>>>>>> SQL-based engines, while Python-based systems often work with data
>>>>>>>>> frames. Adding imperative languages like Python would make this
>>>>>>>>> proposal more inclusive.
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Fokko
>>>>>>>>>
>>>>>>>>> On Thu, Aug 8, 2024 at 10:27 AM Piotr Findeisen <
>>>>>>>>> piotr.findei...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>> In the design doc linked before in this thread [1] I read
>>>>>>>>>> "Without a common standard, the UDFs are hard to share among
>>>>>>>>>> different engines."
>>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>>> I agree with this statement. I don't fully understand yet how the
>>>>>>>>>> proposed design addresses shareability between the engines though.
>>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>>
>>>>>>>>>> Best
>>>>>>>>>> Piotr
>>>>>>>>>>
>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>
>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <
>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Piotr, what do you mean by making user-created functions
>>>>>>>>>>> shareable between engines? Do you mean UDFs written in
>>>>>>>>>>> imperative code?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>>>> <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Hi,
>>>>>>>>>>> >
>>>>>>>>>>> > Thank you Ajantha for creating this thread. The Iceberg UDFs
>>>>>>>>>>> > are an interesting idea!
>>>>>>>>>>> > Is there a plan to make the user-created functions sharable
>>>>>>>>>>> > between the engines?
>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g.,
>>>>>>>>>>> > Spark or Trino?
>>>>>>>>>>> >
>>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>>> >
>>>>>>>>>>> > Best
>>>>>>>>>>> > Piotr
>>>>>>>>>>> >
>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>>>> > <b...@databricks.com.invalid> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> I just looked through the proposal and added comments. I
>>>>>>>>>>> >> think it would be helpful to also have a design doc that covers the
>>>>>>>>>>> >> choices from the draft spec. For instance, the choice to enumerate all
>>>>>>>>>>> >> possible function input structs rather than allowing generics and
>>>>>>>>>>> >> varargs.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>> >>
>>>>>>>>>>> >> I think that the choice to enumerate function signatures is
>>>>>>>>>>> >> limiting. It would be nice to see a discussion of the trade-offs and a
>>>>>>>>>>> >> rationale for the choice. I think it would also be very helpful to have a
>>>>>>>>>>> >> few representative use cases for this included in the doc. That way the
>>>>>>>>>>> >> proposal can demonstrate that it solves those use cases with reasonable
>>>>>>>>>>> >> trade-offs.
>>>>>>>>>>> >> There are a few instances where this is inconsistent with
>>>>>>>>>>> >> conventions in other specs. For example, using string IDs rather than an
>>>>>>>>>>> >> integer.
>>>>>>>>>>> >> This uses a very different model for spec versioning than the
>>>>>>>>>>> >> Iceberg view and table specs.
>>>>>>>>>>> >> It requires readers to fail if there are any
>>>>>>>>>>> >> unknown fields, which prevents the spec from adding things that are fully
>>>>>>>>>>> >> backward-compatible. Other Iceberg specs only require a version change to
>>>>>>>>>>> >> introduce forward-incompatible changes and I think that this should do the
>>>>>>>>>>> >> same to avoid confusion.
>>>>>>>>>>> >> It looks like the intent is to allow multiple function
>>>>>>>>>>> >> signatures per version, but it is unclear how to encode them because a
>>>>>>>>>>> >> version is associated with a single function signature.
>>>>>>>>>>> >> There is no review of SQL syntax for creating functions
>>>>>>>>>>> >> across engines, so this doesn’t show that the metadata proposed is
>>>>>>>>>>> >> sufficient for cross-engine use cases.
>>>>>>>>>>> >> The example for a table-valued function shows a SELECT
>>>>>>>>>>> >> statement and it isn’t clear how this is distinct from a view.
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <
>>>>>>>>>>> >> ajanthab...@gmail.com> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on this.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>>>> >>> I will wait for a week and, if there are no more review comments, I
>>>>>>>>>>> >>> will raise a PR for spec addition next week.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> If anyone else is interested, please have a look at the proposal
>>>>>>>>>>> >>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> - Ajantha
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa <
>>>>>>>>>>> >>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> I have left some comments. It is an interesting direction,
>>>>>>>>>>> >>>> but there might be some details that need to be fine-tuned.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> The doc is here [1] for others who might be interested.
>>>>>>>>>>> >>>> Resharing since I do not think it was directly linked in the thread.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> [1]
>>>>>>>>>>> >>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>> >>>> Walaa.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <
>>>>>>>>>>> >>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Hi, just another reminder since we didn't get any review
>>>>>>>>>>> >>>>> on the proposal.
>>>>>>>>>>> >>>>> Initially proposed on June 4.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <
>>>>>>>>>>> >>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> We've only received one review so far (from Benny).
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <
>>>>>>>>>>> >>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>>> >>>>>>> Please find the proposal link
>>>>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> Google doc link is attached in the proposal.
>>>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> Hope it gives more clarity to take the decisions and how
>>>>>>>>>>> >>>>>>> we want to implement it.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <
>>>>>>>>>>> >>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant scalar/aggregate/table
>>>>>>>>>>> >>>>>>>> user defined functions.
>>>>>>>>>>> >>>>>>>> Here are some examples of what I meant in (2):
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Hive GenericUDF:
>>>>>>>>>>> >>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>>> >>>>>>>> Trino user defined functions:
>>>>>>>>>>> >>>>>>>> https://trino.io/docs/current/develop/functions.html
>>>>>>>>>>> >>>>>>>> Flink user defined functions:
>>>>>>>>>>> >>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Probably what you referred to is a variation of (1)
>>>>>>>>>>> >>>>>>>> where the API is data flow/data pipeline API instead of SQL (e.g.,
>>>>>>>>>>> >>>>>>>> Spark Scala). Yes, that is also possible in the very long run :)
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <
>>>>>>>>>>> >>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative function
>>>>>>>>>>> >>>>>>>>> > according to a Java/Scala/Python API, etc.
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> I think we could still explore some long term
>>>>>>>>>>> >>>>>>>>> opportunities in this case. Consider you register a Spark temp view
>>>>>>>>>>> >>>>>>>>> as some sort of data frame read, then it could still be resolved to a
>>>>>>>>>>> >>>>>>>>> Spark plan that is representable by an intermediate representation.
>>>>>>>>>>> >>>>>>>>> But I agree this gets very complicated very soon, and just having the
>>>>>>>>>>> >>>>>>>>> case (1) covered would already be a huge step forward.
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <
>>>>>>>>>>> >>>>>>>>> btc...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL UDF can
>>>>>>>>>>> >>>>>>>>>> be used to build a parameterized view. So, there's definitely a
>>>>>>>>>>> >>>>>>>>>> lot in common between UDFs and views.
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <
>>>>>>>>>>> >>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is
>>>>>>>>>>> >>>>>>>>>>> perceived as a "UDF". There are 2 flavors:
>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user whose
>>>>>>>>>>> >>>>>>>>>>> definition is a composition of other built-in functions/SQL
>>>>>>>>>>> >>>>>>>>>>> expressions.
>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative function
>>>>>>>>>>> >>>>>>>>>>> according to a Java/Scala/Python API, etc.
>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are pretty
>>>>>>>>>>> >>>>>>>>>>> much from (1) and I think those have more analogy to views due to
>>>>>>>>>>> >>>>>>>>>>> their SQL nature. Agree (2) is not practical to maintain by Iceberg,
>>>>>>>>>>> >>>>>>>>>>> but I think Ajantha's use cases are around (1), and may be worth
>>>>>>>>>>> >>>>>>>>>>> evaluating.
>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <
>>>>>>>>>>> >>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post the
>>>>>>>>>>> >>>>>>>>>>>>> proposal, but I think this would be a very difficult area to tackle
>>>>>>>>>>> >>>>>>>>>>>>> across engines, languages, and memory models without having a huge
>>>>>>>>>>> >>>>>>>>>>>>> performance penalty.
>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL
>>>>>>>>>>> >>>>>>>>>>>> representations of UDFs (similar to views as shared by the
>>>>>>>>>>> >>>>>>>>>>>> reference links above), the complexity involved will be similar to
>>>>>>>>>>> >>>>>>>>>>>> managing views.
>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input.
>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec (inspired
>>>>>>>>>>> >>>>>>>>>>>> by the view spec) this week to facilitate further discussions.
>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <
>>>>>>>>>>> >>>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a common set of
>>>>>>>>>>> >>>>>>>>>>>>> > functions across engines, I don't see how that is practical when
>>>>>>>>>>> >>>>>>>>>>>>> > those engines are implemented so differently. Plugging in code --
>>>>>>>>>>> >>>>>>>>>>>>> > and especially custom user-supplied code -- seems inherently
>>>>>>>>>>> >>>>>>>>>>>>> > specialized to me and should be part of the engines' design.
>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>> How is this different from the views? I feel we
>>>>>>>>>>> >>>>>>>>>>>>> can say exactly the same thing for Iceberg views, but yet we have
>>>>>>>>>>> >>>>>>>>>>>>> Iceberg multi-dialect views implemented. Maybe it sounds like we are
>>>>>>>>>>> >>>>>>>>>>>>> trying to draw a line between SQL and other programming languages
>>>>>>>>>>> >>>>>>>>>>>>> as "code"?
>>>>>>>>>>> >>>>>>>>>>>>> But I think SQL is just another type of code, and we are
>>>>>>>>>>> >>>>>>>>>>>>> already talking about compiling all these different code dialects
>>>>>>>>>>> >>>>>>>>>>>>> to an intermediate representation (using projects like Coral,
>>>>>>>>>>> >>>>>>>>>>>>> Substrait), which will be stored as another type of representation
>>>>>>>>>>> >>>>>>>>>>>>> of Iceberg view. I think the same functionality can be used for
>>>>>>>>>>> >>>>>>>>>>>>> UDFs if developed.
>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a good idea,
>>>>>>>>>>> >>>>>>>>>>>>> even just a multi-dialect one like view, and that can allow engines
>>>>>>>>>>> >>>>>>>>>>>>> to, for example, parse a view SQL and, when a referenced function
>>>>>>>>>>> >>>>>>>>>>>>> cannot be resolved, look for a multi-dialect UDF definition.
>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have the
>>>>>>>>>>> >>>>>>>>>>>>> actual proposal published.
>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <
>>>>>>>>>>> >>>>>>>>>>>>> sn...@snazy.de> wrote:
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and portable and
>>>>>>>>>>> >>>>>>>>>>>>>> "non-centralized" as views are. The same performance concerns
>>>>>>>>>>> >>>>>>>>>>>>>> apply to views as well.
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon which
>>>>>>>>>>> >>>>>>>>>>>>>> engines can build, so the argument that UDFs aren't practical,
>>>>>>>>>>> >>>>>>>>>>>>>> because engines are different, is probably only a temporary
>>>>>>>>>>> >>>>>>>>>>>>>> concern.
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also try to
>>>>>>>>>>> >>>>>>>>>>>>>> tackle the idea to make views portable, which is conceptually not
>>>>>>>>>>> >>>>>>>>>>>>>> that much different from portable UDFs.
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative touch to
>>>>>>>>>>> >>>>>>>>>>>>>> the idea of having UDFs in Iceberg, especially not in this early
>>>>>>>>>>> >>>>>>>>>>>>>> stage.
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good idea to
>>>>>>>>>>> >>>>>>>>>>>>>> add UDFs tracked by Iceberg catalogs. I think that Iceberg
>>>>>>>>>>> >>>>>>>>>>>>>> primarily deals with things that are centralized, like tables of
>>>>>>>>>>> >>>>>>>>>>>>>> data. While it would be great to have a common set of functions
>>>>>>>>>>> >>>>>>>>>>>>>> across engines, I don't see how that is practical when those
>>>>>>>>>>> >>>>>>>>>>>>>> engines are implemented so differently. Plugging in code -- and
>>>>>>>>>>> >>>>>>>>>>>>>> especially custom user-supplied code -- seems inherently
>>>>>>>>>>> >>>>>>>>>>>>>> specialized to me and should be part of the engines' design.
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post the
>>>>>>>>>>> >>>>>>>>>>>>>> proposal, but I think this would be a very difficult area to
>>>>>>>>>>> >>>>>>>>>>>>>> tackle across engines, languages, and memory models without having
>>>>>>>>>>> >>>>>>>>>>>>>> a huge performance penalty.
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <
>>>>>>>>>>> >>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the community
>>>>>>>>>>> >>>>>>>>>>>>>>> interest in storing the Versioned SQL UDFs in Iceberg.
>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec addition for storing
>>>>>>>>>>> >>>>>>>>>>>>>>> the versioned UDFs in Iceberg (inspired by view spec).
>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to views in
>>>>>>>>>>> >>>>>>>>>>>>>>> that they are associated with tables, but they can accept
>>>>>>>>>>> >>>>>>>>>>>>>>> arguments and produce return values, or even function as inline
>>>>>>>>>>> >>>>>>>>>>>>>>> expressions.
>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines like Dremio, Trino,
>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake, Databricks Spark support SQL UDFs at catalog level [1].
>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable
>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the engines.
>>>>>>>>>>> >>>>>>>>>>>>>>>   Potentially engines can understand the UDFs written by other
>>>>>>>>>>> >>>>>>>>>>>>>>>   engines (with a translation layer).
>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this feature into
>>>>>>>>>>> >>>>>>>>>>>>>>> Iceberg would be a valuable addition, and we're eager to
>>>>>>>>>>> >>>>>>>>>>>>>>> collaborate with the community to develop a UDF specification.
>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a
>>>>>>>>>>> >>>>>>>>>>>>>>> specification to propose to the community.
>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>>>> >>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>>>> >>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>>>> >>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>>>> >>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>> >> Databricks
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Databricks
>>>>>>>
>>>>>>