I’d propose moving the `properties` field from the top level into “version”, alongside the representation, so that properties are versioned. A property like “deterministic” can change along with the representation over time. For example, we need to change “deterministic” from true to false when a non-deterministic SQL expression or function (e.g., now()) is added to a UDF. Otherwise, rollback won't be safe.
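To make the rollback concern concrete, here is a minimal Python sketch of versioned properties. The class and field names (`UdfVersion`, `UdfMetadata`, `current_version_id`, and so on) are hypothetical, made up for illustration; they are not taken from the proposal doc or from any Iceberg API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UdfVersion:
    """One UDF version. Properties live here, next to the representation."""
    version_id: int
    signature_id: int
    representation: str  # SQL body, stored as-is (hypothetical layout)
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class UdfMetadata:
    """Hypothetical UDF metadata: a version list plus a current-version pointer."""
    versions: List[UdfVersion]
    current_version_id: int

    def current(self) -> UdfVersion:
        # Look up the version the pointer refers to.
        return next(v for v in self.versions
                    if v.version_id == self.current_version_id)

    def rollback(self, version_id: int) -> None:
        # Rollback only moves the pointer; properties follow automatically
        # because they are stored per version.
        if not any(v.version_id == version_id for v in self.versions):
            raise ValueError(f"unknown version: {version_id}")
        self.current_version_id = version_id


v1 = UdfVersion(1, 1, "SELECT x + 1", {"deterministic": "true"})
# Version 2 introduces now(), so its own "deterministic" property flips.
v2 = UdfVersion(2, 1, "SELECT x + extract(epoch from now())",
                {"deterministic": "false"})

meta = UdfMetadata(versions=[v1, v2], current_version_id=2)
meta.rollback(1)
print(meta.current().properties["deterministic"])
```

With a single top-level `properties` map instead, rolling back to version 1 would leave “deterministic” at whatever value version 2 last wrote, which is the unsafe case described above.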
That said, it's still an open question whether we need any non-versioned properties. We can introduce them later if a use case arises.

Yufei

On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Thanks for the summary, Ajantha!
>
> I’d prefer to keep the signature list separate from the representation
> history. Here are the reasons:
>
> 1. Each version still enforces a single signature. Although the
>    signatures array is global to the UDF, each version references just one
>    signature ID. Rollbacks to historical versions remain safe.
> 2. We’ve separated the less frequently changing component (signatures)
>    from the more dynamic one (representations) to reduce metadata file size.
> 3. Since signatures use Iceberg data types, they should remain
>    unaffected by multi-dialect representation differences.
>
> Yufei
>
>
> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>> Thanks to everyone who joined the sync.
>> Here is the meeting recording:
>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>
>> Summary:
>> We have discussed the action items from the last sync (*see Appendix C* in
>> the proposal doc)
>>
>> - Function overloading: Supported by a few of the engines and on the
>>   roadmaps of many engines. Iceberg will support it. We will maintain the
>>   `FunctionIdentifier` (extends `TableIdentifier` but also has a member
>>   containing the function argument's type list). All operations like
>>   load, rename, list, create and drop are based on `FunctionIdentifier`.
>> - Secure UDF: If we store it as a property in a bag, we need to
>>   standardize the property name. Iceberg encryption may be orthogonal to
>>   this discussion.
>> - UDFs with multi-statement and procedural bodies are supported by
>>   some engines. Iceberg will support them. The engine stores the body
>>   as-is when creating the function.
>>
>> New discussions around:
>>
>> - Standardizing the property names (deterministic, secure).
>> - The rename function.
>> - Replace function: check up to what level replace is supported
>>   (considering function overloading).
>> - Should the signature be associated with the representation?
>>
>> I think we are close on the spec. Please review the proposal
>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>
>> Details for next Iceberg UDF sync:
>>
>> *Monday, July 14 · 9:00 – 10:00am*
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/aui-czix-nbh
>>
>> - Ajantha
>>
>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Can it be handled by Iceberg encryption? If the whole metadata is
>>> encrypted, we don't have to worry about just hiding the UDF body. Let us
>>> discuss more in the sync today.
>>>
>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Yes, hiding the definition and disabling pushdown are required. We will
>>>> need a named key (e.g., secure) somewhere, whether it is a top-level
>>>> property or a key within the UDF properties, so that both the UDF creator
>>>> and the consumer can recognize it.
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>
>>>>> Thanks for the extra detail. What do you think the spec would require?
>>>>> Would it require hiding the UDF definition from users and require specific
>>>>> pushdown cases be disabled? The use cases seem valid, but I'm trying to
>>>>> understand the requirements this places on engines and why it needs to be
>>>>> part of the spec, rather than part of the properties of the UDF.
>>>>>
>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>
>>>>>> Hi Ryan,
>>>>>>
>>>>>> Here are the main use cases for secure UDFs:
>>>>>>
>>>>>> 1. Hiding UDF Definitions: This includes concealing the UDF body and
>>>>>>    details like the list of imports, some of which aren’t applicable
>>>>>>    to SQL UDFs.
>>>>>> 2. Sandboxed Execution: Ensuring the UDF runs in an isolated
>>>>>>    environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>>> 3. Preventing Data Leakage at Execution Time: For example, secure
>>>>>>    UDFs may disable certain optimizations, such as predicate pushdown,
>>>>>>    to avoid exposing sensitive data indirectly. [1]
>>>>>>
>>>>>> Given these scenarios, I agree with your point that the secure flag
>>>>>> is primarily an instruction to the engine to behave differently. While
>>>>>> it's largely an engine-side behavior, we still need to include this
>>>>>> flag in the UDF definition to indicate whether a UDF is secure,
>>>>>> especially considering the perf penalty introduced by scenario #3. We
>>>>>> should clearly recommend that users avoid marking UDFs as secure unless
>>>>>> it's truly necessary.
>>>>>>
>>>>>> [1]
>>>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Yufei, could you make the argument for supporting a "secure" UDF?
>>>>>>> What use case are you addressing and what specifically changes about how
>>>>>>> the UDF is handled? If the idea is to hide the UDF definition, do we
>>>>>>> need to include it?
>>>>>>>
>>>>>>> I think this would be a signal to a "trusted engine". When the
>>>>>>> engine interacts with the catalog it sends authorization information
>>>>>>> about itself in addition to the user that it is acting on behalf of.
>>>>>>> That way the catalog knows that the secure UDF can be sent to the
>>>>>>> engine and won't be shown to the user.
>>>>>>> The majority of this logic is on the REST server side,
>>>>>>> and the only part that is communicated to the client is the request not
>>>>>>> to show the UDF to the user, right? In that case should this be a
>>>>>>> property rather than part of the definition? Even if we state that the
>>>>>>> client "must" suppress the UDF definition, it's really just a request.
>>>>>>> Only trusted engines can be passed the UDF definition, so a spec
>>>>>>> requirement to suppress the definition isn't very meaningful.
>>>>>>>
>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <flyrain...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>
>>>>>>>> Multi-statement UDFs are definitely useful, but whether those
>>>>>>>> statements run within a single transaction should be treated as an
>>>>>>>> engine-level concern. The Iceberg UDF spec can spell out the
>>>>>>>> expectation, yet the actual guarantee still depends on the runtime.
>>>>>>>> Even if a UDF declares itself transactional, the engine may or may
>>>>>>>> not enforce it.
>>>>>>>>
>>>>>>>> One more thing: should we also introduce a “secure UDF” option
>>>>>>>> supported by some engines [1], so the body and any sensitive details
>>>>>>>> stay hidden from callers?
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <ajanthab...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>> Here is the meeting recording:
>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>>
>>>>>>>>> Summary:
>>>>>>>>>
>>>>>>>>> - We have gone through the SQL UDF syntax supported by
>>>>>>>>>   different engines (Snowflake, Databricks, Dremio, Trino, OSS Spark
>>>>>>>>>   4.0).
>>>>>>>>> - Each engine uses its own block separator, like $$ or '' or
>>>>>>>>>   none.
Action item was to check whether engines support >>>>>>>>> multi-statement >>>>>>>>> (transactional) UDF bodies. >>>>>>>>> - Discussed about function overloading. Need to check whether >>>>>>>>> these engines support function overloading for SQL UDFs. Postgres >>>>>>>>> supports >>>>>>>>> it! If yes, need to adopt the spec to handle it. >>>>>>>>> - Started online spec review and discussed the deterministic >>>>>>>>> flag and concluded that we keep the independent fields (like >>>>>>>>> deterministic) >>>>>>>>> in spec only if the majority of engines supports it. Else it will >>>>>>>>> be passed >>>>>>>>> in a property bag (engine specific). And it is the engine's >>>>>>>>> responsibility to honor those optional properties. >>>>>>>>> >>>>>>>>> Feel free to review the current proposal document here >>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>. >>>>>>>>> >>>>>>>>> Final spec will be put to review and vote once it is ready. >>>>>>>>> >>>>>>>>> Details for next Iceberg UDF sync: >>>>>>>>> >>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*Time zone: America/Los_Angeles >>>>>>>>> Google Meet joining info >>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>>> >>>>>>>>> - Ajantha >>>>>>>>> >>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <ajanthab...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks to everyone who joined the sync. >>>>>>>>>> Here is the meeting recording: >>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing >>>>>>>>>> >>>>>>>>>> Summary: >>>>>>>>>> >>>>>>>>>> - >>>>>>>>>> >>>>>>>>>> We discussed including Python support; the majority agreed *not >>>>>>>>>> to* (see recording for details). >>>>>>>>>> - >>>>>>>>>> >>>>>>>>>> No strong opposition to versioning — it will be included to >>>>>>>>>> support change tracking and similar use cases. 
>>>>>>>>>> - >>>>>>>>>> >>>>>>>>>> Suggestions were made to document how each catalog resolves >>>>>>>>>> UDFs, similar to views and tables. >>>>>>>>>> - >>>>>>>>>> >>>>>>>>>> We agreed not to deviate from the existing table/view spec — >>>>>>>>>> e.g., location will remain *required* for cross-catalog >>>>>>>>>> compatibility. >>>>>>>>>> - >>>>>>>>>> >>>>>>>>>> We also discussed a bit about view interoperability as the >>>>>>>>>> same things are applicable here. >>>>>>>>>> >>>>>>>>>> Feel free to review the proposal document >>>>>>>>>> >>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0> >>>>>>>>>> here. >>>>>>>>>> With the current scope, it is similar to the view/table spec now. >>>>>>>>>> Final spec will be put to review and vote once it is ready. >>>>>>>>>> >>>>>>>>>> Details for next Iceberg UDF sync: >>>>>>>>>> >>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*Time zone: America/Los_Angeles >>>>>>>>>> Google Meet joining info >>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>>>> >>>>>>>>>> - Ajantha >>>>>>>>>> >>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi folks, >>>>>>>>>>> >>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the UDF >>>>>>>>>>> project. Everyone’s welcome to drop in and share ideas! Here is the >>>>>>>>>>> meeting >>>>>>>>>>> link: >>>>>>>>>>> >>>>>>>>>>> Iceberg UDF sync >>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am >>>>>>>>>>> Time zone: America/Los_Angeles >>>>>>>>>>> Google Meet joining info >>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>>>>> >>>>>>>>>>> Yufei >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat < >>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Update on the progress. >>>>>>>>>>>> >>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF >>>>>>>>>>>> proposal. 
We covered several key points, though some are still >>>>>>>>>>>> open for >>>>>>>>>>>> further discussion: >>>>>>>>>>>> >>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at >>>>>>>>>>>> this stage? We explored the possibility of simplifying the >>>>>>>>>>>> specification by >>>>>>>>>>>> avoiding view replication, and potentially introducing versioning >>>>>>>>>>>> support >>>>>>>>>>>> later. UDTFs, being a superset of views in some ways, may not >>>>>>>>>>>> require >>>>>>>>>>>> versioning initially. >>>>>>>>>>>> >>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not support >>>>>>>>>>>> vararg syntax in CREATE FUNCTION, Iceberg UDFs could represent >>>>>>>>>>>> such arguments as lists when supported by the engine. >>>>>>>>>>>> >>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support >>>>>>>>>>>> generic types (e.g., object), we can only map engine-specific >>>>>>>>>>>> types to Iceberg types. As a result, generic data types will not be >>>>>>>>>>>> supported in the initial version. >>>>>>>>>>>> >>>>>>>>>>>> d) *Python Support*: Incorporating Python as a language for >>>>>>>>>>>> SQL UDFs seems promising, especially given its potential to resolve >>>>>>>>>>>> interoperability challenges. Some engines, however, require >>>>>>>>>>>> platform >>>>>>>>>>>> version and package dependency details to execute Python code—this >>>>>>>>>>>> should >>>>>>>>>>>> be captured in the specification. >>>>>>>>>>>> >>>>>>>>>>>> *Next Steps* >>>>>>>>>>>> I will update the proposal document with two primary UDF use >>>>>>>>>>>> cases: >>>>>>>>>>>> >>>>>>>>>>>> - >>>>>>>>>>>> >>>>>>>>>>>> Policy exchange between engines >>>>>>>>>>>> - >>>>>>>>>>>> >>>>>>>>>>>> UDTF as a superset of view functionality >>>>>>>>>>>> >>>>>>>>>>>> The update will include corresponding syntax examples in both >>>>>>>>>>>> SQL and Python, and detail how each use case is represented in >>>>>>>>>>>> Iceberg >>>>>>>>>>>> metadata. 
>>>>>>>>>>>> >>>>>>>>>>>> We also plan to set up regular syncs (open to more interested >>>>>>>>>>>> participants) to continue refining and finalizing the UDF >>>>>>>>>>>> specification. >>>>>>>>>>>> - Ajantha >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat < >>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> I've updated the design document[1] based on the previous >>>>>>>>>>>>> comments. Additionally, I've included the SQL UDF syntax >>>>>>>>>>>>> supported by >>>>>>>>>>>>> various vendors, including Dremio, Snowflake, Databricks, and >>>>>>>>>>>>> Trino. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper discussion >>>>>>>>>>>>> is needed. Let's keep moving forward, especially with the renewed >>>>>>>>>>>>> interest >>>>>>>>>>>>> from the community. >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat < >>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hey everyone, >>>>>>>>>>>>>> >>>>>>>>>>>>>> During the last catalog community sync, there was significant >>>>>>>>>>>>>> interest in storing UDFs in Iceberg and adding endpoints for UDF >>>>>>>>>>>>>> handling >>>>>>>>>>>>>> in the REST catalog spec. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I recently discussed this with Yufei to better understand the >>>>>>>>>>>>>> new requirement of using UDFs for fine-grained access control >>>>>>>>>>>>>> policies. >>>>>>>>>>>>>> This expands the use cases beyond just versioned and >>>>>>>>>>>>>> interoperable UDFs. >>>>>>>>>>>>>> Additionally, I learnt that many vendors are interested in this >>>>>>>>>>>>>> feature. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Given the strong community interest and support, I’d like to >>>>>>>>>>>>>> take ownership of this effort and revive the work. 
I'll be >>>>>>>>>>>>>> revisiting the >>>>>>>>>>>>>> document I proposed long back and will share an updated proposal >>>>>>>>>>>>>> by next >>>>>>>>>>>>>> week. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg! >>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov >>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The UDF spec does not require representations to be SQL. It >>>>>>>>>>>>>>> merely does not specify (in this revision) how other >>>>>>>>>>>>>>> representations are to >>>>>>>>>>>>>>> be written. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This seems like an easy extension (adding a new type in the >>>>>>>>>>>>>>> "Representations" section). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Dmitri. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue >>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the spec. It >>>>>>>>>>>>>>>> leaves a way for future versions to add different >>>>>>>>>>>>>>>> representations later, >>>>>>>>>>>>>>>> but only SQL is supported. That was also the feedback to my >>>>>>>>>>>>>>>> initial >>>>>>>>>>>>>>>> skepticism about how it would work to add functions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov >>>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL >>>>>>>>>>>>>>>>> representations, although it is certainly faviouring SQL in >>>>>>>>>>>>>>>>> examples... It >>>>>>>>>>>>>>>>> would be nice to add a non-SQL example, indeed. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> Dmitri. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong < >>>>>>>>>>>>>>>>> fo...@apache.org> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this proposal >>>>>>>>>>>>>>>>>> focuses on SQL-based engines, while Python-based systems >>>>>>>>>>>>>>>>>> often work with >>>>>>>>>>>>>>>>>> data frames. Adding imperative languages like Python would >>>>>>>>>>>>>>>>>> make this >>>>>>>>>>>>>>>>>> proposal more inclusive. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>> Fokko >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Op do 8 aug 2024 om 10:27 schreef Piotr Findeisen < >>>>>>>>>>>>>>>>>> piotr.findei...@gmail.com>: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Walaa, thanks for asking! >>>>>>>>>>>>>>>>>>> In the design doc linked before in this thread [1] i >>>>>>>>>>>>>>>>>>> read >>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to share >>>>>>>>>>>>>>>>>>> among different engines." >>>>>>>>>>>>>>>>>>> ("Background and Motivation" section). >>>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully understand >>>>>>>>>>>>>>>>>>> yet how the proposed design addresses shareability between >>>>>>>>>>>>>>>>>>> the engines >>>>>>>>>>>>>>>>>>> though. >>>>>>>>>>>>>>>>>>> I would use some help to understand this better. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best >>>>>>>>>>>>>>>>>>> Piotr >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec >>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa < >>>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created >>>>>>>>>>>>>>>>>>>> functions shareable >>>>>>>>>>>>>>>>>>>> between engines? 
Do you mean UDFs written in imperative >>>>>>>>>>>>>>>>>>>> code? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen >>>>>>>>>>>>>>>>>>>> <piotr.findei...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > Hi, >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread. The >>>>>>>>>>>>>>>>>>>> Iceberg UDFs are an interesting idea! >>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created functions >>>>>>>>>>>>>>>>>>>> sharable between the engines? >>>>>>>>>>>>>>>>>>>> > If so, how would a CREATE FUNCTION statement look >>>>>>>>>>>>>>>>>>>> like in e..g Spark or Trino? >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > Meanwhile, added a few comments in the doc. >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > Best >>>>>>>>>>>>>>>>>>>> > Piotr >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue >>>>>>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote: >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added >>>>>>>>>>>>>>>>>>>> comments. I think it would be helpful to also have a >>>>>>>>>>>>>>>>>>>> design doc that covers >>>>>>>>>>>>>>>>>>>> the choices from the draft spec. For instance, the choice >>>>>>>>>>>>>>>>>>>> to enumerate all >>>>>>>>>>>>>>>>>>>> possible function input struts rather than allowing >>>>>>>>>>>>>>>>>>>> generics and varargs. >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback: >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate function >>>>>>>>>>>>>>>>>>>> signatures is limiting. It would be nice to see a >>>>>>>>>>>>>>>>>>>> discussion of the >>>>>>>>>>>>>>>>>>>> trade-offs and a rationale for the choice. I think it >>>>>>>>>>>>>>>>>>>> would also be very >>>>>>>>>>>>>>>>>>>> helpful to have a few representative use cases for this >>>>>>>>>>>>>>>>>>>> included in the >>>>>>>>>>>>>>>>>>>> doc. 
That way the proposal can demonstrate that it solves >>>>>>>>>>>>>>>>>>>> those use cases >>>>>>>>>>>>>>>>>>>> with reasonable trade-offs. >>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is inconsistent >>>>>>>>>>>>>>>>>>>> with conventions in other specs. For example, using string >>>>>>>>>>>>>>>>>>>> IDs rather than >>>>>>>>>>>>>>>>>>>> an integer. >>>>>>>>>>>>>>>>>>>> >> This uses a very different model for spec versioning >>>>>>>>>>>>>>>>>>>> than the Iceberg view and table specs. It requires readers >>>>>>>>>>>>>>>>>>>> to fail if there >>>>>>>>>>>>>>>>>>>> are any unknown fields, which prevents the spec from >>>>>>>>>>>>>>>>>>>> adding things that are >>>>>>>>>>>>>>>>>>>> fully backward-compatible. Other Iceberg specs only >>>>>>>>>>>>>>>>>>>> require a version >>>>>>>>>>>>>>>>>>>> change to introduce forward-incompatible changes and I >>>>>>>>>>>>>>>>>>>> think that this >>>>>>>>>>>>>>>>>>>> should do the same to avoid confusion. >>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow multiple >>>>>>>>>>>>>>>>>>>> function signatures per verison, but it is unclear how to >>>>>>>>>>>>>>>>>>>> encode them >>>>>>>>>>>>>>>>>>>> because a version is associated with a single function >>>>>>>>>>>>>>>>>>>> signature. >>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for creating >>>>>>>>>>>>>>>>>>>> functions across engines, so this doesn’t show that the >>>>>>>>>>>>>>>>>>>> metadata proposed >>>>>>>>>>>>>>>>>>>> is sufficient for cross-engine use cases. >>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function shows a >>>>>>>>>>>>>>>>>>>> SELECT statement and it isn’t clear how this is distinct >>>>>>>>>>>>>>>>>>>> from a view >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat < >>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on this. 
>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>> >>> We didn't find any blocker for the spec. >>>>>>>>>>>>>>>>>>>> >>> I will wait for a week and If no more review >>>>>>>>>>>>>>>>>>>> comments, I will raise a PR for spec addition next week. >>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have a look at >>>>>>>>>>>>>>>>>>>> the proposal >>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>> >>> - Ajantha >>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin >>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha, >>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an interesting >>>>>>>>>>>>>>>>>>>> direction, but there might be some details that need to be >>>>>>>>>>>>>>>>>>>> fine tuned. >>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might be >>>>>>>>>>>>>>>>>>>> interested. Resharing since I do not think it was directly >>>>>>>>>>>>>>>>>>>> linked in the >>>>>>>>>>>>>>>>>>>> thread. >>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>> >>>> [1] >>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>> >>>> Thanks, >>>>>>>>>>>>>>>>>>>> >>>> Walaa. >>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat < >>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder since we didn't get any >>>>>>>>>>>>>>>>>>>> review on the proposal. >>>>>>>>>>>>>>>>>>>> >>>>> Initially proposed on June 4. 
>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>> >>>>> - Ajantha >>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat < >>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so far (from >>>>>>>>>>>>>>>>>>>> Benny). >>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this. >>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha >>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat < >>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>> Hi All, >>>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal link >>>>>>>>>>>>>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432 >>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>> Google doc link is attached in the proposal. >>>>>>>>>>>>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it. >>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>> Hope it gives more clarity to take the >>>>>>>>>>>>>>>>>>>> decisions and how we want to implement it. >>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha >>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin >>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant >>>>>>>>>>>>>>>>>>>> scalar/aggregate/table user defined functions. 
Here are >>>>>>>>>>>>>>>>>>>> some examples of >>>>>>>>>>>>>>>>>>>> what I meant in (2): >>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF: >>>>>>>>>>>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java >>>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user defined functions: >>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/develop/functions.html >>>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user defined functions: >>>>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/ >>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a variation >>>>>>>>>>>>>>>>>>>> of (1) where the API is data flow/data pipeline API >>>>>>>>>>>>>>>>>>>> instead of SQL (e.g., >>>>>>>>>>>>>>>>>>>> Spark Scala). Yes, that is also possible in the very long >>>>>>>>>>>>>>>>>>>> run :) >>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa. >>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye < >>>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative >>>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc. >>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore some long term >>>>>>>>>>>>>>>>>>>> opportunities in this case. Consider you register a Spark >>>>>>>>>>>>>>>>>>>> temp view as some >>>>>>>>>>>>>>>>>>>> sort of data frame read, then it could still be resolved >>>>>>>>>>>>>>>>>>>> to a Spark plan >>>>>>>>>>>>>>>>>>>> that is representable by an intermediate representation. 
>>>>>>>>>>>>>>>>>>>> But I agree this gets very complicated very soon, and just having the case (1) covered would already be a huge step forward.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL UDF can be used to build a parameterized view. So, there's definitely a lot in common between UDFs and views.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is perceived as a "UDF". There are 2 flavors:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are pretty much from (1) and I think those have more analogy to views due to their SQL nature. Agree (2) is not practical to maintain by Iceberg, but I think Ajantha's use cases are around (1), and may be worth evaluating.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL representations of UDFs (similar to views as shared by the reference links above), the complexity involved will be similar to managing views.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec (inspired by the view spec) this week to facilitate further discussions.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from the views? I feel we can say exactly the same thing for Iceberg views, but yet we have Iceberg multi-dialect views implemented. Maybe it sounds like we are trying to draw a line between SQL vs other programming language as "code"? but I think SQL is just another type of code, and we are already talking about compiling all these different code dialects to an intermediate representation (using projects like Coral, Substrait), which will be stored as another type of representation of Iceberg view. I think the same functionality can be used for UDFs if developed.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a good idea, even just a multi-dialect one like view, and that can allow engines to for example parse a view SQL, and when a function referenced cannot be resolved, try to seek for a multi-dialect UDF definition.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have the actual proposal published.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and portable and "non-centralized" as views are. The same performance concerns apply to views as well.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical, because engines are different, is probably only a temporary concern.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also try to tackle the idea to make views portable, which is conceptually not that much different from portable UDFs.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative touch to the idea of having UDFs in Iceberg, especially not in this early stage.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good idea to add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily deals with things that are centralized, like tables of data. While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the community interest in storing the Versioned SQL UDFs in Iceberg.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec addition for storing the versioned UDFs in Iceberg (inspired by view spec).
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to views in that they are associated with tables, but they can accept arguments and produce return values, or even function as inline expressions.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines like Dremio, Trino, Snowflake, and Databricks Spark support SQL UDFs at catalog level [1].
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the engines. Potentially engines can understand the UDFs written by other engines (with the translate layer).
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this feature into Iceberg would be a valuable addition, and we're eager to collaborate with the community to develop a UDF specification.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a specification to propose to the community.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>>>>>>>>>>> >> Databricks
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>> Databricks
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>