UDFs are just as engine-specific, portable, and "non-centralized" as
views are. The same performance concerns apply to views as well.
Iceberg should define a common base upon which engines can build, so the
argument that UDFs aren't practical, because engines are different, is
probably only a temporary concern.
In the long term, Iceberg should also try to tackle the idea of making
views portable, which is conceptually not that different from portable
UDFs.
PS: I'm not a fan of casting a negative light on the idea of having UDFs
in Iceberg, especially not at this early stage.
On 24.05.24 20:53, Ryan Blue wrote:
Thanks, Ajantha.
I'm skeptical about whether it's a good idea to add UDFs tracked by
Iceberg catalogs. I think that Iceberg primarily deals with things
that are centralized, like tables of data. While it would be great to
have a common set of functions across engines, I don't see how that is
practical when those engines are implemented so differently. Plugging
in code -- and especially custom user-supplied code -- seems
inherently specialized to me and should be part of the engines' design.
I guess we'll know more when you post the proposal, but I think this
would be a very difficult area to tackle across engines, languages,
and memory models without having a huge performance penalty.
Ryan
On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat wrote:
Hi Everyone,
This is a discussion to gauge community interest in storing
versioned SQL UDFs in Iceberg.
We want to propose a spec addition for storing versioned
UDFs in Iceberg (inspired by the view spec).
These UDFs can operate similarly to views in that they are
associated with tables, but they can accept arguments and produce
return values, or even function as inline expressions.
Many query engines, such as Dremio, Trino, Snowflake, and Databricks
Spark, support SQL UDFs at the catalog level [1].
But storing them in Iceberg can enable:
- Versioning of these UDFs.
- Interoperability between engines. Engines could potentially
understand UDFs written by other engines (with a translation
layer).
We believe that integrating this feature into Iceberg would be a
valuable addition, and we're eager to collaborate with the
community to develop a UDF specification.
Stephen has already begun drafting a specification to propose to
the community.
Let us know your thoughts on this.
[1]
Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
Trino - https://trino.io/docs/current/sql/create-function.html
Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
- Ajantha
--
Ryan Blue
Tabular
--
Robert Stupp
@snazy