RussellSpitzer commented on code in PR #14117:
URL: https://github.com/apache/iceberg/pull/14117#discussion_r2713934181


##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description       
                                                    |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that 
identifies the function, generated once at creation.      |
+| *required*  | `format-version`  | `int`                  | Metadata format 
version (must be `1`).                                |
+| *required*  | `definitions`     | `list<definition>`     | List of function 
[definition](#definition) entities.                  |
+| *required*  | `definition-log`  | `list<definition-log>` | History of 
[definition snapshots](#definition-log).                   |
+| *optional*  | `location`        | `string`               | The function's 
base location; used to create metadata file locations. |
+| *optional*  | `properties`      | `map<string,string>`   | A 
string-to-string map of properties.                                 |
+| *optional*  | `secure`          | `boolean`              | Whether it is a 
secure function. Default: `false`.                    |
+| *optional*  | `doc`             | `string`               | Documentation 
string.                                                 |
+
+Notes:
+1. Engines must prevent leakage of sensitive information when a function is 
marked as `secure` by setting it to `true`.
+2. Entries in `properties` are treated as hints, not strict rules.
+
+### Definition
+
+Each `definition` represents one function signature (e.g., `add_one(int)` vs 
`add_one(float)`).
+
+| Requirement | Field name           | Type                                    
        | Description                                                           
                                        |
+|-------------|----------------------|-------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
+| *required*  | `definition-id`      | `string`                                
        | An identifier derived from canonical parameter-type tuple (lowercase, 
no spaces; e.g., `"(int,int,string)"`). |

Review Comment:
   I find the description here a bit confusing. Is it meant to be just a 
signature string, or is it a unique id for a particular signature?
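
To make the question concrete, here is a minimal sketch of the "signature string" reading. It assumes the id is produced only by lowercasing each parameter type, removing whitespace, and joining the types with commas inside parentheses, which is all the field description states. The helper name `definition_id` is hypothetical and not part of the spec.

```python
def definition_id(param_types):
    """Build an id such as "(int,int,string)" from an ordered list of type names.

    Assumption: canonicalization is only lowercasing plus whitespace removal,
    following the "lowercase, no spaces" wording in the field description.
    """
    # str.split() with no arguments drops every run of whitespace.
    canonical = ["".join(t.split()).lower() for t in param_types]
    return "(" + ",".join(canonical) + ")"


if __name__ == "__main__":
    print(definition_id(["INT", " int", "String "]))
```

Under this reading, two definitions with the same parameter types always map to the same id, so the id is the signature string rather than an independent identifier.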



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.

Review Comment:
   This is unclear to me. Are we defining the file name as being the same, or 
are we saying that a Catalog must first have a reference to the UDF and that 
reference is what must be atomically swapped?
   



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description       
                                                    |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that 
identifies the function, generated once at creation.      |
+| *required*  | `format-version`  | `int`                  | Metadata format 
version (must be `1`).                                |

Review Comment:
   ```suggestion
   | *required*  | `format-version`  | `int`                  | UDF 
Specification Version (must be `1`).                                |
   ```
   
   Do we write anywhere that this is Version 1? We probably should.



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description       
                                                    |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that 
identifies the function, generated once at creation.      |
+| *required*  | `format-version`  | `int`                  | Metadata format 
version (must be `1`).                                |
+| *required*  | `definitions`     | `list<definition>`     | List of function 
[definition](#definition) entities.                  |
+| *required*  | `definition-log`  | `list<definition-log>` | History of 
[definition snapshots](#definition-log).                   |
+| *optional*  | `location`        | `string`               | The function's 
base location; used to create metadata file locations. |
+| *optional*  | `properties`      | `map<string,string>`   | A 
string-to-string map of properties.                                 |
+| *optional*  | `secure`          | `boolean`              | Whether it is a 
secure function. Default: `false`.                    |

Review Comment:
   Should this link to the footnote?



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description       
                                                    |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that 
identifies the function, generated once at creation.      |
+| *required*  | `format-version`  | `int`                  | Metadata format 
version (must be `1`).                                |
+| *required*  | `definitions`     | `list<definition>`     | List of function 
[definition](#definition) entities.                  |
+| *required*  | `definition-log`  | `list<definition-log>` | History of 
[definition snapshots](#definition-log).                   |
+| *optional*  | `location`        | `string`               | The function's 
base location; used to create metadata file locations. |

Review Comment:
   In what cases is this not where the UDF file is? I.e., if I read this file 
and it's not where the location string says, is that a problem? Or is this just 
for future versions?



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description       
                                                    |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that 
identifies the function, generated once at creation.      |
+| *required*  | `format-version`  | `int`                  | Metadata format 
version (must be `1`).                                |
+| *required*  | `definitions`     | `list<definition>`     | List of function 
[definition](#definition) entities.                  |
+| *required*  | `definition-log`  | `list<definition-log>` | History of 
[definition snapshots](#definition-log).                   |
+| *optional*  | `location`        | `string`               | The function's 
base location; used to create metadata file locations. |
+| *optional*  | `properties`      | `map<string,string>`   | A 
string-to-string map of properties.                                 |
+| *optional*  | `secure`          | `boolean`              | Whether it is a 
secure function. Default: `false`.                    |
+| *optional*  | `doc`             | `string`               | Documentation 
string.                                                 |
+
+Notes:
+1. Engines must prevent leakage of sensitive information when a function is 
marked as `secure` by setting it to `true`.

Review Comment:
   This probably needs more of a definition than this. Does it mean that 
engines may not expose UDF implementation details to end users?



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description       
                                                    |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that 
identifies the function, generated once at creation.      |

Review Comment:
   I think "function" here is a little underspecified. Suggestion: "A UUID that 
identifies this UDF, generated at creation."



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common

Review Comment:
   Not sure we need this line



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description                                                           |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that identifies the function, generated once at creation.      |
+| *required*  | `format-version`  | `int`                  | Metadata format version (must be `1`).                                |
+| *required*  | `definitions`     | `list<definition>`     | List of function [definition](#definition) entities.                  |
+| *required*  | `definition-log`  | `list<definition-log>` | History of [definition snapshots](#definition-log).                   |
+| *optional*  | `location`        | `string`               | The function's base location; used to create metadata file locations. |
+| *optional*  | `properties`      | `map<string,string>`   | A string-to-string map of properties.                                 |
+| *optional*  | `secure`          | `boolean`              | Whether it is a secure function. Default: `false`.                    |
+| *optional*  | `doc`             | `string`               | Documentation string.                                                 |
+
+Notes:
+1. Engines must prevent leakage of sensitive information when a function is 
marked as `secure` by setting it to `true`.
+2. Entries in `properties` are treated as hints, not strict rules.
+
+### Definition
+
+Each `definition` represents one function signature (e.g., `add_one(int)` vs 
`add_one(float)`).
+
+| Requirement | Field name           | Type                                            | Description                                                                                                   |
+|-------------|----------------------|-------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
+| *required*  | `definition-id`      | `string`                                        | An identifier derived from canonical parameter-type tuple (lowercase, no spaces; e.g., `"(int,int,string)"`). |
+| *required*  | `parameters`         | `list<parameter>`                               | Ordered list of [function parameters](#parameter). Invocation order **must** match this list.                 |
+| *required*  | `return-type`        | `string`                                        | Declared return type (see [Parameter Type](#parameter-type)).                                                 |
+| *optional*  | `nullable-return`    | `boolean`                                       | A hint to indicate whether the return value is nullable or not. Default: `true`.                              |
+| *required*  | `versions`           | `list<definition-version>`                      | [Versioned implementations](#definition-version) of this definition.                                          |
+| *required*  | `current-version-id` | `int`                                           | Identifier of the current version for this definition.                                                        |
+| *optional*  | `function-type`      | `string` (`"udf"` or `"udtf"`, default `"udf"`) | If `"udtf"`, `return-type` must be an Iceberg type `struct` describing the output schema.                     |
+| *optional*  | `doc`                | `string`                                        | Documentation string.                                                                                         |
+
+### Parameter
+| Requirement | Field  | Type     | Description                                                  |
+|-------------|--------|----------|--------------------------------------------------------------|
+| *required*  | `type` | `string` | Parameter data type (see [Parameter Type](#parameter-type)). |
+| *required*  | `name` | `string` | Parameter name.                                              |
+| *optional*  | `doc`  | `string` | Parameter documentation.                                     |
+
+Notes:
+1. Function definitions are identified by the tuple of `type`s and there can be only one definition for a given tuple.

Review Comment:
   Should we use "signature" consistently throughout the spec instead of "tuple of types"?
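
   To make the terminology concrete, here is a minimal Python sketch of how a signature could map to `definition-id`. It assumes the canonical form is just the lowercased, whitespace-free parameter types joined by commas, which is my reading of the "lowercase, no spaces" wording in the table. The helper names `definition_id` and `check_unique_signatures` are hypothetical and not part of the spec.

```python
def definition_id(parameter_types):
    """Build a canonical id such as "(int,int,string)" from ordered parameter types."""
    # Assumption: "canonical" means lowercase with all whitespace removed.
    canonical = ["".join(t.split()).lower() for t in parameter_types]
    return "(" + ",".join(canonical) + ")"


def check_unique_signatures(definitions):
    """Raise ValueError if two definitions share the same parameter-type signature."""
    seen = set()
    for definition in definitions:
        # Only the ordered parameter types identify a definition; names are ignored.
        signature = definition_id(p["type"] for p in definition["parameters"])
        if signature in seen:
            raise ValueError(f"duplicate signature: {signature}")
        seen.add(signature)
```

   With this reading, `add_one(int)` and `add_one(float)` get the ids `"(int)"` and `"(float)"`, and two definitions that differ only in parameter names collide.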



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed properties, etc.) creates a new metadata file, and atomically swaps in the new file as the current metadata.

Review Comment:
   Maybe this should just say that UDF metadata files are immutable and that any modification must create a new file. Catalogs can then use an atomic swap, as with an Iceberg table, to change the UDF linked to a particular catalog identifier.
   
   Or something along those lines?
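
   A rough sketch of the split I have in mind, in Python. The `InMemoryCatalog` class and its `commit` method are made up for illustration and are not an Iceberg API; the point is only that metadata files are never rewritten and the catalog does a compare-and-swap on the pointer.

```python
import threading


class InMemoryCatalog:
    """Hypothetical catalog mapping an identifier to its current metadata file location."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}

    def current(self, identifier):
        """Return the current metadata location, or None if the function is unknown."""
        with self._lock:
            return self._pointers.get(identifier)

    def commit(self, identifier, expected_location, new_location):
        """Swap the pointer only if it still equals expected_location."""
        with self._lock:
            if self._pointers.get(identifier) != expected_location:
                # Someone else committed first; the caller must re-read and retry.
                return False
            self._pointers[identifier] = new_location
            return True
```

   A writer would read `current(...)`, write a brand-new metadata file, then call `commit(...)` with the location it originally read. A failed commit leaves the old file in place as the current metadata.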



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description                                                           |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that identifies the function, generated once at creation.      |
+| *required*  | `format-version`  | `int`                  | Metadata format version (must be `1`).                                |
+| *required*  | `definitions`     | `list<definition>`     | List of function [definition](#definition) entities.                  |
+| *required*  | `definition-log`  | `list<definition-log>` | History of [definition snapshots](#definition-log).                   |
+| *optional*  | `location`        | `string`               | The function's base location; used to create metadata file locations. |
+| *optional*  | `properties`      | `map<string,string>`   | A string-to-string map of properties.                                 |
+| *optional*  | `secure`          | `boolean`              | Whether it is a secure function. Default: `false`.                    |
+| *optional*  | `doc`             | `string`               | Documentation string.                                                 |
+
+Notes:
+1. Engines must prevent leakage of sensitive information when a function is 
marked as `secure` by setting it to `true`.
+2. Entries in `properties` are treated as hints, not strict rules.
+
+### Definition
+
+Each `definition` represents one function signature (e.g., `add_one(int)` vs 
`add_one(float)`).
+
+| Requirement | Field name           | Type                                            | Description                                                                                                   |
+|-------------|----------------------|-------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
+| *required*  | `definition-id`      | `string`                                        | An identifier derived from canonical parameter-type tuple (lowercase, no spaces; e.g., `"(int,int,string)"`). |
+| *required*  | `parameters`         | `list<parameter>`                               | Ordered list of [function parameters](#parameter). Invocation order **must** match this list.                 |
+| *required*  | `return-type`        | `string`                                        | Declared return type (see [Parameter Type](#parameter-type)).                                                 |
+| *optional*  | `nullable-return`    | `boolean`                                       | A hint to indicate whether the return value is nullable or not. Default: `true`.                              |

Review Comment:
   Why is nullability specified here rather than in the return type? I guess it's only a hint, so it doesn't really matter.



##########
format/udf-spec.md:
##########
@@ -0,0 +1,322 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a single value, which may be a primitive 
type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of 
columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed 
properties, etc.) creates a new metadata file, and atomically swaps in the new 
file as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description                                                           |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that identifies the function, generated once at creation.      |
+| *required*  | `format-version`  | `int`                  | Metadata format version (must be `1`).                                |
+| *required*  | `definitions`     | `list<definition>`     | List of function [definition](#definition) entities.                  |
+| *required*  | `definition-log`  | `list<definition-log>` | History of [definition snapshots](#definition-log).                   |
+| *optional*  | `location`        | `string`               | The function's base location; used to create metadata file locations. |
+| *optional*  | `properties`      | `map<string,string>`   | A string-to-string map of properties.                                 |
+| *optional*  | `secure`          | `boolean`              | Whether it is a secure function. Default: `false`.                    |
+| *optional*  | `doc`             | `string`               | Documentation string.                                                 |
+
+Notes:
+1. Engines must prevent leakage of sensitive information when a function is 
marked as `secure` by setting it to `true`.
+2. Entries in `properties` are treated as hints, not strict rules.
+
+### Definition
+
+Each `definition` represents one function signature (e.g., `add_one(int)` vs 
`add_one(float)`).
+
+| Requirement | Field name           | Type                                            | Description                                                                                                   |
+|-------------|----------------------|-------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
+| *required*  | `definition-id`      | `string`                                        | An identifier derived from canonical parameter-type tuple (lowercase, no spaces; e.g., `"(int,int,string)"`). |
+| *required*  | `parameters`         | `list<parameter>`                               | Ordered list of [function parameters](#parameter). Invocation order **must** match this list.                 |
+| *required*  | `return-type`        | `string`                                        | Declared return type (see [Parameter Type](#parameter-type)).                                                 |
+| *optional*  | `nullable-return`    | `boolean`                                       | A hint to indicate whether the return value is nullable or not. Default: `true`.                              |
+| *required*  | `versions`           | `list<definition-version>`                      | [Versioned implementations](#definition-version) of this definition.                                          |
+| *required*  | `current-version-id` | `int`                                           | Identifier of the current version for this definition.                                                        |
+| *optional*  | `function-type`      | `string` (`"udf"` or `"udtf"`, default `"udf"`) | If `"udtf"`, `return-type` must be an Iceberg type `struct` describing the output schema.                     |

Review Comment:
   Why is `function-type` at the definition level? Are we OK with some signatures of a function being UDFs and others being UDTFs?
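
   To illustrate what the current layout allows, a small Python sketch with the metadata modeled as plain dicts. The helper names `function_types` and `mixes_udf_and_udtf` are hypothetical; the only spec behavior it relies on is the documented default of `"udf"` when `function-type` is absent.

```python
def function_types(metadata):
    """Collect the distinct function-type values across all definitions."""
    # A missing "function-type" falls back to "udf", per the table's stated default.
    return {d.get("function-type", "udf") for d in metadata["definitions"]}


def mixes_udf_and_udtf(metadata):
    """Return True when one function has both scalar and table definitions."""
    return len(function_types(metadata)) > 1
```

   As drafted, nothing in the spec seems to rule out a metadata file for which `mixes_udf_and_udtf` returns `True`. If that is not intended, the field could live at the top level instead.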



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

