[jira] [Updated] (SPARK-54598) Refactor UDF fetching logic out from invocation

Yicong Huang (Jira) Thu, 04 Dec 2025 15:54:04 -0800


     [ https://issues.apache.org/jira/browse/SPARK-54598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yicong Huang updated SPARK-54598:
---------------------------------
    Description: 
Currently, we always fetch UDFs (the function and its arguments) inside the logic for each UDF type, which is quite redundant.

The current implementation has redundant UDF reading logic scattered throughout 
`read_udfs()`:

**Single UDF pattern** (repeated in multiple branches):
```python
arg_offsets, f = read_single_udf(
    pickleSer, infile, eval_type, runner_conf, udf_index=0, profiler=profiler
)
parsed_offsets = extract_key_value_indexes(arg_offsets)  # when needed
```
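The sketch below shows the refactoring direction under one assumption: every UDF is read once at the top of `read_udfs()`, and the results are handed to each branch. `fetch_udfs` is a hypothetical helper name. The `read_single_udf` below is only a stub that mimics the argument order used in these snippets; it is not PySpark's implementation. Under this approach, the single-UDF pattern above becomes the first entry of the returned list, and the multiple-UDFs loop below collapses into the same call.

```python
import functools
from typing import Any, Callable, List, Tuple

# An (arg_offsets, f) pair, mirroring what read_single_udf() returns in the
# snippets above.
UdfSpec = Tuple[List[int], Callable[..., Any]]


def read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index, profiler):
    # Stub for illustration only: the real PySpark function deserializes the
    # UDF from the worker's input stream. Here `infile` is a prepared list.
    return infile[udf_index]


def fetch_udfs(
    pickleSer, infile, eval_type, runner_conf, num_udfs: int, profiler=None
) -> List[UdfSpec]:
    """Hypothetical helper: read every UDF once, in order, before dispatch."""
    # Bind the arguments shared by every read so that only udf_index varies.
    read_one = functools.partial(
        read_single_udf, pickleSer, infile, eval_type, runner_conf, profiler=profiler
    )
    return [read_one(udf_index=i) for i in range(num_udfs)]


# Usage with fake data: two UDFs, each an (arg_offsets, function) pair.
fake_stream = [([0], str.upper), ([1, 2], lambda a, b: a + b)]
udfs = fetch_udfs(None, fake_stream, eval_type=0, runner_conf={}, num_udfs=2)

# A single-UDF branch takes the first entry; a multi-UDF branch uses the list.
arg_offsets, f = udfs[0]
```

Binding the shared arguments with `functools.partial` leaves `udf_index` as the only input that changes per read. Post-processing such as `extract_key_value_indexes(arg_offsets)` could then run on the prefetched offsets only in the branches that need it.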

**Multiple UDFs pattern** (repeated in multiple branches):
```python
udfs = []
for i in range(num_udfs):
    udfs.append(
        read_single_udf(
            pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
        )
    )
```

  was:
Currently we always fetch UDFs (function and its arguments)

Single UDF:

```
arg_offsets, f = read_single_udf(
    pickleSer, infile, eval_type, runner_conf, udf_index=0, profiler=profiler
)
parsed_offsets = extract_key_value_indexes(arg_offsets)
```

Multi UDFs:

{code:python}
udfs = []
for i in range(num_udfs):
    udfs.append(
        read_single_udf(
            pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
        )
    )
{code}

> Refactor UDF fetching logic out from invocation
> -----------------------------------------------
>
>                 Key: SPARK-54598
>                 URL: https://issues.apache.org/jira/browse/SPARK-54598
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark
>    Affects Versions: 4.2.0
>            Reporter: Yicong Huang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
