[ 
https://issues.apache.org/jira/browse/SPARK-54598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Huang updated SPARK-54598:
---------------------------------
    Description: 
The current implementation has redundant UDF reading logic scattered throughout 
`read_udfs()`:

**Single UDF pattern** (repeated in multiple branches):
{code:python}
arg_offsets, f = read_single_udf(
    pickleSer, infile, eval_type, runner_conf, udf_index=0, profiler=profiler
)
parsed_offsets = extract_key_value_indexes(arg_offsets)  # when needed
{code}

**Multiple UDFs pattern** (repeated in multiple branches):
{code:python}
udfs = []
for i in range(num_udfs):
    udfs.append(
        read_single_udf(
            pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
        )
    )
{code}
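
One possible direction, as a minimal sketch rather than the final design: pull the per-UDF reads into a single helper so that each branch of `read_udfs()` only decides how to invoke the UDFs it gets back. The helper name `read_all_udfs` below is hypothetical; `read_single_udf` and its parameters are the existing worker-side names assumed from the snippets above.

{code:python}
# Hypothetical helper, sketched as if it lived in python/pyspark/worker.py
# next to read_single_udf(); the name read_all_udfs is not an existing API.
def read_all_udfs(pickleSer, infile, eval_type, runner_conf, num_udfs, profiler=None):
    """Read num_udfs UDFs from the stream and return [(arg_offsets, f), ...]."""
    return [
        read_single_udf(
            pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
        )
        for i in range(num_udfs)
    ]

# A single-UDF branch could then become:
#     (arg_offsets, f), = read_all_udfs(
#         pickleSer, infile, eval_type, runner_conf, num_udfs=1, profiler=profiler
#     )
#     parsed_offsets = extract_key_value_indexes(arg_offsets)  # when needed
{code}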

 

  was:
Currently we always fetch UDFs (the function and its argument offsets) inside the handling logic of each different UDF type, which is quite redundant.

The current implementation has redundant UDF reading logic scattered throughout 
`read_udfs()`:

**Single UDF pattern** (repeated in multiple branches):
```python
arg_offsets, f = read_single_udf(
    pickleSer, infile, eval_type, runner_conf, udf_index=0, profiler=profiler
)
parsed_offsets = extract_key_value_indexes(arg_offsets)  # when needed
```

**Multiple UDFs pattern** (repeated in multiple branches):
```python
udfs = []
for i in range(num_udfs):
    udfs.append(
        read_single_udf(
            pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
        )
    )
```

 


> Refactor UDF fetching logic out from invocation
> -----------------------------------------------
>
>                 Key: SPARK-54598
>                 URL: https://issues.apache.org/jira/browse/SPARK-54598
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark
>    Affects Versions: 4.2.0
>            Reporter: Yicong Huang
>            Priority: Major
>
> The current implementation has redundant UDF reading logic scattered 
> throughout `read_udfs()`:
> **Single UDF pattern** (repeated in multiple branches):
> {code:python}
> arg_offsets, f = read_single_udf(
>     pickleSer, infile, eval_type, runner_conf, udf_index=0, profiler=profiler
> )
> parsed_offsets = extract_key_value_indexes(arg_offsets)  # when needed
> {code}
> **Multiple UDFs pattern** (repeated in multiple branches):
> {code:python}
> udfs = []
> for i in range(num_udfs):
>     udfs.append(
>         read_single_udf(
>             pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
>         )
>     )
> {code}
>  


