[ https://issues.apache.org/jira/browse/SPARK-54598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yicong Huang updated SPARK-54598:
---------------------------------
Description:
The current implementation has redundant UDF reading logic scattered throughout
`read_udfs()`:
**Single UDF pattern** (repeated in multiple branches):
{code:python}
arg_offsets, f = read_single_udf(
    pickleSer, infile, eval_type, runner_conf, udf_index=0, profiler=profiler
)
parsed_offsets = extract_key_value_indexes(arg_offsets)  # when needed
{code}
**Multiple UDFs pattern** (repeated in multiple branches):
{code:python}
udfs = []
for i in range(num_udfs):
    udfs.append(
        read_single_udf(
            pickleSer, infile, eval_type, runner_conf, udf_index=i,
            profiler=profiler
        )
    )
{code}
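One possible shape for the refactor is sketched below: read every UDF once, up front, and let each branch consume the result. This is a sketch under stated assumptions, not the actual patch. `fetch_udfs` and `fetch_single_udf` are hypothetical names, and `read_single_udf` is replaced by a stub that only mimics the `(arg_offsets, f)` return shape shown above so that the example runs on its own.

```python
def read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index, profiler=None):
    # Stub standing in for the real PySpark function (assumption): it reads
    # nothing from `infile` and just returns a fake (arg_offsets, f) pair.
    return [udf_index], (lambda *args: (udf_index, args))


def fetch_udfs(pickleSer, infile, eval_type, runner_conf, num_udfs, profiler=None):
    # Hypothetical helper: the "multiple UDFs" loop, written once.
    return [
        read_single_udf(
            pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
        )
        for i in range(num_udfs)
    ]


def fetch_single_udf(pickleSer, infile, eval_type, runner_conf, profiler=None):
    # Hypothetical helper: the "single UDF" pattern as a special case of the loop.
    return fetch_udfs(
        pickleSer, infile, eval_type, runner_conf, num_udfs=1, profiler=profiler
    )[0]


# Branches would then unpack the result instead of repeating the read logic.
arg_offsets, f = fetch_single_udf(None, None, eval_type=0, runner_conf=None)
udfs = fetch_udfs(None, None, eval_type=0, runner_conf=None, num_udfs=3)
```

Steps that only some branches need, such as `extract_key_value_indexes(arg_offsets)`, could stay in those branches and operate on the already-fetched offsets.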
was:
Currently we always fetch UDFs (function and its arguments) in the logic of
each different UDF, which is pretty redundant.
The current implementation has redundant UDF reading logic scattered throughout
`read_udfs()`:
**Single UDF pattern** (repeated in multiple branches):
```python
arg_offsets, f = read_single_udf(
    pickleSer, infile, eval_type, runner_conf, udf_index=0, profiler=profiler
)
parsed_offsets = extract_key_value_indexes(arg_offsets)  # when needed
```
**Multiple UDFs pattern** (repeated in multiple branches):
```python
udfs = []
for i in range(num_udfs):
    udfs.append(
        read_single_udf(
            pickleSer, infile, eval_type, runner_conf, udf_index=i,
            profiler=profiler
        )
    )
```
> Refactor UDF fetching logic out from invocation
> -----------------------------------------------
>
> Key: SPARK-54598
> URL: https://issues.apache.org/jira/browse/SPARK-54598
> Project: Spark
> Issue Type: Task
> Components: PySpark
> Affects Versions: 4.2.0
> Reporter: Yicong Huang
> Priority: Major
>
> The current implementation has redundant UDF reading logic scattered
> throughout `read_udfs()`:
> **Single UDF pattern** (repeated in multiple branches):
> {code:python}
> arg_offsets, f = read_single_udf(
>     pickleSer, infile, eval_type, runner_conf, udf_index=0, profiler=profiler
> )
> parsed_offsets = extract_key_value_indexes(arg_offsets)  # when needed
> {code}
> **Multiple UDFs pattern** (repeated in multiple branches):
> {code:python}
> udfs = []
> for i in range(num_udfs):
>     udfs.append(
>         read_single_udf(
>             pickleSer, infile, eval_type, runner_conf, udf_index=i,
>             profiler=profiler
>         )
>     )
> {code}
>