Yicong-Huang commented on code in PR #55222:
URL: https://github.com/apache/spark/pull/55222#discussion_r3077661890
##########
python/pyspark/worker.py:
##########
@@ -2575,6 +2510,35 @@ def read_udfs(pickleSer, infile, eval_type, runner_conf,
eval_conf):
for i in range(num_udfs)
]
+ def extract_key_value_indexes(grouped_arg_offsets):
+ """
+ Helper function to extract the key and value indexes from arg_offsets
for the grouped and
+ cogrouped pandas udfs. See BasePandasGroupExec.resolveArgOffsets for
equivalent scala code.
+
+ Parameters
+ ----------
+ grouped_arg_offsets: list
+ List containing the key and value indexes of columns of the
+ DataFrames to be passed to the udf. It consists of n repeating
groups where n is the
+ number of DataFrames. Each group has the following format:
+ group[0]: length of group
+ group[1]: length of key indexes
+ group[2.. group[1] +2]: key attributes
+ group[group[1] +3 group[0]]: value attributes
+ """
+ parsed = []
+ idx = 0
+ while idx < len(grouped_arg_offsets):
+ offsets_len = grouped_arg_offsets[idx]
+ idx += 1
+ offsets = grouped_arg_offsets[idx : idx + offsets_len]
+ split_index = offsets[0] + 1
+ offset_keys = offsets[1:split_index]
+ offset_values = offsets[split_index:]
Review Comment:
we can refactor it in the future.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]